Compare commits

134 Commits

Author SHA1 Message Date
Vitaliy Filippov 42c03e8631 Add Vitastor support 2024-05-20 19:53:28 +03:00
Thomas Lamprecht 16b7dfe03b bump version to 9.0.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-05-17 17:05:10 +02:00
Fiona Ebner f06b222ece fixes for QEMU 9.0
Most importantly, fix forwards and backwards migration with VirtIO-GPU
display.

Other fixes are for a regression in pflash device (introduced in 8.2)
and some fixes for x86(_64) TCG emulation. One of the patches needed
to be adapted, because it removed a helper that is still in use in
9.0.0.

There also is a revert for a fix in VirtIO PCI devices that turned out
to cause some issues, see the revert itself for more details.

Lastly, there is a change to move compatibility flags for a new
VirtIO-net feature to the correct machine type. The feature was
introduced in QEMU 8.2, but the compatibility flags got added to
machine version 8.0 instead of 8.1. This breaks backwards migration
with machine version 8.1 from an 8.2/9.0 binary to an 8.1 binary, in
cases where the guest kernel enables the feature (e.g. Ubuntu 23.10).
While that breaks migration with machine version 8.1 from an unpatched
to a patched binary, Proxmox VE only ever had 8.2 on the test
repository and 9.0 not yet in any public repository. An upstream
developer suggested it is the proper fix [0]. Upstream submission [1].

[0]: https://lore.kernel.org/qemu-devel/CACGkMEtZrJuhof+hUGVRvLLQE+8nQE5XmSHpT0NAQ1EpnqfmsA@mail.gmail.com/T/#u
[1]: https://lore.kernel.org/qemu-devel/20240517075336.104091-1-f.ebner@proxmox.com/T/#u

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-05-17 15:56:12 +02:00
Fiona Ebner db293008ee backup: improve error when copy-before-write fails for fleecing
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error that is associated with the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.
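
As a rough illustration of the idea (not the actual patch), choosing the error to report could look like the Python sketch below. The helper name and its parameters are made up for illustration; only EACCES as the follow-up error is taken from the description above.

```python
import errno


def backup_failure_code(job_errno: int, snapshot_errno: int) -> int:
    """Pick the error code to report for a failed backup job.

    Hypothetical helper: job_errno is what the job itself failed with,
    snapshot_errno is the error recorded for the copy-before-write
    filter's snapshot (0 if none was recorded).
    """
    if job_errno == errno.EACCES and snapshot_errno != 0:
        # EACCES is only the follow-up error of the invalidated snapshot,
        # so report the recorded root cause instead.
        return snapshot_errno
    return job_errno


# A copy-before-write timeout was recorded, the job later saw EACCES.
reported = backup_failure_code(errno.EACCES, errno.ETIMEDOUT)
print(errno.errorcode[reported])
```

Any other job error is passed through unchanged in this sketch.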

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-29 17:25:20 +02:00
Fiona Ebner 51232e2e40 fix #5409: backup: fix copy-before-write timeout
The type for the copy-before-write timeout in nanoseconds was wrong.
Being just uint32_t, it allowed a maximum of slightly over 4 seconds.
Larger values would overflow, and thus the 45 seconds set by Proxmox's
backup with fleecing effectively resulted in a timeout of about 2
seconds for copy-before-write operations.
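
The arithmetic can be checked with a few lines of Python. This is only a sketch of the truncation, assuming the value is simply reduced to its low 32 bits when stored; the helper name is made up for illustration.

```python
NS_PER_SECOND = 1_000_000_000
UINT32_MAX = 2**32 - 1


def truncate_to_uint32(value: int) -> int:
    """Keep only the low 32 bits, as storing into a uint32_t would."""
    return value & UINT32_MAX


requested_ns = 45 * NS_PER_SECOND                 # 45 s requested for fleecing
effective_ns = truncate_to_uint32(requested_ns)   # what fits into 32 bits

print(UINT32_MAX / NS_PER_SECOND)     # largest representable timeout in seconds
print(effective_ns / NS_PER_SECOND)   # timeout that actually took effect
```

The largest representable value is 4294967295 ns, and 45 s worth of nanoseconds truncates to 2050327040 ns, which matches the "effectively 2 seconds" observed.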

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-29 17:25:20 +02:00
Thomas Lamprecht 2cd560e0d2 bump version to 9.0.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 15:29:52 +02:00
Fiona Ebner 4fbd50e2f9 update submodule and patches to QEMU 9.0.0
Biggest change is that AioContext locking got removed, but no changes
were required other than dropping the calls to acquire and release it.
As a consequence, the single parameter for the bdrv_graph_wrlock() call
got removed, which also required adaptation.

QAPI docs became stricter, requiring all members to be documented.

Other minor changes:

- Single parameter from migration_is_running() was dropped.
- qemu_mutex_(un)lock_iothread() got renamed to bql_(un)lock().

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 15:29:52 +02:00
Thomas Lamprecht 766c61f1b6 d/lintian: ignore missing source warning for linux-user vdso objects
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 15:29:52 +02:00
Thomas Lamprecht c19617bf9b bump version to 8.2.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 10:45:26 +02:00
Fiona Ebner f1eed34ac7 update submodule and patches to QEMU 8.2.2
This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock as was done in QEMU 8.1, because there
are now changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.

During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.

The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks
in pve-backup, savevm-async, vma-reader.

Local variable shadowing is prohibited via a compiler flag now,
which required a slight adaptation in vma.c.

Major changes only affect alloc-track:

* It is not possible to call a generated co-wrapper like
  bdrv_get_info() while holding the block graph lock exclusively [0],
  which does happen during initialization of alloc-track when the
  backing hd is set and the refresh_limits driver callback is invoked.

  The bdrv_get_info() call to get the cluster size is moved to
  directly after opening the file child in track_open().

  The important thing is that at least the request alignment for the
  write target is used, because then the RMW cycle in bdrv_pwritev
  will gather enough data from the backing file. Partial cluster
  allocations in the target are not a fundamental issue, because the
  driver returns its allocation status based on the bitmap, so any
  other data that maps to the same cluster will still be copied later
  by a stream job (or during writes to that cluster).

* Replacing the node cannot be done in the
  track_co_change_backing_file() callback, because it is a coroutine
  and cannot hold the block graph lock exclusively. So it is moved to
  the stream job itself with the auto-remove option not having an
  effect anymore (qemu-server would always set it anyway).

  In the future, there could either be a special option for the stream
  job, or maybe the upcoming blockdev-replace QMP command can be used.

  Replacing the backing child is actually already done in the stream
  job, so no need to do it in the track_co_change_backing_file()
  callback. It also cannot be called from a coroutine. Looking at the
  implementation in the qcow2 driver, it doesn't seem to be intended
  to change the backing child itself, just update driver-internal
  state.

Other changes:

* alloc-track: Error out early when used without auto-remove. Since
  replacing the node now happens in the stream job, where the option
  cannot be read from (it's internal to the driver), it will always be
  treated as 'on'. This makes sure that users besides qemu-server
  notice the change (should they even exist). The option can be fully
  dropped in the future while adding a version guard in qemu-server.

* alloc-track: Avoid seemingly superfluous child permission update.
  Doesn't seem necessary nowadays (maybe after commit "alloc-track:
  fix deadlock during drop" where the dropping is not rescheduled and
  delayed anymore or some upstream change). Replacing the block node
  will already update the permissions of the new node (which was the
  file child before). Should there really be some issue, instead of
  having a drop state, this could also be just based off the fact
  whether there is still a backing child.

  Dumping the cumulative (shared) permissions for the BDS with a debug
  print yields the same values after this patch and with QEMU 8.1,
  namely 3 and 5.

* PBS block driver: compile unconditionally. Proxmox VE always needs
  it and something in the build process changed to make it not enabled
  by default. Probably would need to move the build option to meson
  otherwise.

* backup: job unreferencing during cleanup needs to happen outside of
  coroutine, so it was moved to before invoking the clean

* mirror: Cherry-pick stable fix to avoid potential deadlock.

* savevm-async: migrate_init now can fail, so propagate potential
  error.

* savevm-async: compression counters are not accessible outside
  migration/ram-compress now, so drop code that prophylactically set
  them to zero.
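
For readers wondering what the permission values 3 and 5 from the alloc-track item above stand for, they can be decoded as bit masks. The Python sketch below assumes the BLK_PERM_* flag values as I remember them from QEMU's block layer headers; treat those values as an assumption and check them against the QEMU version in use.

```python
from enum import IntFlag


class BlkPerm(IntFlag):
    # Assumed to mirror QEMU's BLK_PERM_* flags; verify against the headers.
    CONSISTENT_READ = 0x01
    WRITE = 0x02
    WRITE_UNCHANGED = 0x04
    RESIZE = 0x08


def decode(mask: int) -> list:
    """Return the names of all permission bits set in mask."""
    return [flag.name for flag in BlkPerm if mask & flag]


for value in (3, 5):
    print(value, decode(value))
```

Under that assumption, 3 combines consistent read with write, and 5 combines consistent read with write-unchanged.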

[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-26 14:14:06 +02:00
Fiona Ebner 2e71c17f5b makefile: also filter 64-bit hppa ROM for QEMU 8.2
Same rationale as 6facdf3 ("also exclude hppa-firmware.img ROM from
build"), not used by Proxmox VE and would cause a failure during
build.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-26 14:13:53 +02:00
Fiona Ebner f76e07f370 makefile: adapt firmware blob removal to changes for QEMU 8.2
Namely, it's also necessary to remove .dts source files from the
meson.build file, because the .dtb file names are not directly listed
anymore since commit 6e0dc9d2a8 ("meson: compile bundled device
trees").

The same commit also introduced a "'.dtb'" in a line not just listing
a file name and removing that line would break the script. Be more
precise and require an alphanumeric character before the suffix.
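
The difference between the loose and the more precise match can be sketched in Python as follows. The sample lines are hypothetical and only written in the style of a meson.build file, and the real script does not necessarily use these exact expressions.

```python
import re

LOOSE = re.compile(r"\.dtb")                 # matches any occurrence of the suffix
PRECISE = re.compile(r"[A-Za-z0-9]\.dtb")    # requires a file name before the suffix

# Hypothetical excerpt; the real meson.build differs.
lines = [
    "  'bamboo.dtb',",
    "  'canyonlands.dtb',",
    "    out = fs.replace_suffix(f, '.dtb')",
]


def drop_matching(pattern, text_lines):
    """Return the lines that do not match pattern, i.e. those that are kept."""
    return [line for line in text_lines if not pattern.search(line)]


kept_loose = drop_matching(LOOSE, lines)
kept_precise = drop_matching(PRECISE, lines)
print(kept_loose)
print(kept_precise)
```

With the loose pattern, the line that merely mentions the suffix is dropped as well, while the precise pattern only removes lines that list an actual file name.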

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-26 14:13:49 +02:00
Fiona Ebner 71dd2d48f9 Makefile: drop -j option from dpkg-buildpackage
From man dpkg-buildpackage:

> -j, --jobs[=jobs|auto]
> Specifies the number of jobs allowed to be run simultaneously (since
> dpkg 1.14.7, long option since dpkg 1.18.8). The number of jobs
> matching the number of online processors if auto is specified (since
> dpkg 1.17.10), or unlimited number if jobs is not specified. The
> default behavior is auto (since dpkg 1.18.11) in non-forced mode
> (since dpkg 1.21.10), and as such it is always safer to use with any
> package including those that are not parallel-build safe.

The option was added in the Makefile by commit 4ba321f ("build qemu
multithreaded") which states:

> same as in pve-kernel where we have --jobs=auto

But according to the man page, -j without an argument is not the same
and means unlimited. Using the number of online cores seems more
sensible and was the original intention. Again, according to the man
page, the default is auto since dpkg 1.18.11 (or Debian Stretch), so
just drop the option.

The motivation to look into this was that after the recent upstream
commit d1ce2cc95b ("Makefile: preserve --jobserver-auth argument when
calling ninja") having -j as the make flag would be broken as it was
mistakenly passed to ninja (for which the argument for -j is not
optional). Should get fixed soon [0].

[0]: https://lore.kernel.org/qemu-devel/20240412100401.20047-2-pbonzini@redhat.com/T/#u

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-18 22:18:05 +02:00
Thomas Lamprecht 59ab88deb6 bump version to 8.1.5-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-11 20:05:02 +02:00
Thomas Lamprecht 20209d8d73 implement support for backup fleecing
Excerpt from Fiona's v3 cover-letter [0]:

When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished so guest IO to not-yet-backed-up sectors will be
limited by the speed of the backup target.

With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.

With this series it will be possible to enable backup-fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-lvm` with fleecing
images created on the storage `local-lvm`. The fleecing storage should
be a fast local storage which supports thin-provisioning and discard.
If the storage supports qcow2, that is used as the fleecing image
format. If the underlying file system does not support discard, with
qcow2 and preallocation=off, at least already allocated parts of the
image can be re-used later.

Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup finished or failed. The naming schema for fleecing images is
'vm-ID-fleece-N(.FORMAT)'. The allocated images are recorded in the
guest configuration, so that even after a hard failure, clean-up can
be re-attempted. While not too bad, it's a non-trivial amount of code
and I'm not 100% sure about the cost-benefit, so sending those as RFC.

The fleecing image needs to be the exact same size as the source, but
luckily, an explicit size can be specified when attaching a raw image
to QEMU so there are no size issues when using storages that have
coarser allocation/round up. For qcow2, it seems that virtual size can
be nearly arbitrary (i.e. modulo 512 byte granularity) during
allocation.

[0]: https://lists.proxmox.com/pipermail/pve-devel/2024-April/062815.html

Originally-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-11 20:05:02 +02:00
Thomas Lamprecht 47bdd04244 bump version to 8.1.5-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-03-12 14:08:48 +01:00
Thomas Lamprecht 8dd76cc52d backup: factor out & clean up gathering device info into helper
Squash the two original patches [0][1] from Fiona, which were sent
separately to be easier to review, into the big patch that adds the
Proxmox backup integration.

[0]: https://lists.proxmox.com/pipermail/pve-devel/2024-January/061479.html
[1]: https://lists.proxmox.com/pipermail/pve-devel/2024-January/061478.html

Originally-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-03-12 13:55:00 +01:00
Fiona Ebner cd7676f3e6 backup: avoid bubbling up first ECANCELED error
With pvebackup_propagate_error(), the first error wins. When one job
in the transaction fails, it is expected that later jobs get the
ECANCELED error. Those are not interesting and by skipping them a more
interesting error, which is likely the actual root cause, can win.
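
A minimal Python sketch of the "first error wins, but skip ECANCELED" rule is shown below. It is not the code of pvebackup_propagate_error(); the class and its fields are made up for illustration, and what should happen when every job reports ECANCELED is left out.

```python
import errno
from typing import Optional, Tuple


class BackupErrorState:
    """Hypothetical stand-in for the shared backup state."""

    def __init__(self) -> None:
        self.error: Optional[Tuple[int, str]] = None

    def propagate(self, code: int, message: str) -> None:
        if code == errno.ECANCELED:
            # Expected for jobs cancelled because a sibling job in the
            # transaction failed, so it must not occupy the error slot.
            return
        if self.error is None:
            # The first remaining error wins.
            self.error = (code, message)


state = BackupErrorState()
state.propagate(errno.ECANCELED, "job for second disk was cancelled")
state.propagate(errno.EIO, "job for first disk failed")
state.propagate(errno.ENOSPC, "reported later, ignored")
print(state.error)
```

Even though the ECANCELED error arrives first here, the EIO error ends up being the one reported.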

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-03-12 13:20:28 +01:00
Fiona Ebner 862b46e3e0 cleanup: squash backup dump driver change into patch introducing the driver
Makes it simpler and shorter. Still results in the same code after
applying both patches in question.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-03-12 13:19:30 +01:00
Fiona Ebner 061e9ceb36 fix patch for accepting NULL qiov when padding
All callers of the function pass an address, so dereferencing once
before checking for NULL is required. It's also necessary to update
bytes and offset nevertheless, so the request will actually be aligned
later and not trigger an assertion failure.

Seems like this was accidentally broken in 8dca018 ("udpate and rebase
to QEMU v6.0.0") and this is effectively a revert to the original
version of the patch. The qiov functions changed back then, which
might've been the reason Stefan tried to simplify the patch.

Should fix live-import for certain kinds of VMDK images.

Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-03-12 13:11:21 +01:00
Thomas Lamprecht 0d4462207b bump version to 8.1.5-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-21 20:11:27 +01:00
Fiona Ebner ed159bc32a add patch to fix deadlock with VirtIO block and iothread during QMP stop
Backported from commit bfa36802d1 ("virtio-blk: avoid using ioeventfd
state in irqfd conditional") because the rework/rename dataplane ->
ioeventfd didn't happen yet.

Reported in the community forum [0] and reproduced doing a backup loop
to PBS with suspend mode with fio doing heavy IO in the guest and
using an RBD storage (with krbd).

[0]: https://forum.proxmox.com/threads/141320

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-21 20:09:22 +01:00
Fiona Ebner 86460aef76 fix #4507: add patch to automatically increase NOFILE soft limit
In many configurations, e.g. multiple vNICs with multiple queues or
with many Ceph OSDs, the default soft limit of 1024 is not enough.
QEMU is supposed to work fine with file descriptors >= 1024 and does
not use select() on POSIX. Bump the soft limit to the allowed hard
limit to avoid issues with the aforementioned configurations.
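
What bumping the soft limit amounts to can be sketched with Python's resource module. This only illustrates the getrlimit/setrlimit mechanism for a process and is not the QEMU patch; the function name is made up.

```python
import resource


def raise_nofile_soft_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit.

    Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to the
            # hard limit, but not beyond it.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse certain values; keep the old limit then.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


limits_before = resource.getrlimit(resource.RLIMIT_NOFILE)
limits_after = raise_nofile_soft_limit()
print(limits_before, limits_after)
```

The hard limit stays untouched, so no additional privileges are needed.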

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-06 10:33:12 +01:00
Thomas Lamprecht 676adda3c6 bump version to 8.1.5-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-02 19:41:31 +01:00
Thomas Lamprecht 4ff04bdfa5 work around stuck guest IO with iothread and VirtIO block/SCSI
This essentially repeats commit 6b7c181 ("add patch to work around
stuck guest IO with iothread and VirtIO block/SCSI") with an added
fix for the SCSI event virtqueue, which requires special handling.
This is to avoid the issue [3] that made the revert 2a49e66 ("Revert
"add patch to work around stuck guest IO with iothread and VirtIO
block/SCSI"") necessary the first time around.

When using iothread, after commits
1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
766aa2de0f ("virtio-scsi: implement BlockDevOps->drained_begin()")
it can happen that polling gets stuck when draining. This would cause
IO in the guest to get completely stuck.

A workaround for users is stopping and resuming the vCPUs because that
would also stop and resume the dataplanes which would kick the host
notifiers.

This can happen with block jobs like backup and drive mirror as well
as with hotplug [2].

There are reports in the community forum that might be about this
issue [0][1] and there is also one in the enterprise support channel.

As a workaround in the code, just re-enable notifications and kick the
virt queue after draining. Draining is already costly and rare, so no
need to worry about a performance penalty here.

Take special care to attach the SCSI event virtqueue host notifier
with the _no_poll() variant like in virtio_scsi_dataplane_start().
This avoids the issue from the first attempted fix where the iothread
would suddenly loop with 100% CPU usage whenever some guest IO came in
[3]. This is necessary because of commit 38738f7dbb ("virtio-scsi:
don't waste CPU polling the event virtqueue"). See [4] for the
relevant discussion.

[0]: https://forum.proxmox.com/threads/137286/
[1]: https://forum.proxmox.com/threads/137536/
[2]: https://issues.redhat.com/browse/RHEL-3934
[3]: https://forum.proxmox.com/threads/138140/
[4]: https://lore.kernel.org/qemu-devel/bfc7b20c-2144-46e9-acbc-e726276c5a31@proxmox.com/

Link: https://lore.kernel.org/qemu-devel/20240202153158.788922-1-hreitz@redhat.com/
Originally-by: Fiona Ebner <f.ebner@proxmox.com>
 [ TL: Update to v2 and rebased patch series handling to v8.1.5 ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-02 19:35:34 +01:00
Thomas Lamprecht 12b69ed9c5 bump version to 8.1.5-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-02 19:08:16 +01:00
Fiona Ebner 5e8903f875 stable fixes for corner case in i386 emulation and crash with VNC clipboard
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-02 19:06:29 +01:00
Fiona Ebner 4b7975e75d update submodule and patches to QEMU 8.1.5
Most notable fixes from a Proxmox VE perspective are:

* "virtio-net: correctly copy vnet header when flushing TX"
  To prevent a stack overflow that could lead to leaking parts of the
  QEMU process's memory.
* "hw/pflash: implement update buffer for block writes"
  To prevent an edge case for half-completed writes. This potentially
  affected EFI disks.
* Fixes to i386 emulation and ARM emulation.

No changes for patches were necessary (all are just automatic context
changes).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-02 19:06:29 +01:00
Fiona Ebner f366bb97ae bump version to 8.1.2-6
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-15 14:26:09 +01:00
Fiona Ebner 2a49e667ba Revert "add patch to work around stuck guest IO with iothread and VirtIO block/SCSI"
This reverts commit 6b7c1815e1.

The attempted fix has been reported to cause high CPU usage after
backup [0]. Not difficult to reproduce and it's iothreads getting
stuck in a loop. Downgrading to pve-qemu-kvm=8.1.2-4 helps, which was
also verified by Christian, thanks! The issue this was supposed to fix
is much rarer, so revert for now, while upstream is still working on a
proper fix.

[0]: https://forum.proxmox.com/threads/138140/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-15 14:16:26 +01:00
Thomas Lamprecht c6eb05a799 bump version to 8.1.2-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-11 16:59:16 +01:00
Fiona Ebner dfac4f3593 pick fix for potential deadlock with QMP resize and iothread
While the patch gives bdrv_graph_wrlock() as an example where the
issue can manifest, something similar can happen even when that is
disabled. Was able to reproduce the issue with

    while true; do qm resize 115 scsi0 +4M; sleep 1; done

while running

    fio --name=make-mirror-work --size=100M --direct=1 --rw=randwrite \
        --bs=4k --ioengine=psync --numjobs=5 --runtime=1200 --time_based

in the VM.

Fix picked up from:
https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01102.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-11 16:56:50 +01:00
Fiona Ebner 6b7c1815e1 add patch to work around stuck guest IO with iothread and VirtIO block/SCSI
When using iothread, after commits
1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
766aa2de0f ("virtio-scsi: implement BlockDevOps->drained_begin()")
it can happen that polling gets stuck when draining. This would cause
IO in the guest to get completely stuck.

A workaround for users is stopping and resuming the vCPUs because that
would also stop and resume the dataplanes which would kick the host
notifiers.

This can happen with block jobs like backup and drive mirror as well
as with hotplug [2].

There are reports in the community forum that might be about this
issue [0][1] and there is also one in the enterprise support channel.

As a workaround in the code, just re-enable notifications and kick the
virt queue after draining. Draining is already costly and rare, so no
need to worry about a performance penalty here. This was taken from
the following comment of a QEMU developer [3] (in my debugging,
I had already found re-enabling notification to work around the issue,
but also kicking the queue is more complete).

[0]: https://forum.proxmox.com/threads/137286/
[1]: https://forum.proxmox.com/threads/137536/
[2]: https://issues.redhat.com/browse/RHEL-3934
[3]: https://issues.redhat.com/browse/RHEL-3934?focusedId=23562096&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-23562096

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-11 16:56:50 +01:00
Thomas Lamprecht 24d732ac0f bump version to 8.1.2-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-22 14:28:25 +01:00
Fiona Ebner df2cc786ee add fix for vnc clipboard
This fixes the host->guest direction with noVNC as a client (and
likely others).

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
2023-11-22 14:19:45 +01:00
Thomas Lamprecht 38726d3473 bump version to 8.1.2-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-20 10:35:52 +01:00
Fiona Ebner 89b46e17ec fix #5054: backport fix for software reset with SATA
The issue prevented FreeBSD 14 VMs with SATA disk from booting.

The commit it fixes, e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document
PxCI handling"), is part of stable 8.1.2.

The patch was already applied to the block branch upstream:
https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg02711.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
2023-11-20 10:35:00 +01:00
Thomas Lamprecht 33b22c3fe0 bump version to 8.1.2-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-17 11:55:26 +01:00
Fiona Ebner c38e337f5d revert commit breaking VirtIO network adapters for certain versions of Windows
As reported in the community forum [0] and reproduced locally this
breaks VirtIO network adapters in (at least) the German ISO of Windows
Server 2022. The fix itself was for

> Issue is not fatal but as result acpi-index/"PCI Label ID" property
> is either not shown in device details page or shows incorrect value.

so revert and tolerate that as a stop-gap, rather than have the
devices not working at all.

[0]: https://forum.proxmox.com/threads/92094/post-605684

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-11-17 11:52:52 +01:00
Fiona Ebner 763949965f fix #4710: vma create: don't use O_DIRECT for tmpfs
The implementation of the helper is_path_tmpfs() is similar to the
existing qemu_fd_getfs() function in util/mmap-alloc.c, which
unfortunately only takes an existing fd.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-11-07 16:37:34 +01:00
Thomas Lamprecht 1807330a6f bump version to 8.1.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Thomas Lamprecht a31ab74058 d/control: add python3-venv as build-dependency
Seems to be required since commit 81e2b198a8 ("configure: create a
python venv unconditionally").

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner b39f726f31 d/control: add versioned Breaks for qemu-server <= 8.0.6
Upstream QEMU commit 4271f40383 ("virtio-net: correctly report maximum
tx_queue_size value") made setting an invalid tx_queue_size for a
non-vDPA/vhost-user net device a hard error. Now, qemu-server before
commit 089aed81 ("cfg2cmd: netdev: fix value for tx_queue_size") did
just that, so the newer QEMU version would break start-up for most VMs
(a default vNIC configuration would be affected).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner a36bda146c add patch to avoid huge snapshot performance regression
Taking a snapshot became prohibitively slow because of the
migration_transferred_bytes() call in migration_rate_exceeded() [0].

This also applied to the async snapshot taking in Proxmox VE, so
work around the issue until it is fixed upstream.

[0]: https://gitlab.com/qemu-project/qemu/-/issues/1821

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner 03ff63aa61 add patch to disable graph locking
There are still some issues with graph locking, e.g. deadlocks during
backup canceling [0] and initial attempts to fix it didn't work [1].
Because the AioContext locks still exist, it should still be safe to
disable graph locking.

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg00729.html
[1]: https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg06905.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner 10e1093325 update submodule and patches to QEMU 8.1.2
Bigger notable changes:

* Commit 1a30b0f5d7 ("block: .bdrv_open is non-coroutine and
  unlocked") broke the PVE backup patches, in particular setting up
  the backup dump block driver, because bdrv_new_open_driver() cannot
  be called from a coroutine. To fix it, bdrv_co_open() is used
  instead, and while it's a much more involved function, the result
  should be essentially the same. The only difference I noticed is
  that the BDRV_O_ALLOW_RDWR flag is also set in the resulting bds
  (block driver state), but that shouldn't hurt.

Smaller notable changes:

* aio_set_fd_handler() dropped its 'is_external' parameter stating
  that all callers now pass false in 60f782b6b7 ("aio: remove
  aio_disable_external() API"). The calls in the PVE patches also
  passed false, so just drop the parameter too.

* global_state_store() does not have a return value anymore, so the
  user in the PVE savevm-async patch was adapted. For context, see
  c33f1829f8 ("migration: never fail in global_state_store()").

* Renames affecting the PVE savevm-async patch:
  migrate_use_block() -> migrate_block() and ram_counters -> mig_stats
  9d4b1e5f22 ("migration: Move migrate_use_block() to options.c")
  aff3f6606d ("migration: Rename ram_counters to mig_stats")

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner 0d9c737d61 buildsys: use QEMU's keycodemapdb again
instead of the split-out version that was last updated for QEMU 6.0.
This reverts the relevant part of 6838f03 ("bump version to 2.11.1-1")
which doesn't state a reason why the splitting was done. If something
breaks, we can still re-do it and document the reason this time.

Alternatively, it would be necessary to adapt the paths, because
keycodemapdb lives in subprojects/ rather than ui/ since QEMU commit
c53648abba ("meson: use subproject for keycodemapdb").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner a6ddea7ef7 buildsys: fixup submodule target
It's not enough to initialize the submodules anymore, as some got
replaced by wrap files, see QEMU commit 2019cabfee ("meson:
subprojects: replace submodules with wrap files").

Download the subprojects during initialization of the QEMU submodule,
so building (without the automagical --enable-download) can succeed
afterwards.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner 89520c1cd0 d/rules: use disable-download option instead of git-submodules=ignore
See the following QEMU commits for reference:
0c5f3dcbb2 ("configure: add --enable-pypi and --disable-pypi")
ac4ccac740 ("configure: rename --enable-pypi to --enable-download, control subprojects too")
6f3ae23b29 ("configure: remove --with-git-submodules=")

The last one removed the option and the closest thing to
git-submodules=ignore is using disable-download, which will then just
verify that the submodules are present.

Building now will require either:
* Running 'meson subprojects download' in the qemu submodule first.
* Using --enable-download, but then the submodules would be downloaded
  for each build (if not already downloaded in the submodule first)
  and it's just a bit too surprising if downloads happen during build.

The disable-download option will also disable automatic downloading of
missing Python modules from PyPI. Hopefully, it's enough to add them
as Debian build dependencies when required.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Thomas Lamprecht eca4daeeed bump version to 8.0.2-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-04 08:33:39 +02:00
Fiona Ebner 816077299c fix #2874: SATA: avoid unsolicited write to sector 0 during reset
If there is a pending DMA operation during ide_bus_reset(), the fact
that the IDEstate is already reset before the operation is canceled
can be problematic. In particular, ide_dma_cb() might be called and
then use the reset IDEstate which contains the signature after the
reset. When used to construct the IO operation this leads to
ide_get_sector() returning 0 and nsector being 1. This is particularly
bad, because a write command will thus destroy the first sector which
often contains a partition table or similar.

Upstream discussion:
https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg04239.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-26 11:30:22 +02:00
Fiona Ebner ef3308db71 vma: avoid compiler warning about incompatible pointer type
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-08 11:18:30 +02:00
Filip Schauer 0ff45eb23e backup: Fix spelling error in function name
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
[FE: fixup patch context]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-08 11:13:04 +02:00
Thomas Lamprecht 6c5563e30b bump version to 8.0.2-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-09-06 17:04:04 +02:00
Fiona Ebner 9e0186f289 backup: drop broken BACKUP_FORMAT_DIR
Since upstream QEMU 8.0, it's no longer possible to call
bdrv_img_create() from a coroutine, meaning a backup with the
directory format would crash the QEMU instance.

The feature is only exposed via the monitor and was intended to be
experimental. There were no user reports about the breakage and it
only was noticed during the rebase for QEMU 8.1, because other parts
of the backup code needed adaptation and I decided to check the
BACKUP_FORMAT_DIR case too.

It should not stay in a broken state of course, but avoid the
maintenance cost and just make it a removed feature for Proxmox VE 8
retroactively.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-06 16:59:12 +02:00
Fiona Ebner 0cffb504e7 backup: create jobs in a drained section
With the drive-backup QMP command, upstream QEMU uses a drained
section for the source drive when creating the backup job. Do the same
here to avoid subtle bugs.

There, the drained section extends until after the job is started, but
this cannot be done here for multi-disk backups (could at most start
the first job). The important thing is that the cbw
(copy-before-write) node is in place and the bcs (block-copy-state)
bitmap is initialized, which both happen during job creation (ensured
by the "block/backup: move bcs bitmap initialization to job creation"
PVE patch).

One such bug is one reported in the community forum [0], where using a
drive with iothread can lead to an overlapping block-copy request and
consequently an assertion failure. The block-copy code relies on the
bcs bitmap to determine if a request for a certain range can be
created. Each time a request is created, it resets the bcs bitmap at
that range to indicate that it's being handled.

The duplicate request can happen as follows:
Thread A attaches the cbw node
Thread B creates a request and resets the bitmap at that range
Thread A clears the bitmap and merges it with the PBS bitmap
The merging can lead to the bitmap being set again at the range of
the previous request, so the block-copy code thinks it's fine to
create a request there.
Thread B creates another request at an overlapping range before the
other request is finished.

The drained section ensures that nothing else can interfere with the
bcs bitmap between attaching the copy-before-write block node and
initialization of the bitmap.
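
The interleaving described above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: the struct, the function names, and the idea that the bitmap starts fully set after attaching the cbw node are all hypothetical simplifications, not the QEMU data structures or API.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_CLUSTERS 4

/* Hypothetical, simplified model; not the real QEMU structures. */
struct model {
    bool bcs[NUM_CLUSTERS];       /* set bit = range may get a new request */
    bool in_flight[NUM_CLUSTERS]; /* a request for this range is unfinished */
};

/*
 * Block-copy side: a request may only start where the bcs bit is set.
 * Returns 1 if a request was created, 0 if there was nothing to do, and
 * -1 if it would overlap an unfinished request (QEMU asserts in that case).
 */
int create_request(struct model *m, int cluster)
{
    if (!m->bcs[cluster]) {
        return 0;
    }
    if (m->in_flight[cluster]) {
        return -1;
    }
    m->bcs[cluster] = false;      /* mark the range as being handled */
    m->in_flight[cluster] = true;
    return 1;
}

/* Backup-setup side: clear the bcs bitmap, then merge in the PBS bitmap. */
void init_bitmap(struct model *m, const bool *pbs)
{
    memset(m->bcs, 0, sizeof(m->bcs));
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        m->bcs[i] = m->bcs[i] || pbs[i];
    }
}

/* Replays the racy order; returns the result of the second request. */
int replay_race(void)
{
    struct model m;
    const bool pbs[NUM_CLUSTERS] = { true, true, true, true };

    memset(&m, 0, sizeof(m));
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        m.bcs[i] = true;          /* Thread A attached the cbw node (assumed state) */
    }
    create_request(&m, 0);        /* Thread B creates a request */
    init_bitmap(&m, pbs);         /* Thread A clears and merges */
    return create_request(&m, 0); /* Thread B again: overlapping range */
}

/* Same steps, but nothing may run between attach and initialization. */
int replay_drained(void)
{
    struct model m;
    const bool pbs[NUM_CLUSTERS] = { true, true, true, true };

    memset(&m, 0, sizeof(m));
    init_bitmap(&m, pbs);
    return create_request(&m, 0);
}
```

In this model, replay_race() ends with the overlapping request, while replay_drained() creates exactly one request for the range.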

[0]: https://forum.proxmox.com/threads/133149/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-06 16:59:12 +02:00
Fiona Ebner f7eed6caa1 regenerate patch stats
Apparently wasn't correct in 0cff91a ("fix #1534: vma: Add extract
filter for disk images").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-06 16:59:12 +02:00
Filip Schauer 0cff91a000 fix #1534: vma: Add extract filter for disk images
Add a filter to the "vma extract" command. A comma-separated list of
disk images that should be extracted can be passed with the "-d" option.

Example to extract an IDE drive and an SCSI drive from vzdump.vma:

vma extract vzdump.vma -d "drive-ide0,drive-scsi0" extractdir
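
A minimal sketch of how such a comma-separated filter could be matched against a device name. The helper name drive_selected() is hypothetical and not taken from the patch, and treating an empty or missing list as "extract everything" is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `name` is one of the
 * comma-separated entries in `list`. A NULL or empty list is treated
 * as "no filter" (assumption), so every drive is selected.
 */
bool drive_selected(const char *list, const char *name)
{
    if (!list || !*list) {
        return true;
    }

    size_t name_len = strlen(name);
    const char *p = list;

    while (*p) {
        size_t len = strcspn(p, ",");  /* length of the current entry */
        if (len == name_len && strncmp(p, name, len) == 0) {
            return true;               /* exact match, not just a prefix */
        }
        p += len;
        if (*p == ',') {
            p++;                       /* skip the separator */
        }
    }
    return false;
}
```

With the list from the example above, "drive-scsi0" would be selected while "drive-scsi1" would not.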

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
2023-08-30 10:40:51 +02:00
Fiona Ebner 6cadf3677d bump version to 8.0.2-5
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-08-16 11:56:49 +02:00
Fiona Ebner 5f9cb29c3a backup: trim heap after finishing
Reported in the community forum [0]. By default, there can be large
amounts of memory left assigned to the QEMU process after backup.
Likely because of fragmentation, it's necessary to explicitly call
malloc_trim() to tell glibc that it shouldn't keep all that memory
resident for the process.

QEMU itself already does a malloc_trim() in the RCU thread, but that
code path might not be reached (or not for a long time) under usual
operation. The value of 4 MiB for the argument was also copied from
there.
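
For illustration, a minimal glibc-only sketch of the call. The helper churn_then_trim() and its allocation pattern are hypothetical and only mimic a job that allocates and frees many chunks; the 4 MiB pad is the value mentioned above.

```c
#include <malloc.h>   /* malloc_trim() is a glibc extension */
#include <stdlib.h>

/* Pad mentioned in the commit message: keep 4 MiB at the top of the heap. */
#define TRIM_PAD_BYTES (4 * 1024 * 1024)

/*
 * Hypothetical helper: allocate and free many chunks, then ask glibc to
 * return unused heap memory to the kernel.
 * Returns the result of malloc_trim() (1 if memory was released,
 * 0 otherwise), or -1 if the bookkeeping array could not be allocated.
 */
int churn_then_trim(size_t count, size_t chunk_size)
{
    void **chunks = calloc(count, sizeof(*chunks));
    if (!chunks) {
        return -1;
    }

    for (size_t i = 0; i < count; i++) {
        chunks[i] = malloc(chunk_size);  /* a NULL entry is fine for free() */
    }
    for (size_t i = 0; i < count; i++) {
        free(chunks[i]);
    }
    free(chunks);

    /* Without this call, freed memory can stay resident for the process. */
    return malloc_trim(TRIM_PAD_BYTES);
}
```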

Example with the following configuration:
> agent: 1
> boot: order=scsi0
> cores: 4
> cpu: x86-64-v2-AES
> ide2: none,media=cdrom
> memory: 1024
> name: backup-mem
> net0: virtio=DA:58:18:26:59:9F,bridge=vmbr0,firewall=1
> numa: 0
> ostype: l26
> scsi0: rbd:base-107-disk-0/vm-106-disk-1,size=4302M
> scsihw: virtio-scsi-pci
> smbios1: uuid=b2d4511e-8d01-44f1-afd6-9581b30c24a6
> sockets: 2
> startup: order=2
> virtio0: lvmthin:vm-106-disk-1,iothread=1,size=1G
> virtio1: lvmthin:vm-106-disk-2,iothread=1,size=1G
> virtio2: lvmthin:vm-106-disk-3,iothread=1,size=1G
> vmgenid: 0a1d8751-5e02-449d-977e-c0160e900231

Before the change:

> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	  370948 kB
> root@pve8a1 ~ # vzdump 106 --storage pbs
> (...)
> INFO: Backup job finished successfully
> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	 2114964 kB

After the change:

> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	  398788 kB
> root@pve8a1 ~ # vzdump 106 --storage pbs
> (...)
> INFO: Backup job finished successfully
> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	  424356 kB

[0]: https://forum.proxmox.com/threads/131339/

Co-diagnosed-by: Friedrich Weber <f.weber@proxmox.com>
Co-diagnosed-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2023-08-16 11:50:12 +02:00
Fiona Ebner c36e3f9d17 refresh patch context
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2023-08-16 11:50:08 +02:00
Filip Schauer b8b4ce0480 Add format attributes to function candidates
Add format attributes to functions that take printf-like arguments. This
provides additional compile-time checking that the correct parameters
are passed to the functions.

This fixes compiler warnings generated by the -Wsuggest-attribute=format
flag.
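
As a sketch of what such an annotation looks like with GCC or Clang: format_message() below is a hypothetical helper, not one of the functions touched by the patch.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical printf-like helper. The attribute tells the compiler that
 * argument 3 is a printf format string and that the variadic arguments
 * to check start at argument 4.
 */
int format_message(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    return len;  /* number of characters the full message needs */
}
```

With the attribute in place, a call such as format_message(buf, sizeof(buf), "%s", 42) is flagged by -Wformat at compile time instead of misbehaving at run time.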

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
2023-08-08 09:08:48 +02:00
Fiona Ebner df47146afe add patch fixing fd leak for vhost
Each pause+resume operation (which is also done as part of taking a VM
snapshot) would increase the number of open file descriptors by the
number of vhost devices (e.g. network devices by default). This could
lead to crashes during backup and surely other issues once the system
limit (default 1024) was reached [0].

[0]: https://forum.proxmox.com/threads/131603/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-08-03 17:40:13 +02:00
Fabian Grünbichler d9cbfafeeb bump version to 8.0.2-4
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2023-07-28 12:59:10 +02:00
Fiona Ebner 5919ec1446 add patch fixing resume for snapshot and hibernate with drive with iothread and a dirty bitmap
Not difficult to run into, just have a drive with iothread, take a PBS
backup and then take a snapshot or hibernate. Resuming will fail with
> qemu: qemu_mutex_unlock_impl: Operation not permitted
because of not acquiring the correct AioContext first.

Migration is not affected, because it runs in coroutine context.

Reported in the community forum:
https://forum.proxmox.com/threads/129899/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-07-28 12:00:50 +02:00
Thomas Lamprecht 409db0cd7b bump version to 8.0.2-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-15 13:59:12 +02:00
Fiona Ebner ea7662074d fix checks for drive mirror with bitmap
The QAPI change for QEMU 8.0 dropped redundant has_foo parameters, but
in the blockdev_mirror_common() function (which is not part of the
QAPI itself but called from there) the argument pair was has_bitmap
and bitmap_name rather than has_bitmap and bitmap.
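
The difference between the two calling conventions can be sketched as follows. Both function names are hypothetical and only illustrate the pattern; they are not the QEMU functions.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Style before the QAPI change: a separate flag says whether the
 * optional string argument is present; the pointer is only valid if
 * the flag is set.
 */
bool bitmap_given_old(bool has_bitmap, const char *bitmap_name)
{
    (void)bitmap_name;
    return has_bitmap;
}

/*
 * Style after the change: the flag is gone and the pointer itself
 * carries the information, NULL meaning "not given".
 */
bool bitmap_given_new(const char *bitmap_name)
{
    return bitmap_name != NULL;
}
```

A helper that is called from generated code but is not generated itself has to be converted by hand, which is how a leftover flag check can slip through.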

Reported-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-15 13:55:22 +02:00
Fiona Ebner d847446186 regenerate patches
There are still some context changes not covered by earlier series. No
functional change intended.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-15 13:55:22 +02:00
Thomas Lamprecht 3aaa855e5c bump version to 8.0.2-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-09 07:58:59 +02:00
Fiona Ebner 99f9ce2cd2 drop deprecated custom drive snapshot QMP commands
They are not required anymore since qemu-server >= 5.0-36.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-07 19:35:53 +02:00
Fiona Ebner a816d2969e drop patch for custom get_link_status QMP command
There doesn't seem to be any Proxmox VE code using this.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-07 19:35:40 +02:00
Thomas Lamprecht 0e9a7bfda2 bump version to 8.0.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-06 16:35:20 +02:00
Fiona Ebner a39364b9d1 update reentrancy patches to version in upstream git
The previous version was picked from the mailing list and still had
an object_dynamic_cast call in a hot path, which is avoided with the
version that landed in git.

Also adds a few more exceptions for devices that need reentrancy.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-06 16:32:38 +02:00
Fiona Ebner 0f693c2cab update submodule and patches to QEMU 8.0.2
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-06 16:32:38 +02:00
Thomas Lamprecht 88b1550dfb buildsys: remove edk2 source tree when assembling build-dir
we ship it via pve-edk2-firmware anyway and it only results in bigger
source tar balls and lintian yelling at us due to edk2 not being the
simplest repo to ensure DFSG compat.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 10:37:10 +02:00
Thomas Lamprecht bd3c1fa525 bump version to 8.0.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-23 14:09:12 +02:00
Thomas Lamprecht de2dde2da9 buildsys: avoid handling noopt locally, rather extend CFLAGS
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-23 14:09:12 +02:00
Thomas Lamprecht 04e0262e2e d/rules: add indentation for configure switches for readability
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:23:55 +02:00
Thomas Lamprecht d3c2ae9683 d/control: drop obsolete build dependencies
drop autotools-dev, texi2html and texinfo build dependencies, they
are not used and have no effect

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:11:33 +02:00
Thomas Lamprecht d0603efa38 buildsys: auto-generate dbgsym package
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner db5d2a4b77 squash related patches
where there is no good reason to keep them separate. It's a pain
during rebase if there are multiple patches changing the same code
over and over again. This was especially bad for the backup-related
patches. If the history of patches really is needed, it can be
extracted via git. Additionally, compilation with partial application
of patches has been broken for a long time, because one of the master key
changes became part of an earlier patch during a past rebase.

If only the same files were changed by a subsequent patch and the
changes felt to belong together (obvious for later bug fixes, but also
done for features e.g. adding master key support for PBS), the patches
were squashed together.

The PBS namespace support patch was split into the individual parts
it changes, i.e. PBS block driver, pbs-restore binary and QMP backup
infrastructure, and squashed into the respective patches.

No code change is intended, git diff in the submodule should not show
any difference between applying all patches before this commit and
applying all patches after this commit.

The query-proxmox-support QMP function has been left as part of the
"PVE-Backup: Proxmox backup patches for QEMU" patch, because it's
currently only used there. If it ever is used elsewhere too, it can
be split out from there.

The recent alloc-track and BQL-related savevm-async changes have been
left separate for now, because it's not 100% clear they are the best
approach yet. This depends on what upstream decides about the BQL
stuff and whether and what kind of issues with the changes pop up.

The qemu-img dd snapshot patch has been re-ordered to after the other
qemu-img dd patches.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner b64c4dec1c PVE backup: don't call no_co_wrapper function from coroutine
Namely, pvebackup_co_prepare() needs to call bdrv_co_open() rather
than bdrv_open(), because it is a coroutine itself.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner 53b56ca781 add stable patches for 8.0.0
Changes to other patches are all just metadata/context changes except
for pvebackup_co_prepare() needing to call bdrv_co_unref() rather than
bdrv_unref(), because it is a coroutine itself. This is documented in
d6ee2e324e ("block-coroutine-wrapper: Introduce no_co_wrapper"). The
change is necessary, because one of the stable fixes converts
bdrv_unref and blk_unref into no_co_wrappers (in preparation for a
second patch to fix a hang with the block resize QMP command).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner bf251437e9 update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:

* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.

* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.

* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.

* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.

* Async snapshot-related changes:
  - The pending querying got adapted to the above-mentioned split and
  a patch is added to optimize it/make it more similar to what
  upstream code does.
  - Added initialization of the compression counters (for
    future-proofing).
  - It's necessary to hold the BQL (big QEMU lock = iothread mutex)
  during the setup phase, because block layer functions are used there
  and not doing so leads to racy, hard-to-debug crashes or hangs. It's
  necessary to change some upstream code too for this, a version of
  the patch "migration: for snapshots, hold the BQL during setup
  callbacks" is intended to be upstreamed.
  - Need to take the bdrv graph read lock before flushing.

* hmp_info_balloon was moved to a different file.

* Needed to include new headers from time to time to still get the
correct functions.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner fb818ea5b9 d/rules: drop virtiofsd switch
virtiofsd is no longer part of QEMU 8.0. It got replaced by a separate
implementation written in Rust, which will be its own package.

See QEMU commit 0aaf44776e ("Merge tag 'pull-virtiofs-20230216b' of
https://gitlab.com/dagrh/qemu into staging").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht 3c995a426d makefile: convert to use simple parenthesis
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht be7ce325c7 d/lintian-overrides: ignore groff line breakage/adjustment warnings
not much we can do here anyway..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht 19b4b4c50f d/lintian-overrides: sort
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht 590adba81a d/parse-machines: produce stable json output
Enabling the "canonical" option the keys will be sorted, improving
build reproducibility.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner abb04bb627 d/control: define compat level via build-depends and raise to 13
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht 6facdf3a08 also exclude hppa-firmware.img ROM from build
We don't use it and with debhelper compat level >= 11, the switch
from detecting files for strip through patterns to checking for an ELF
header caused a build failure with the hppa-firmware.img ROM, as some
tools cannot cope with HP PARISC files.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht cb2b3190a4 move cleanup of unused ROMs from d/rules to build-dir generation
this way we save a bit of space and should make build also slightly
faster, otherwise nothing should change.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht 2e416ad9d5 d/rules: fix debian-rules-missing-required-target
until we switch fully over to the dh sequencer

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht d80ca49db8 d/rules: cleanup cruft and use dpkg makefile fragments
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht d65b507d3f buildsys: update lintian overrides
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht 98fd8612cb add .gitignore file
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht 4f56d29218 buildsys: use shorter variable name $@ in $(BUILDIR) target
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht cd148033f3 buildsys: only run lintian for phony dsc target
This allows the sbuild to start much faster (lintian takes ~ minutes
for such big packages), and that without loss as sbuild will run
lintian on both binary and source package anyway.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht 92c6d84f6a d/control: avoid versioned build-dependencies with a -1 revision
no effect besides making it harder to build this for an eventual
backport.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht b8af8dd4fa debian: normalize packaging files with wrap-and-sort -tkn
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:13 +02:00
Fiona Ebner 6eb3e31968 d/rules: fix comment about when clean target is executed
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner c913853be7 d/rules: move copying config.guess and config.sub to config.status target
It causes problems when done as part of the clean target when building
the dsc with the following error due to the additional files:
dpkg-source: error: aborting due to unexpected upstream changes

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner 4fc4b533b5 buildsys: fix lintian overrides
See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007002 for more
information.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner 023b916380 d/rules: set job flag for make based on DEB_BUILD_OPTIONS
Copied from Debian's QEMU package's d/rules. Otherwise, ninja will end
up using only a single job (in Debian Bookworm/Proxmox VE 8).

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner 19a11f24a5 buildsys: expand clean target
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 [ T: remove all tarballs for a package and any .deb ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner 030fa1db4b buildsys: create build directory atomically
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner 2d17b4b4d9 buildsys: add sbuild convenience target
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner 280d157f1c buildsys: add dsc target
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner f6be0ca51a buildsys: derive upload dist automatically
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Thomas Lamprecht 93d558c1ee bump version to 7.2.0-8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-17 15:48:12 +01:00
Fiona Ebner e752bbe5e2 cherry-pick TCG-related stable fixes for 7.2
When turning off the "KVM hardware virtualization" checkbox in Proxmox
VE, the TCG accelerator is used, so these fixes are relevant then.

The first patch is included to allow cherry-picking the others without
changes.

Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-17 15:46:20 +01:00
Thomas Lamprecht 018ef788b3 bump version to 7.2.0-8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-17 12:12:02 +01:00
Fiona Ebner 72fc94c0c6 add patch fixing ACPI CPU hotplug issue with TCG
Required for the debian/edk2-vars-generator.py script in the
pve-edk2-firmware repository when building the edk2-stable202302
release. Without this patch, the QEMU process spawned by the script
would hang indefinitely.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-17 12:06:22 +01:00
Thomas Lamprecht 09186f4b6e bump version to 7.2.0-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-13 17:42:52 +01:00
Fiona Ebner ffda59f626 add patches to fix regression with LSI SCSI controller
The patch 0008-memory-prevent-dma-reentracy-issues.patch introduced a
regression for the LSI SCSI controller leading to boot failures [0],
because, in its current form, it relies on reentrancy for a particular
ram_io region.

[0]: https://forum.proxmox.com/threads/123843

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-13 17:36:22 +01:00
Fiona Ebner 3c4f941ac7 add more stable fixes
The patches were selected from the recent "Patch Round-up for stable
7.2.1" [0]. Those that should be relevant for our supported use-cases
(and the upcoming nvme use-case) were picked. Most of the patches
added now have not been submitted to qemu-stable before.

The follow-up for the virtio-rng-pci migration fix will break
migration between versions with the fix and without the fix when a
virtio-pci-rng(-non)-transitional device is used. Luckily Proxmox VE
only uses the virtio-pci-rng device, and this was fixed by
0006-virtio-rng-pci-fix-migration-compat-for-vectors.patch which was
applied before any public version of Proxmox VE's QEMU 7.2 package was
released.

[0]: https://lists.nongnu.org/archive/html/qemu-stable/2023-03/msg00010.html
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2162569

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-13 17:36:19 +01:00
Fiona Ebner 3a94e1a186 fixup patch "ide: avoid potential deadlock when draining during trim"
The patch was incomplete and (re-)introduced an issue with a potential
failing assertion upon cancelation of the DMA request.

There is a patch on qemu-devel now[0], and it's the same as this one
code-wise (except for comments). But the discussion is still ongoing.
While there shouldn't be a real issue with the patch, there might be
better approaches. The plan is to use this as a stop-gap for now and
pick up the proper solution once it's ready.

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg03325.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-13 17:36:19 +01:00
Thomas Lamprecht 67cae45f41 bump version to 7.2.0-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-08 14:32:22 +01:00
Fiona Ebner 58659169de add patch to avoid potential deadlock with trim for IDE/SATA and draining
In particular, the deadlock can occur, together with unlucky timing
between the QEMU threads, when the guest is issuing trim requests
during the start of a backup operation.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 [ T: resolve trivial merge conflict in series file ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-08 14:22:36 +01:00
Fiona Ebner 10691e04e9 add patch fixing Linux boot failures with megasas SCSI
A regression in 7.2 and easily reproduced.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-07 19:50:12 +01:00
Thomas Lamprecht 09723b9298 bump version to 7.2.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-02-21 13:50:08 +01:00
Fiona Ebner 00e2507aac add fix for iscsi double free issue leading to crashes
Reported here[0] and here[1].

[0]: https://gitlab.com/qemu-project/qemu/-/issues/1378
[1]: https://forum.proxmox.com/threads/122776/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 13:49:19 +01:00
Fiona Ebner e7e5f63573 add patch fixing DMA reentrancy issues
that could lead to use-after-frees and stack overflows with a
malicious (or buggy) guest. See [0] for a good summary:

[0]: https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 10:18:35 +01:00
Fiona Ebner 1688b43738 QMP backup: use correct errno when getting blockdrive length fails
di->size would only be set later. The errno is minus the return value
from the function.
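
A minimal sketch of the "negative errno" convention referred to here. Both helpers are hypothetical stand-ins, not the QEMU functions: get_length() plays the role of a length query that returns a negative errno value on failure.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a length query: returns the length in bytes
 * on success and a negative errno value on failure.
 */
int64_t get_length(int simulate_failure)
{
    return simulate_failure ? -EIO : 4096;
}

/*
 * Derives the errno to report from the return value itself (0 on
 * success), instead of from a field that is only assigned later.
 */
int length_error(int64_t ret)
{
    return ret < 0 ? (int)-ret : 0;
}
```

The resulting value can then be passed to strerror() or a similar function to build the error message.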

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 09:19:16 +01:00
Fiona Ebner eee064d954 savevm-async: keep more free space when entering final stage
In qemu-server, we already allocate 2 * $mem_size + 500 MiB for driver
state (which was 32 MiB long ago according to git history). It seems
likely that the 30 MiB cutoff in the savevm-async implementation was
chosen based on that.

In bug #4476 [0], another issue caused the iteration to not make any
progress and the state file filled up all the way to the 30 MiB +
pending_size cutoff. Since the guest is not stopped immediately after
the check, it can still dirty some RAM and the current cutoff is not
enough for a reproducer VM (was done while bug #4476 still was not
fixed), dirtying memory with
> stress-ng -B 2 --bigheap-growth 64.0M
After entering the final stage, savevm actually filled up the state
file completely, leading to an I/O error. It's probably the same
scenario as reported in the bug report, the error message was fixed in
commit a020815 ("savevm-async: fix function name in error message")
after the bug report.

If not for the bug, the cutoff will only be reached by a VM that's
dirtying RAM faster than can be written to the storage, so increase
the cutoff to 100 MiB to have a bigger chance to finish successfully,
while still trying to not increase downtime too much for
non-hibernation snapshots.

[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=4476

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 08:39:08 +01:00
Fiona Ebner 8051a24b5f fix #4476: savevm-async: avoid looping without progress
when pend_postcopy is large. By definition, pend_postcopy won't
decrease when iterating, so a value larger than the cutoff of 400000
would lead to essentially empty iterations, filling up the state file
until only 30 MiB + pending_size remain and the second half of the
check would trigger.

Avoid this, by not considering pend_postcopy for the cutoff to enter
the final phase.
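
The two-part check described above can be sketched roughly like this. The function, its parameters, and the exact comparisons are assumptions for illustration and do not reproduce the savevm-async code; only the 400000 byte cutoff and the 30 MiB reserve come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define PENDING_CUTOFF UINT64_C(400000)             /* bytes */
#define RESERVE_BYTES  (UINT64_C(30) * 1024 * 1024) /* 30 MiB */

/*
 * Hypothetical decision helper: returns true when iterating should stop
 * and the final stage should be entered. pend_postcopy is deliberately
 * not a parameter, because it does not decrease while iterating.
 */
bool enter_final_stage(uint64_t pend_precopy, uint64_t written,
                       uint64_t state_size)
{
    /* First half: little enough is left to finish with the guest stopped. */
    if (pend_precopy <= PENDING_CUTOFF) {
        return true;
    }
    /* Second half: the state file is about to run out of space. */
    return written + pend_precopy + RESERVE_BYTES >= state_size;
}
```

If a value that never shrinks were part of the first comparison, only the second half could ever trigger, which matches the symptom of the state file filling up.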

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 08:39:08 +01:00
Fiona Ebner ade9f50160 d/rules: add note explaining why using noopt doesn't currently work
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-14 10:04:21 +01:00
Fiona Ebner 0fde60fd10 d/rules: add missing export for CFLAGS
Otherwise, they don't affect the build of QEMU at all.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-14 10:04:21 +01:00
Thomas Lamprecht d82c5eb632 bump version to 7.2.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-01-27 09:37:53 +01:00
Fiona Ebner d5f6ef56f0 add patch to fix issue with VirtIO disk using detect-zeroes=unmap
Affects Proxmox VE, when the discard disk setting is used for a
VirtIO disk.

Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1404

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-27 09:36:41 +01:00
Fabian Grünbichler 658cba46ee d/control: also conflict with "qemu-system-data"
it ships files also shipped by our qemu package, switching from Debian qemu to
ours doesn't work without manual intervention otherwise..

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2023-01-26 10:55:37 +01:00
Fiona Ebner a02081501a savevm-async: fix function name in error message
which also makes it distinguishable from the other
"qemu_savevm_state_iterate error" message.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-24 17:08:54 +01:00
Thomas Lamprecht baf4e3132d bump version to 7.2.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-01-12 13:13:23 +01:00
Fiona Ebner 48c307550a add regression fix for migration with virtio-rng device
between QEMU less than 7.2 and QEMU 7.2 without the fix (both
directions are affected).

As mentioned in the patch message, this fix itself will break
migration between QEMU 7.2 and QEMU 7.2 with the fix (in both
directions, if a virtio-rng device is attached), but this is fine,
because no pve-qemu-kvm package with QEMU 7.2 has been publicly
released yet.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-12 13:10:19 +01:00
126 changed files with 7156 additions and 12135 deletions

7
.gitignore vendored Normal file

@ -0,0 +1,7 @@
/*.build
/*.buildinfo
/*.changes
/*.deb
/*.dsc
/*.tar*
/pve-qemu-kvm-*.*/


@ -1,60 +1,90 @@
include /usr/share/dpkg/pkg-info.mk
include /usr/share/dpkg/architecture.mk
include /usr/share/dpkg/default.mk
PACKAGE = pve-qemu-kvm
SRCDIR := qemu
BUILDDIR ?= ${PACKAGE}-${DEB_VERSION_UPSTREAM}
BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION_UPSTREAM)
ORIG_SRC_TAR=$(PACKAGE)_$(DEB_VERSION_UPSTREAM).orig.tar.gz
GITVERSION := $(shell git rev-parse HEAD)
DEB = ${PACKAGE}_${DEB_VERSION_UPSTREAM_REVISION}_${DEB_BUILD_ARCH}.deb
DEB_DBG = ${PACKAGE}-dbg_${DEB_VERSION_UPSTREAM_REVISION}_${DEB_BUILD_ARCH}.deb
DSC=$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION).dsc
DEB = $(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION)_$(DEB_BUILD_ARCH).deb
DEB_DBG = $(PACKAGE)-dbgsym_$(DEB_VERSION_UPSTREAM_REVISION)_$(DEB_BUILD_ARCH).deb
DEBS = $(DEB) $(DEB_DBG)
all: $(DEBS)
.PHONY: submodule
submodule:
test -f "${SRCDIR}/configure" || git submodule update --init --recursive
ifeq ($(shell test -f "$(SRCDIR)/configure" && echo 1 || echo 0), 0)
git submodule update --init --recursive
cd $(SRCDIR); meson subprojects download
endif
$(BUILDDIR): keycodemapdb | submodule
PC_BIOS_FW_PURGE_LIST_IN = \
hppa-firmware.img \
hppa-firmware64.img \
openbios-ppc \
openbios-sparc32 \
openbios-sparc64 \
palcode-clipper \
s390-ccw.img \
s390-netboot.img \
u-boot.e500 \
.*[a-zA-Z0-9]\.dtb \
.*[a-zA-Z0-9]\.dts \
qemu_vga.ndrv \
slof.bin \
opensbi-riscv.*-generic-fw_dynamic.bin \
BLOB_PURGE_SED_CMDS = $(foreach FILE,$(PC_BIOS_FW_PURGE_LIST_IN),-e "/$(FILE)/d")
BLOB_PURGE_FILTER = $(foreach FILE,$(PC_BIOS_FW_PURGE_LIST_IN),-e "$(FILE)")
$(BUILDDIR): submodule
# check if qemu/ was used for a build
# if so, please run 'make distclean' in the submodule and try again
test ! -f $(SRCDIR)/build/config.status
rm -rf $(BUILDDIR)
cp -a $(SRCDIR) $(BUILDDIR)
cp -a debian $(BUILDDIR)/debian
rm -rf $(BUILDDIR)/ui/keycodemapdb
cp -a keycodemapdb $(BUILDDIR)/ui/
echo "git clone git://git.proxmox.com/git/pve-qemu.git\\ngit checkout $(GITVERSION)" > $(BUILDDIR)/debian/SOURCE
rm -rf $@.tmp $@
cp -a $(SRCDIR) $@.tmp
cp -a debian $@.tmp/debian
rm -rf $@.tmp/roms/edk2 # packaged separately
find $@.tmp/pc-bios -type f | grep $(BLOB_PURGE_FILTER) | xargs rm -f
sed -i $(BLOB_PURGE_SED_CMDS) $@.tmp/pc-bios/meson.build
echo "git clone git://git.proxmox.com/git/pve-qemu.git\\ngit checkout $(GITVERSION)" > $@.tmp/debian/SOURCE
mv $@.tmp $@
.PHONY: deb kvm
deb kvm: $(DEBS)
$(DEB_DBG): $(DEB)
$(DEB): $(BUILDDIR)
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc -j
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc -j32
lintian $(DEBS)
.PHONY: update
update:
cd $(SRCDIR) && git submodule deinit ui/keycodemapdb || true
rm -rf $(SRCDIR)/ui/keycodemapdb
mkdir $(SRCDIR)/ui/keycodemapdb
cd $(SRCDIR) && git submodule update --init ui/keycodemapdb
rm -rf keycodemapdb
mkdir keycodemapdb
cp -R $(SRCDIR)/ui/keycodemapdb/* keycodemapdb/
git add keycodemapdb
sbuild: $(DSC)
sbuild $(DSC)
$(ORIG_SRC_TAR): $(BUILDDIR)
tar czf $(ORIG_SRC_TAR) --exclude="$(BUILDDIR)/debian" $(BUILDDIR)
.PHONY: dsc
dsc:
rm -rf *.dsc $(BUILDDIR)
$(MAKE) $(DSC)
lintian $(DSC)
$(DSC): $(ORIG_SRC_TAR) $(BUILDDIR)
cd $(BUILDDIR); dpkg-buildpackage -S -us -uc -d
.PHONY: upload
upload: UPLOAD_DIST ?= $(DEB_DISTRIBUTION)
upload: $(DEBS)
tar cf - ${DEBS} | ssh repoman@repo.proxmox.com upload --product pve --dist bullseye
tar cf - $(DEBS) | ssh repoman@repo.proxmox.com upload --product pve --dist $(UPLOAD_DIST)
.PHONY: distclean clean
distclean: clean
clean:
rm -rf $(BUILDDIR) $(PACKAGE)*.deb *.buildinfo *.changes
rm -rf $(PACKAGE)-[0-9]*/ $(PACKAGE)*.tar* *.deb *.dsc *.build *.buildinfo *.changes
.PHONY: dinstall
dinstall: $(DEBS)
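
The blob purge step in the Makefile hunk above builds two option lists from PC_BIOS_FW_PURGE_LIST_IN: one for `grep` (to pick files under pc-bios to delete) and one for `sed` (to drop the matching lines from pc-bios/meson.build). The following is a minimal Python sketch of that matching logic, not part of the build. It assumes the patterns behave the same under Python's `re` as under grep/sed basic regular expressions, which holds for the constructs used here; the sample paths in the usage note are illustrative only.

```python
import re

# Patterns copied from PC_BIOS_FW_PURGE_LIST_IN in the Makefile hunk above.
PURGE_PATTERNS = [
    r"hppa-firmware.img",
    r"hppa-firmware64.img",
    r"openbios-ppc",
    r"openbios-sparc32",
    r"openbios-sparc64",
    r"palcode-clipper",
    r"s390-ccw.img",
    r"s390-netboot.img",
    r"u-boot.e500",
    r".*[a-zA-Z0-9]\.dtb",
    r".*[a-zA-Z0-9]\.dts",
    r"qemu_vga.ndrv",
    r"slof.bin",
    r"opensbi-riscv.*-generic-fw_dynamic.bin",
]


def is_purged(text: str) -> bool:
    """Mimic `grep -e P1 -e P2 ...`: true if any pattern matches anywhere."""
    return any(re.search(pattern, text) for pattern in PURGE_PATTERNS)


def filter_meson_lines(lines):
    """Mimic `sed -e '/P1/d' -e '/P2/d' ...`: drop every matching line."""
    return [line for line in lines if not is_purged(line)]
```

For example, `is_purged("pc-bios/s390-ccw.img")` is true while `is_purged("pc-bios/bios-256k.bin")` is false, so only the former would be handed to `xargs rm -f`. Because the unescaped dots match any character, the patterns are slightly broader than the literal file names.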

debian/changelog

@ -1,3 +1,283 @@
pve-qemu-kvm (9.0.0-2+vitastor1) bookworm; urgency=medium
* Add Vitastor support
-- Vitaliy Filippov <vitalif@yourcmc.ru> Mon, 20 May 2024 19:53:28 +0300
pve-qemu-kvm (9.0.0-2) bookworm; urgency=medium
* fix #5409: backup: fix copy-before-write timeout
* backup: improve error when copy-before-write fails for fleecing
* fix forwards and backwards migration with VirtIO-GPU display
* fix a regression in pflash device introduced in 8.2
* revert a commit for VirtIO PCI devices that turned out to cause more
potential security issues than what it fixed
* move compatibility flags for a new VirtIO-net feature to the correct
machine type. The feature was introduced in QEMU 8.2, but the
compatibility flags got added to machine version 8.0 instead of 8.1. This
breaks backwards migration with machine version 8.1 from an 8.2/9.0 binary
to an 8.1 binary, in cases where the guest kernel enables the feature
(e.g. Ubuntu 23.10).
While that breaks migration with machine version 8.1 from an unpatched to
a patched binary, Proxmox VE only ever had 8.2 on the test repository and
9.0 not yet in any public repository.
-- Proxmox Support Team <support@proxmox.com> Fri, 17 May 2024 17:04:52 +0200
pve-qemu-kvm (9.0.0-1) bookworm; urgency=medium
* update submodule and patches to QEMU 9.0.0
-- Proxmox Support Team <support@proxmox.com> Mon, 29 Apr 2024 10:51:37 +0200
pve-qemu-kvm (8.2.2-1) bookworm; urgency=medium
* update submodule and patches to QEMU 8.2.2
-- Proxmox Support Team <support@proxmox.com> Sat, 27 Apr 2024 12:44:30 +0200
pve-qemu-kvm (8.1.5-5) bookworm; urgency=medium
* implement support for backup fleecing
-- Proxmox Support Team <support@proxmox.com> Thu, 11 Apr 2024 17:46:48 +0200
pve-qemu-kvm (8.1.5-4) bookworm; urgency=medium
* fix live-import for certain kinds of VMDK images that rely on padding
* backup: avoid bubbling up first error if it's an ECANCELED one, as those
are often a result of canceling the job due to running into an actual
issue.
* backup: factor out & clean up gathering device info into helper
-- Proxmox Support Team <support@proxmox.com> Tue, 12 Mar 2024 14:08:40 +0100
pve-qemu-kvm (8.1.5-3) bookworm; urgency=medium
* backport fix for potential deadlock during QMP stop command if the VM has
disks attached through VirtIO-Block and IO-Thread enabled
* fix #4507: add patch to automatically increase NOFILE soft limit
-- Proxmox Support Team <support@proxmox.com> Wed, 21 Feb 2024 20:11:23 +0100
pve-qemu-kvm (8.1.5-2) bookworm; urgency=medium
* work around a situation where guest IO might get stuck if the VM is
configured with iothread and VirtIO block/SCSI
-- Proxmox Support Team <support@proxmox.com> Fri, 02 Feb 2024 19:41:27 +0100
pve-qemu-kvm (8.1.5-1) bookworm; urgency=medium
* update to 8.1.5 stable release, including more relevant fixes like:
- virtio-net: correctly copy vnet header when flushing TX
- hw/pflash: implement update buffer for block writes
- Fixes to i386 emulation and ARM emulation.
-- Proxmox Support Team <support@proxmox.com> Fri, 02 Feb 2024 19:08:13 +0100
pve-qemu-kvm (8.1.2-6) bookworm; urgency=medium
* revert attempted fix to avoid rare issue with stuck guest IO when using
iothread, because it caused a much more common issue with iothreads
consuming too much CPU
-- Proxmox Support Team <support@proxmox.com> Fri, 15 Dec 2023 14:22:06 +0100
pve-qemu-kvm (8.1.2-5) bookworm; urgency=medium
* backport workaround for stuck guest IO with iothread and VirtIO block/SCSI
in some rare edge cases
* backport fix for potential deadlock when issuing the "resize" QMP command
for a disk that is using iothread
-- Proxmox Support Team <support@proxmox.com> Mon, 11 Dec 2023 16:58:27 +0100
pve-qemu-kvm (8.1.2-4) bookworm; urgency=medium
* fix vnc clipboard in the host to guest direction
-- Proxmox Support Team <support@proxmox.com> Wed, 22 Nov 2023 14:28:21 +0100
pve-qemu-kvm (8.1.2-3) bookworm; urgency=medium
* fix #5054: backport fix for software reset with SATA, avoiding breakage
with, e.g., some FreeBSD VMs
-- Proxmox Support Team <support@proxmox.com> Mon, 20 Nov 2023 10:24:50 +0100
pve-qemu-kvm (8.1.2-2) bookworm; urgency=medium
* revert "x86: acpi: workaround Windows not handling name references in
Package properly" as that seems to have broken networking (and possibly
other things) on some localized variants of Windows (e.g., the German
versions).
-- Proxmox Support Team <support@proxmox.com> Fri, 17 Nov 2023 11:55:23 +0100
pve-qemu-kvm (8.1.2-1) bookworm; urgency=medium
* update submodule and patches to QEMU 8.1.2
* use QEMU's keycode-map-db again instead of our static copy from QEMU 6.0
* disable graph locking, newly introduced in the 8.1 release, as it still
has various deadlock issues, e.g., during canceling backup jobs.
-- Proxmox Support Team <support@proxmox.com> Tue, 24 Oct 2023 13:42:45 +0200
pve-qemu-kvm (8.0.2-7) bookworm; urgency=medium
* fix #2874: SATA: avoid unsolicited write to sector 0 during reset
-- Proxmox Support Team <support@proxmox.com> Wed, 04 Oct 2023 08:33:35 +0200
pve-qemu-kvm (8.0.2-6) bookworm; urgency=medium
* fix #1534: vma: add extract-filter for disk images allowing users to pass
a comma separated list of the disks they want to extract from an archive.
* backup: create jobs in a drained section to avoid subtle bugs where
something interferes with the block-copy-state bitmap on initialization
* backup: drop the experimental, and for a while now fully broken, directory
backup format (BACKUP_FORMAT_DIR). This format was never exposed via the
Proxmox VE API, only via QMP. As it has been broken since QEMU 8 and
we got zero reports about that, it's safe to assume that there are no
public users, so just remove it completely.
-- Proxmox Support Team <support@proxmox.com> Wed, 06 Sep 2023 17:03:59 +0200
pve-qemu-kvm (8.0.2-5) bookworm; urgency=medium
* improve memory footprint after backup by not keeping as much memory
resident.
* fix file descriptor leak for vhost (used by default by vNICs).
-- Proxmox Support Team <support@proxmox.com> Wed, 16 Aug 2023 11:52:24 +0200
pve-qemu-kvm (8.0.2-4) bookworm; urgency=medium
* fix resume for snapshot and hibernate in combination with iothread and
dirty bitmap
-- Proxmox Support Team <support@proxmox.com> Fri, 28 Jul 2023 12:58:22 +0200
pve-qemu-kvm (8.0.2-3) bookworm; urgency=medium
* fix regression in QEMU 8.0 for drive mirror with bitmap
-- Proxmox Support Team <support@proxmox.com> Thu, 15 Jun 2023 13:57:46 +0200
pve-qemu-kvm (8.0.2-2) bookworm; urgency=medium
* drop custom get_link_status QMP command, it was never really used.
* drop custom & deprecated drive snapshot QMP commands, a better
alternative has been in use for a while.
-- Proxmox Support Team <support@proxmox.com> Fri, 09 Jun 2023 07:57:56 +0200
pve-qemu-kvm (8.0.2-1) bookworm; urgency=medium
* update to QEMU stable release 8.0.2
* update patches for avoiding issues with DMA reentrancy to current,
slightly optimized version.
-- Proxmox Support Team <support@proxmox.com> Tue, 06 Jun 2023 16:34:50 +0200
pve-qemu-kvm (8.0.0-1) bookworm; urgency=medium
* update to QEMU stable release 8.0.0
* re-build for Proxmox VE 8 / Debian 12 Bookworm
* adapt to the local virtiofsd C variant being dropped, it has been
rewritten in Rust and is now hosted in a separate source repository.
-- Proxmox Support Team <support@proxmox.com> Mon, 22 May 2023 13:45:49 +0200
pve-qemu-kvm (7.2.0-8) bullseye; urgency=medium
* backport fix for ACPI CPU hotplug issue with TCG
* cherry-pick TCG-related stable fixes for 7.2 for users that turned off KVM
HW acceleration
-- Proxmox Support Team <support@proxmox.com> Fri, 17 Mar 2023 15:47:08 +0100
pve-qemu-kvm (7.2.0-7) bullseye; urgency=medium
* improve fix for potential deadlock with trim for IDE/SATA and draining
* backport stable fixes:
- hw/nvme: fix missing endian conversions for doorbell buffers
- hw/smbios: fix field corruption in type 4 table
- virtio-rng-pci: fix transitional migration compat for vectors
- hw/timer/hpet: Fix expiration time overflow
- vhost/vdpa: stop all svq on device deletion
- vhost: avoid a potential use of an uninitialized variable in the call to
vhost_svq_poll
- chardev/char-socket: set s->listener = NULL in char_socket_finalize to
fix a potential crash after live-migration
- intel-iommu: fail MAP notifier without caching mode
- intel-iommu: fail DEVIOTLB_UNMAP without dt mode
* fix a regression for when the LSI SCSI controller is used
-- Proxmox Support Team <support@proxmox.com> Mon, 13 Mar 2023 17:42:49 +0100
pve-qemu-kvm (7.2.0-6) bullseye; urgency=medium
* fix 7.2 regression for Linux boot failures with megasas SCSI
* fix 7.0 regression for a potential deadlock with trim for IDE/SATA and
draining
-- Proxmox Support Team <support@proxmox.com> Wed, 08 Mar 2023 14:32:17 +0100
pve-qemu-kvm (7.2.0-5) bullseye; urgency=medium
* fix #4476: savevm-async: avoid looping without progress
* savevm-async: increase the boundary for free space for (memory) state left
on target from 30 MiB to 100 MiB, improving the heuristic for when to
enter the final "pause and sync" stage.
* QMP backup: use correct error number when getting blockdrive length fails
* backport fix for some DMA reentrancy issues, better protecting against
malicious guests
* backport fix for iSCSI double free issue leading to crashes
-- Proxmox Support Team <support@proxmox.com> Tue, 21 Feb 2023 13:49:43 +0100
pve-qemu-kvm (7.2.0-4) bullseye; urgency=medium
* backport fix for a 7.2 regression when using VirtIO disk with
detect-zeroes=unmap
-- Proxmox Support Team <support@proxmox.com> Fri, 27 Jan 2023 09:37:49 +0100
pve-qemu-kvm (7.2.0-3) bullseye; urgency=medium
* add fix for live-migration with virtio-rng devices, which regressed in
QEMU 7.2.0.
-- Proxmox Support Team <support@proxmox.com> Thu, 12 Jan 2023 13:13:14 +0100
pve-qemu-kvm (7.2.0-2) bullseye; urgency=medium
* enable slirp again for now, as in qemu-server, user networking is

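The 8.0.2-6 changelog entry above mentions an extract filter for `vma` that takes a comma-separated list of disks to extract. As a rough illustration of what such a filter does, here is a Python sketch. The function names, the device names in the usage note, and the choice to reject unknown names are all assumptions made for this example and are not taken from the vma tool.

```python
def parse_extract_filter(value):
    """Split a comma-separated filter string into a set of device names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def select_devices(archive_devices, filter_value=None):
    """Return the devices to extract, keeping the archive's order.

    Without a filter, every device in the archive is selected.
    """
    if filter_value is None:
        return list(archive_devices)
    wanted = parse_extract_filter(filter_value)
    unknown = wanted - set(archive_devices)
    if unknown:
        # Assumption: unknown names are treated as an error in this sketch.
        raise ValueError("not in archive: " + ", ".join(sorted(unknown)))
    return [dev for dev in archive_devices if dev in wanted]
```

Calling `select_devices(["drive-scsi0", "drive-scsi1"], "drive-scsi1")` would select only the second disk, while omitting the filter keeps the previous behaviour of extracting everything.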
debian/compat

@ -1 +0,0 @@
10

debian/control

@ -2,9 +2,8 @@ Source: pve-qemu-kvm
Section: admin
Priority: optional
Maintainer: Proxmox Support Team <support@proxmox.com>
Build-Depends: autotools-dev,
Build-Depends: debhelper-compat (= 13),
check,
debhelper (>= 9),
libacl1-dev,
libaio-dev,
libattr1-dev,
@ -21,7 +20,7 @@ Build-Depends: autotools-dev,
libnuma-dev,
libpci-dev,
libpixman-1-dev,
libproxmox-backup-qemu0-dev (>= 1.3.0-1),
libproxmox-backup-qemu0-dev (>= 1.3.0),
librbd-dev (>= 0.48),
libsdl1.2-dev,
libseccomp-dev,
@ -30,7 +29,7 @@ Build-Depends: autotools-dev,
libspice-server-dev (>= 0.14.0~),
libsystemd-dev,
liburing-dev,
libusb-1.0-0-dev (>= 1.0.17-1),
libusb-1.0-0-dev (>= 1.0.17),
libusbredirparser-dev (>= 0.6-2),
libvirglrenderer-dev,
libzstd-dev,
@ -38,9 +37,8 @@ Build-Depends: autotools-dev,
python3-minimal,
python3-sphinx,
python3-sphinx-rtd-theme,
python3-venv,
quilt,
texi2html,
texinfo,
uuid-dev,
xfslibs-dev,
Standards-Version: 3.7.2
@ -61,11 +59,12 @@ Depends: ceph-common (>= 0.48),
libspice-server1 (>= 0.14.0~),
libusb-1.0-0 (>= 1.0.17-1),
libusbredirparser1 (>= 0.6-2),
vitastor-client (>= 0.9.4),
libuuid1,
${misc:Depends},
${shlibs:Depends},
Recommends: numactl
Suggests: libgl1
Recommends: numactl,
Suggests: libgl1,
Conflicts: kvm,
pve-kvm,
pve-qemu-kvm-2.6.18,
@ -73,22 +72,17 @@ Conflicts: kvm,
qemu-kvm,
qemu-system-arm,
qemu-system-common,
qemu-system-data,
qemu-system-x86,
qemu-utils,
Provides: qemu-system-arm, qemu-system-x86, qemu-utils
Provides: qemu-system-arm, qemu-system-x86, qemu-utils,
Replaces: pve-kvm,
pve-qemu-kvm-2.6.18,
qemu-system-arm,
qemu-system-x86,
qemu-utils,
Breaks: qemu-server (<= 8.0.6)
Description: Full virtualization on x86 hardware
Using KVM, one can run multiple virtual PCs, each running unmodified Linux or
Windows images. Each virtual machine has private virtualized hardware: a
network card, disk, graphics adapter, etc.
Package: pve-qemu-kvm-dbg
Architecture: any
Section: debug
Depends: pve-qemu-kvm (= ${binary:Version})
Description: pve qemu debugging symbols
This package contains the debugging symbols for pve-qemu-kvm.


@ -24,4 +24,5 @@ while (<STDIN>) {
die "no QEMU machine types detected from STDIN input" if scalar (@$machines) <= 0;
print to_json($machines, { utf8 => 1 }) or die "$!\n";
print to_json($machines, { utf8 => 1, canonical => 1 })
or die "failed to encode detected machines as JSON - $!\n";
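
The hunk above adds `canonical => 1` to the `to_json` call, which makes the Perl JSON encoder emit object keys in sorted order so that the generated machine list does not vary between runs. The same idea can be sketched in Python with `json.dumps(sort_keys=True)`. The machine entries below, including their keys, are hypothetical and only serve to show the effect.

```python
import json

# Hypothetical machine entries with the same content but different key order.
machines_a = [{"id": "pc-i440fx-9.0", "type": "i440fx", "version": "9.0"}]
machines_b = [{"version": "9.0", "type": "i440fx", "id": "pc-i440fx-9.0"}]


def encode(machines, canonical):
    """Encode as JSON; with canonical=True the keys are sorted."""
    return json.dumps(machines, sort_keys=canonical)


# Sorted keys give byte-identical output regardless of construction order.
same_when_canonical = encode(machines_a, True) == encode(machines_b, True)
same_when_plain = encode(machines_a, False) == encode(machines_b, False)
```

Stable output of this kind is useful when the generated file ends up in a package, because identical inputs then produce identical artifacts.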


@ -27,16 +27,18 @@ Signed-off-by: Ma Haocong <mahaocong@didichuxing.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: rebased for 8.2.2]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/mirror.c | 98 +++++++++++++++++++++-----
blockdev.c | 39 +++++++++-
block/mirror.c | 99 ++++++++++++++++++++------
blockdev.c | 38 +++++++++-
include/block/block_int-global-state.h | 4 +-
qapi/block-core.json | 29 ++++++--
qapi/block-core.json | 25 ++++++-
tests/unit/test-block-iothread.c | 4 +-
5 files changed, 145 insertions(+), 29 deletions(-)
5 files changed, 142 insertions(+), 28 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 251adc5ae0..8ead5f77a0 100644
index 1bdce3b657..0c5c72df2e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -51,7 +51,7 @@ typedef struct MirrorBlockJob {
@ -48,7 +50,7 @@ index 251adc5ae0..8ead5f77a0 100644
BlockMirrorBackingMode backing_mode;
/* Whether the target image requires explicit zero-initialization */
bool zero_target;
@@ -65,6 +65,8 @@ typedef struct MirrorBlockJob {
@@ -73,6 +73,8 @@ typedef struct MirrorBlockJob {
size_t buf_size;
int64_t bdev_length;
unsigned long *cow_bitmap;
@ -57,9 +59,9 @@ index 251adc5ae0..8ead5f77a0 100644
BdrvDirtyBitmap *dirty_bitmap;
BdrvDirtyBitmapIter *dbi;
uint8_t *buf;
@@ -699,7 +701,8 @@ static int mirror_exit_common(Job *job)
bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
@@ -722,7 +724,8 @@ static int mirror_exit_common(Job *job)
&error_abort);
if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
- BlockDriverState *backing = s->is_none_mode ? src : s->base;
+ BlockDriverState *backing;
@ -67,7 +69,7 @@ index 251adc5ae0..8ead5f77a0 100644
BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
if (bdrv_cow_bs(unfiltered_target) != backing) {
@@ -797,6 +800,16 @@ static void mirror_abort(Job *job)
@@ -819,6 +822,16 @@ static void mirror_abort(Job *job)
assert(ret == 0);
}
@ -84,7 +86,7 @@ index 251adc5ae0..8ead5f77a0 100644
static void coroutine_fn mirror_throttle(MirrorBlockJob *s)
{
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -977,7 +990,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
@@ -1015,7 +1028,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
mirror_free_init(s);
s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@ -94,7 +96,7 @@ index 251adc5ae0..8ead5f77a0 100644
ret = mirror_dirty_init(s);
if (ret < 0 || job_is_cancelled(&s->common.job)) {
goto immediate_exit;
@@ -1224,6 +1238,7 @@ static const BlockJobDriver mirror_job_driver = {
@@ -1304,6 +1318,7 @@ static const BlockJobDriver mirror_job_driver = {
.run = mirror_run,
.prepare = mirror_prepare,
.abort = mirror_abort,
@ -102,7 +104,7 @@ index 251adc5ae0..8ead5f77a0 100644
.pause = mirror_pause,
.complete = mirror_complete,
.cancel = mirror_cancel,
@@ -1240,6 +1255,7 @@ static const BlockJobDriver commit_active_job_driver = {
@@ -1322,6 +1337,7 @@ static const BlockJobDriver commit_active_job_driver = {
.run = mirror_run,
.prepare = mirror_prepare,
.abort = mirror_abort,
@ -110,7 +112,7 @@ index 251adc5ae0..8ead5f77a0 100644
.pause = mirror_pause,
.complete = mirror_complete,
.cancel = commit_active_cancel,
@@ -1627,7 +1643,10 @@ static BlockJob *mirror_start_job(
@@ -1714,7 +1730,10 @@ static BlockJob *mirror_start_job(
BlockCompletionFunc *cb,
void *opaque,
const BlockJobDriver *driver,
@ -122,9 +124,9 @@ index 251adc5ae0..8ead5f77a0 100644
bool auto_complete, const char *filter_node_name,
bool is_mirror, MirrorCopyMode copy_mode,
Error **errp)
@@ -1639,10 +1658,39 @@ static BlockJob *mirror_start_job(
uint64_t target_perms, target_shared_perms;
int ret;
@@ -1728,10 +1747,39 @@ static BlockJob *mirror_start_job(
GLOBAL_STATE_CODE();
- if (granularity == 0) {
- granularity = bdrv_get_default_bitmap_granularity(target);
@ -164,7 +166,7 @@ index 251adc5ae0..8ead5f77a0 100644
assert(is_power_of_2(granularity));
if (buf_size < 0) {
@@ -1774,7 +1822,9 @@ static BlockJob *mirror_start_job(
@@ -1871,7 +1919,9 @@ static BlockJob *mirror_start_job(
s->replaces = g_strdup(replaces);
s->on_source_error = on_source_error;
s->on_target_error = on_target_error;
@ -174,10 +176,10 @@ index 251adc5ae0..8ead5f77a0 100644
+ s->bitmap_mode = bitmap_mode;
s->backing_mode = backing_mode;
s->zero_target = zero_target;
s->copy_mode = copy_mode;
@@ -1795,6 +1845,18 @@ static BlockJob *mirror_start_job(
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
}
qatomic_set(&s->copy_mode, copy_mode);
@@ -1897,6 +1947,18 @@ static BlockJob *mirror_start_job(
*/
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+ if (s->sync_bitmap) {
+ bdrv_dirty_bitmap_set_busy(s->sync_bitmap, true);
@ -191,10 +193,10 @@ index 251adc5ae0..8ead5f77a0 100644
+ }
+ }
+
bdrv_graph_wrlock();
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
BLK_PERM_CONSISTENT_READ,
@@ -1872,6 +1934,9 @@ fail:
@@ -1979,6 +2041,9 @@ fail:
if (s->dirty_bitmap) {
bdrv_release_dirty_bitmap(s->dirty_bitmap);
}
@ -204,7 +206,7 @@ index 251adc5ae0..8ead5f77a0 100644
job_early_fail(&s->common.job);
}
@@ -1889,31 +1954,25 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
@@ -2001,35 +2066,28 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, const char *replaces,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
@ -229,8 +231,12 @@ index 251adc5ae0..8ead5f77a0 100644
- MirrorSyncMode_str(mode));
- return;
- }
-
bdrv_graph_rdlock_main_loop();
- is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
bdrv_graph_rdunlock_main_loop();
mirror_start_job(job_id, bs, creation_flags, target, replaces,
speed, granularity, buf_size, backing_mode, zero_target,
on_source_error, on_target_error, unmap, NULL, NULL,
@ -241,7 +247,7 @@ index 251adc5ae0..8ead5f77a0 100644
}
BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -1940,7 +1999,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -2056,7 +2114,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
job_id, bs, creation_flags, base, NULL, speed, 0, 0,
MIRROR_LEAVE_BACKING_CHAIN, false,
on_error, on_error, true, cb, opaque,
@ -252,33 +258,32 @@ index 251adc5ae0..8ead5f77a0 100644
errp);
if (!job) {
diff --git a/blockdev.c b/blockdev.c
index 3f1dec6242..2ee30323cb 100644
index 057601dcf0..8682814a7a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2946,6 +2946,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2776,6 +2776,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
BlockDriverState *target,
bool has_replaces, const char *replaces,
const char *replaces,
enum MirrorSyncMode sync,
+ bool has_bitmap,
+ const char *bitmap_name,
+ bool has_bitmap_mode,
+ BitmapSyncMode bitmap_mode,
BlockMirrorBackingMode backing_mode,
bool zero_target,
bool has_speed, int64_t speed,
@@ -2965,6 +2969,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2794,6 +2797,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
{
BlockDriverState *unfiltered_bs;
int job_flags = JOB_DEFAULT;
+ BdrvDirtyBitmap *bitmap = NULL;
if (!has_speed) {
speed = 0;
@@ -3019,6 +3024,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
GLOBAL_STATE_CODE();
GRAPH_RDLOCK_GUARD_MAINLOOP();
@@ -2848,6 +2852,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
sync = MIRROR_SYNC_MODE_FULL;
}
+ if (has_bitmap) {
+ if (bitmap_name) {
+ if (granularity) {
+ error_setg(errp, "Granularity and bitmap cannot both be set");
+ return;
@ -301,53 +306,53 @@ index 3f1dec6242..2ee30323cb 100644
+ }
+ }
+
if (!has_replaces) {
if (!replaces) {
/* We want to mirror from @bs, but keep implicit filters on top */
unfiltered_bs = bdrv_skip_implicit_filters(bs);
@@ -3065,8 +3093,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2889,8 +2916,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
* and will allow to check whether the node still exist at mirror completion
*/
mirror_start(job_id, bs, target,
- has_replaces ? replaces : NULL, job_flags,
- replaces, job_flags,
- speed, granularity, buf_size, sync, backing_mode, zero_target,
+ has_replaces ? replaces : NULL, job_flags, speed, granularity,
+ buf_size, sync, bitmap, bitmap_mode, backing_mode, zero_target,
+ replaces, job_flags, speed, granularity, buf_size, sync,
+ bitmap, bitmap_mode, backing_mode, zero_target,
on_source_error, on_target_error, unmap, filter_node_name,
copy_mode, errp);
}
@@ -3211,6 +3239,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
@@ -3034,6 +3061,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
arg->has_replaces, arg->replaces, arg->sync,
+ arg->has_bitmap, arg->bitmap,
blockdev_mirror_common(arg->job_id, bs, target_bs,
arg->replaces, arg->sync,
+ arg->bitmap,
+ arg->has_bitmap_mode, arg->bitmap_mode,
backing_mode, zero_target,
arg->has_speed, arg->speed,
arg->has_granularity, arg->granularity,
@@ -3232,6 +3262,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
@@ -3053,6 +3082,8 @@ void qmp_blockdev_mirror(const char *job_id,
const char *device, const char *target,
bool has_replaces, const char *replaces,
const char *replaces,
MirrorSyncMode sync,
+ bool has_bitmap, const char *bitmap,
+ const char *bitmap,
+ bool has_bitmap_mode, BitmapSyncMode bitmap_mode,
bool has_speed, int64_t speed,
bool has_granularity, uint32_t granularity,
bool has_buf_size, int64_t buf_size,
@@ -3281,7 +3313,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
@@ -3093,7 +3124,8 @@ void qmp_blockdev_mirror(const char *job_id,
}
blockdev_mirror_common(has_job_id ? job_id : NULL, bs, target_bs,
- has_replaces, replaces, sync, backing_mode,
+ has_replaces, replaces, sync, has_bitmap,
blockdev_mirror_common(job_id, bs, target_bs,
- replaces, sync, backing_mode,
+ replaces, sync,
+ bitmap, has_bitmap_mode, bitmap_mode, backing_mode,
zero_target, has_speed, speed,
has_granularity, granularity,
has_buf_size, buf_size,
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index b49f4eb35b..9d744db618 100644
index d2201e27f4..cc1387ae02 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -149,7 +149,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
@@ -158,7 +158,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, const char *replaces,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
@ -359,31 +364,26 @@ index b49f4eb35b..9d744db618 100644
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 95ac4fa634..7daaf545be 100644
index 746d1694c2..45ab548dfe 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2000,10 +2000,19 @@
# (all the disk, only the sectors allocated in the topmost image, or
# only new I/O).
@@ -2174,6 +2174,15 @@
# destination (all the disk, only the sectors allocated in the
# topmost image, or only new I/O).
#
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This argument must
+# be present for bitmap mode and absent otherwise. The bitmap's
+# granularity is used instead of @granularity (since 4.1).
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This
+# argument must be present for bitmap mode and absent otherwise.
+# The bitmap's granularity is used instead of @granularity (Since
+# 4.1).
+#
+# @bitmap-mode: Specifies the type of data the bitmap should contain after
+# the operation concludes. Must be present if sync is "bitmap".
+# Must NOT be present otherwise. (Since 4.1)
+# @bitmap-mode: Specifies the type of data the bitmap should contain
+# after the operation concludes. Must be present if sync is
+# "bitmap". Must NOT be present otherwise. (Since 4.1)
+#
# @granularity: granularity of the dirty bitmap, default is 64K
# if the image format doesn't have clusters, 4K if the clusters
# are smaller than that, else the cluster size. Must be a
-# power of 2 between 512 and 64M (since 1.4).
+# power of 2 between 512 and 64M. Must not be specified if
+# @bitmap is present (since 1.4).
#
# @buf-size: maximum amount of data in flight from source to
# target (since 1.4).
@@ -2043,7 +2052,9 @@
# @granularity: granularity of the dirty bitmap, default is 64K if the
# image format doesn't have clusters, 4K if the clusters are
# smaller than that, else the cluster size. Must be a power of 2
@@ -2216,7 +2225,9 @@
{ 'struct': 'DriveMirror',
'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
'*format': 'str', '*node-name': 'str', '*replaces': 'str',
@ -394,28 +394,23 @@ index 95ac4fa634..7daaf545be 100644
'*speed': 'int', '*granularity': 'uint32',
'*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError',
@@ -2322,10 +2333,19 @@
# (all the disk, only the sectors allocated in the topmost image, or
# only new I/O).
@@ -2496,6 +2507,15 @@
# destination (all the disk, only the sectors allocated in the
# topmost image, or only new I/O).
#
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This argument must
+# be present for bitmap mode and absent otherwise. The bitmap's
+# granularity is used instead of @granularity (since 4.1).
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This
+# argument must be present for bitmap mode and absent otherwise.
+# The bitmap's granularity is used instead of @granularity (since
+# 4.1).
+#
+# @bitmap-mode: Specifies the type of data the bitmap should contain after
+# the operation concludes. Must be present if sync is "bitmap".
+# Must NOT be present otherwise. (Since 4.1)
+# @bitmap-mode: Specifies the type of data the bitmap should contain
+# after the operation concludes. Must be present if sync is
+# "bitmap". Must NOT be present otherwise. (Since 4.1)
+#
# @granularity: granularity of the dirty bitmap, default is 64K
# if the image format doesn't have clusters, 4K if the clusters
# are smaller than that, else the cluster size. Must be a
-# power of 2 between 512 and 64M
+# power of 2 between 512 and 64M . Must not be specified if
+# @bitmap is present.
#
# @buf-size: maximum amount of data in flight from source to
# target
@@ -2375,7 +2395,8 @@
# @granularity: granularity of the dirty bitmap, default is 64K if the
# image format doesn't have clusters, 4K if the clusters are
# smaller than that, else the cluster size. Must be a power of 2
@@ -2544,7 +2564,8 @@
{ 'command': 'blockdev-mirror',
'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
'*replaces': 'str',
@ -426,7 +421,7 @@ index 95ac4fa634..7daaf545be 100644
'*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError',
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 8ca5adec5e..dae80e5a5f 100644
index 3766d5de6b..afa44cbd34 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -755,8 +755,8 @@ static void test_propagate_mirror(void)
@ -439,4 +434,4 @@ index 8ca5adec5e..dae80e5a5f 100644
+ false, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
&error_abort);
WITH_JOB_LOCK_GUARD() {
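
The QAPI documentation added by the patch above states three argument rules: `bitmap` must be present for sync=bitmap and absent otherwise, `bitmap-mode` must be present if and only if sync is "bitmap", and `granularity` must not be combined with a bitmap. A minimal Python sketch of these rules follows. Only the message "Granularity and bitmap cannot both be set" comes from the patch; the other messages and the function names are made up for illustration, and later patches in the series may relax some of these checks.

```python
# BitmapSyncMode values as defined in QEMU's QAPI schema.
BITMAP_SYNC_MODES = {"on-success", "never", "always"}


def validate_mirror_args(sync, bitmap=None, bitmap_mode=None, granularity=0):
    """Raise ValueError if the argument combination violates the documented rules."""
    if bitmap is not None and granularity:
        raise ValueError("Granularity and bitmap cannot both be set")
    if sync == "bitmap":
        if bitmap is None:
            raise ValueError("sync=bitmap requires a bitmap")  # hypothetical text
        if bitmap_mode not in BITMAP_SYNC_MODES:
            raise ValueError("sync=bitmap requires a valid bitmap-mode")  # hypothetical text
    else:
        if bitmap is not None:
            raise ValueError("bitmap is only allowed with sync=bitmap")  # hypothetical text
        if bitmap_mode is not None:
            raise ValueError("bitmap-mode is only allowed with sync=bitmap")  # hypothetical text


def is_valid(**kwargs):
    """Convenience wrapper: True if validate_mirror_args accepts the arguments."""
    try:
        validate_mirror_args(**kwargs)
    except ValueError:
        return False
    return True
```

With this sketch, `is_valid(sync="bitmap", bitmap="bitmap0", bitmap_mode="on-success")` holds, whereas adding `granularity=65536` to the same call is rejected.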


@ -24,10 +24,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 8ead5f77a0..35c1b8f25d 100644
index 0c5c72df2e..37fee3fa25 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -676,8 +676,6 @@ static int mirror_exit_common(Job *job)
@@ -693,8 +693,6 @@ static int mirror_exit_common(Job *job)
bdrv_unfreeze_backing_chain(mirror_top_bs, target_bs);
}
@ -36,9 +36,9 @@ index 8ead5f77a0..35c1b8f25d 100644
/* Make sure that the source BDS doesn't go away during bdrv_replace_node,
* before we can call bdrv_drained_end */
bdrv_ref(src);
@@ -778,6 +776,18 @@ static int mirror_exit_common(Job *job)
block_job_remove_all_bdrv(bjob);
bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
@@ -800,6 +798,18 @@ static int mirror_exit_common(Job *job)
bdrv_drained_end(target_bs);
bdrv_unref(target_bs);
+ if (s->sync_bitmap) {
+ if (s->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS ||
@ -55,7 +55,7 @@ index 8ead5f77a0..35c1b8f25d 100644
bs_opaque->job = NULL;
bdrv_drained_end(src);
@@ -1668,10 +1678,6 @@ static BlockJob *mirror_start_job(
@@ -1757,10 +1767,6 @@ static BlockJob *mirror_start_job(
" sync mode",
MirrorSyncMode_str(sync_mode));
return NULL;
@ -66,7 +66,7 @@ index 8ead5f77a0..35c1b8f25d 100644
}
} else if (bitmap) {
error_setg(errp,
@@ -1688,6 +1694,12 @@ static BlockJob *mirror_start_job(
@@ -1777,6 +1783,12 @@ static BlockJob *mirror_start_job(
return NULL;
}
granularity = bdrv_dirty_bitmap_granularity(bitmap);

View File

@ -16,10 +16,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 3 insertions(+)
diff --git a/blockdev.c b/blockdev.c
index 2ee30323cb..dd1c2cdef7 100644
index 8682814a7a..5b75a085ee 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3045,6 +3045,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2873,6 +2873,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_ALLOW_RO, errp)) {
return;
}
@ -28,4 +28,4 @@ index 2ee30323cb..dd1c2cdef7 100644
+ return;
}
if (!has_replaces) {
if (!replaces) {

View File

@ -16,10 +16,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 35c1b8f25d..4969c6833c 100644
index 37fee3fa25..6b3cce1007 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -782,8 +782,8 @@ static int mirror_exit_common(Job *job)
@@ -804,8 +804,8 @@ static int mirror_exit_common(Job *job)
job->ret == 0 && ret == 0)) {
/* Success; synchronize copy back to sync. */
bdrv_clear_dirty_bitmap(s->sync_bitmap, NULL);
@ -30,7 +30,7 @@ index 35c1b8f25d..4969c6833c 100644
}
}
bdrv_release_dirty_bitmap(s->dirty_bitmap);
@@ -1862,11 +1862,8 @@ static BlockJob *mirror_start_job(
@@ -1964,11 +1964,8 @@ static BlockJob *mirror_start_job(
}
if (s->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
@ -43,4 +43,4 @@ index 35c1b8f25d..4969c6833c 100644
+ NULL, true);
}
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
bdrv_graph_wrlock();

View File

@ -12,6 +12,8 @@ uniform w.r.t. backup block jobs.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: rebase for 8.2.2]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/mirror.c | 28 +++------------
blockdev.c | 29 +++++++++++++++
@ -19,12 +21,12 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
3 files changed, 70 insertions(+), 59 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 4969c6833c..cf85ae1074 100644
index 6b3cce1007..2f1223852b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1668,31 +1668,13 @@ static BlockJob *mirror_start_job(
uint64_t target_perms, target_shared_perms;
int ret;
@@ -1757,31 +1757,13 @@ static BlockJob *mirror_start_job(
GLOBAL_STATE_CODE();
- if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
- error_setg(errp, "Sync mode '%s' not supported",
@ -60,17 +62,17 @@ index 4969c6833c..cf85ae1074 100644
if (bitmap_mode != BITMAP_SYNC_MODE_NEVER) {
diff --git a/blockdev.c b/blockdev.c
index dd1c2cdef7..756e980889 100644
index 5b75a085ee..d27d8c38ec 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3024,7 +3024,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2852,7 +2852,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
sync = MIRROR_SYNC_MODE_FULL;
}
+ if ((sync == MIRROR_SYNC_MODE_BITMAP) ||
+ (sync == MIRROR_SYNC_MODE_INCREMENTAL)) {
+ /* done before desugaring 'incremental' to print the right message */
+ if (!has_bitmap) {
+ if (!bitmap_name) {
+ error_setg(errp, "Must provide a valid bitmap name for "
+ "'%s' sync mode", MirrorSyncMode_str(sync));
+ return;
@ -91,7 +93,7 @@ index dd1c2cdef7..756e980889 100644
+ bitmap_mode = BITMAP_SYNC_MODE_ON_SUCCESS;
+ }
+
if (has_bitmap) {
if (bitmap_name) {
+ if (sync != MIRROR_SYNC_MODE_BITMAP) {
+ error_setg(errp, "Sync mode '%s' not supported with bitmap.",
+ MirrorSyncMode_str(sync));

View File

@ -48,7 +48,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
6 files changed, 59 insertions(+), 5 deletions(-)
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 737e750670..38804b8595 100644
index 965f5d5450..e04bd059b6 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -16,6 +16,7 @@ extern QemuOptsList qemu_mon_opts;
@ -60,10 +60,10 @@ index 737e750670..38804b8595 100644
void monitor_init_globals(void);
void monitor_init_globals_core(void);
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index a2cdbbf646..b531bd50e7 100644
index 252de85681..8db28f9272 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -152,6 +152,13 @@ typedef struct {
@@ -151,6 +151,13 @@ typedef struct {
QemuMutex qmp_queue_lock;
/* Input queue that holds all the parsed QMP requests */
GQueue *qmp_requests;
@ -78,10 +78,10 @@ index a2cdbbf646..b531bd50e7 100644
/**
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 86949024f6..c306cadcf4 100644
index 01ede1babd..5681bca346 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -135,6 +135,21 @@ bool monitor_cur_is_qmp(void)
@@ -117,6 +117,21 @@ bool monitor_cur_is_qmp(void)
return cur_mon && monitor_is_qmp(cur_mon);
}
@ -104,10 +104,10 @@ index 86949024f6..c306cadcf4 100644
* Is @mon is using readline?
* Note: not all HMP monitors use readline, e.g., gdbserver has a
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 092c527b6f..6b8cfcf6d8 100644
index a239945e8d..589c9524f8 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -141,6 +141,8 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
@@ -165,6 +165,8 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
QDict *rsp;
QDict *error;
@ -116,7 +116,7 @@ index 092c527b6f..6b8cfcf6d8 100644
rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
&mon->common);
@@ -156,7 +158,17 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
@@ -180,7 +182,17 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
}
}
@ -135,7 +135,7 @@ index 092c527b6f..6b8cfcf6d8 100644
qobject_unref(rsp);
}
@@ -444,6 +456,7 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
@@ -461,6 +473,7 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
switch (event) {
case CHR_EVENT_OPENED:
@ -144,7 +144,7 @@ index 092c527b6f..6b8cfcf6d8 100644
monitor_qmp_caps_reset(mon);
data = qmp_greeting(mon);
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 0990873ec8..e605003771 100644
index f3488afeef..2624eb3470 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -117,16 +117,28 @@ typedef struct QmpDispatchBH {
@ -180,13 +180,13 @@ index 0990873ec8..e605003771 100644
aio_co_wake(data->co);
}
@@ -231,6 +243,7 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
@@ -250,6 +262,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
.ret = &ret,
.errp = &err,
.co = qemu_coroutine_self(),
+ .conn_nr = monitor_get_connection_nr(cur_mon),
};
aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
aio_bh_schedule_oneshot(iohandler_get_aio_context(), do_qmp_dispatch_bh,
&data);
diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
index afa477aae6..d3ff124bf3 100644

View File

@ -1,42 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Fri, 28 Oct 2022 10:09:46 +0200
Subject: [PATCH] init: daemonize: defuse PID file resolve error
When proxmox-file-restore invokes QEMU, the PID file is a (temporary)
file that's already unlinked, so resolving the absolute path here
failed.
It should not be a critical error when the PID file unlink handler
can't be registered, because the path can't be resolved for whatever
reason. If the file is already gone from QEMU's perspective (i.e.
errno is ENOENT), silently ignore the error. Otherwise, print a
warning.
Reported-by: Dominik Csapak <d.csapak@proxmox.com>
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
softmmu/vl.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5115221efe..5f7f6ca981 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2460,10 +2460,11 @@ static void qemu_maybe_daemonize(const char *pid_file)
pid_file_realpath = g_malloc0(PATH_MAX);
if (!realpath(pid_file, pid_file_realpath)) {
- error_report("cannot resolve PID file path: %s: %s",
- pid_file, strerror(errno));
- unlink(pid_file);
- exit(1);
+ if (errno != ENOENT) {
+ warn_report("not removing PID file on exit: cannot resolve PID "
+ "file path: %s: %s", pid_file, strerror(errno));
+ }
+ return;
}
qemu_unlink_pidfile_notifier = (struct UnlinkPidfileNotifier) {

View File

@ -0,0 +1,69 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Tue, 28 Feb 2023 09:11:29 -0800
Subject: [PATCH] scsi: megasas: Internal cdbs have 16-byte length
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Host drivers do not necessarily set cdb_len in megasas io commands.
With commits 6d1511cea0 ("scsi: Reject commands if the CDB length
exceeds buf_len") and fe9d8927e2 ("scsi: Add buf_len parameter to
scsi_req_new()"), this results in failures to boot Linux from affected
SCSI drives because cdb_len is set to 0 by the host driver.
Set the cdb length to its actual size to solve the problem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(picked-up from https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg08653.html)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/scsi/megasas.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 2d0c607177..97e51733af 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -1781,7 +1781,7 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd *cmd, int frame_cmd)
uint8_t cdb[16];
int len;
struct SCSIDevice *sdev = NULL;
- int target_id, lun_id, cdb_len;
+ int target_id, lun_id;
lba_count = le32_to_cpu(cmd->frame->io.header.data_len);
lba_start_lo = le32_to_cpu(cmd->frame->io.lba_lo);
@@ -1790,7 +1790,6 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd *cmd, int frame_cmd)
target_id = cmd->frame->header.target_id;
lun_id = cmd->frame->header.lun_id;
- cdb_len = cmd->frame->header.cdb_len;
if (target_id < MFI_MAX_LD && lun_id == 0) {
sdev = scsi_device_find(&s->bus, 0, target_id, lun_id);
@@ -1805,15 +1804,6 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd *cmd, int frame_cmd)
return MFI_STAT_DEVICE_NOT_FOUND;
}
- if (cdb_len > 16) {
- trace_megasas_scsi_invalid_cdb_len(
- mfi_frame_desc(frame_cmd), 1, target_id, lun_id, cdb_len);
- megasas_write_sense(cmd, SENSE_CODE(INVALID_OPCODE));
- cmd->frame->header.scsi_status = CHECK_CONDITION;
- s->event_count++;
- return MFI_STAT_SCSI_DONE_WITH_ERROR;
- }
-
cmd->iov_size = lba_count * sdev->blocksize;
if (megasas_map_sgl(s, cmd, &cmd->frame->io.sgl)) {
megasas_write_sense(cmd, SENSE_CODE(TARGET_FAILURE));
@@ -1824,7 +1814,7 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd *cmd, int frame_cmd)
megasas_encode_lba(cdb, lba_start, lba_count, is_write);
cmd->req = scsi_req_new(sdev, cmd->index,
- lun_id, cdb, cdb_len, cmd);
+ lun_id, cdb, sizeof(cdb), cmd);
if (!cmd->req) {
trace_megasas_scsi_req_alloc_failed(
mfi_frame_desc(frame_cmd), target_id, lun_id);

View File

@ -0,0 +1,100 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Tue, 7 Mar 2023 15:03:02 +0100
Subject: [PATCH] ide: avoid potential deadlock when draining during trim
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The deadlock can happen as follows:
1. ide_issue_trim is called, and increments the in_flight counter.
2. ide_issue_trim_cb calls blk_aio_pdiscard.
3. Somebody else starts draining (e.g. backup to insert the cbw node).
4. ide_issue_trim_cb is called as the completion callback for
blk_aio_pdiscard.
5. ide_issue_trim_cb issues yet another blk_aio_pdiscard request.
6. The request is added to the wait queue via blk_wait_while_drained,
because draining has been started.
7. Nobody ever decrements the in_flight counter and draining can't
finish. This would be done by ide_trim_bh_cb, which is called after
ide_issue_trim_cb has issued its last request, but
ide_issue_trim_cb is not called anymore, because it's the
completion callback of blk_aio_pdiscard, which waits on draining.
Quoting Hanna Czenczek:
> The point of 7e5cdb345f was that we need any in-flight count to
> accompany a set s->bus->dma->aiocb. While blk_aio_pdiscard() is
> happening, we don't necessarily need another count. But we do need
> it while there is no blk_aio_pdiscard().
> ide_issue_trim_cb() returns in two cases (and, recursively through
> its callers, leaves s->bus->dma->aiocb set):
> 1. After calling blk_aio_pdiscard(), which will keep an in-flight
> count,
> 2. After calling replay_bh_schedule_event() (i.e.
> qemu_bh_schedule()), which does not keep an in-flight count.
Thus, even after moving the blk_inc_in_flight to above the
replay_bh_schedule_event call, the invariant "ide_issue_trim_cb
returns with an accompanying in-flight count" is still satisfied.
However, the issue 7e5cdb345f fixed for canceling resurfaces, because
ide_cancel_dma_sync assumes that it just needs to drain once. But now
the in_flight count is not consistently > 0 during the trim operation.
So, change it to drain until !s->bus->dma->aiocb, which means that the
operation finished (s->bus->dma->aiocb is cleared by ide_set_inactive
via the ide_dma_cb when the end of the transfer is reached).
Discussion here:
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg02506.html
Fixes: 7e5cdb345f ("ide: Increment BB in-flight counter for TRIM BH")
Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/ide/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index e8cb2dac92..3b21acf651 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -456,7 +456,7 @@ static void ide_trim_bh_cb(void *opaque)
iocb->bh = NULL;
qemu_aio_unref(iocb);
- /* Paired with an increment in ide_issue_trim() */
+ /* Paired with an increment in ide_issue_trim_cb() */
blk_dec_in_flight(blk);
}
@@ -516,6 +516,8 @@ static void ide_issue_trim_cb(void *opaque, int ret)
done:
iocb->aiocb = NULL;
if (iocb->bh) {
+ /* Paired with a decrement in ide_trim_bh_cb() */
+ blk_inc_in_flight(s->blk);
replay_bh_schedule_event(iocb->bh);
}
}
@@ -528,9 +530,6 @@ BlockAIOCB *ide_issue_trim(
IDEDevice *dev = s->unit ? s->bus->slave : s->bus->master;
TrimAIOCB *iocb;
- /* Paired with a decrement in ide_trim_bh_cb() */
- blk_inc_in_flight(s->blk);
-
iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
iocb->s = s;
iocb->bh = qemu_bh_new_guarded(ide_trim_bh_cb, iocb,
@@ -754,8 +753,9 @@ void ide_cancel_dma_sync(IDEState *s)
*/
if (s->bus->dma->aiocb) {
trace_ide_cancel_dma_sync_remaining();
- blk_drain(s->blk);
- assert(s->bus->dma->aiocb == NULL);
+ while (s->bus->dma->aiocb) {
+ blk_drain(s->blk);
+ }
}
}

View File

@ -1,44 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Chenyi Qiang <chenyi.qiang@intel.com>
Date: Fri, 16 Dec 2022 14:22:31 +0800
Subject: [PATCH] virtio-mem: Fix the bitmap index of the section offset
vmem->bitmap indexes the memory region of the virtio-mem backend at a
granularity of block_size. To calculate the index of target section offset,
the block_size should be divided instead of the bitmap_size.
Fixes: 2044969f0b ("virtio-mem: Implement RamDiscardManager interface")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20221216062231.11181-1-chenyi.qiang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: David Hildenbrand <david@redhat.com>
(cherry-picked from commit b11cf32e07a2f7ff0d171b89497381a04c9d07e0)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/virtio/virtio-mem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index ed170def48..e19ee817fe 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -235,7 +235,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
uint64_t offset, size;
int ret = 0;
- first_bit = s->offset_within_region / vmem->bitmap_size;
+ first_bit = s->offset_within_region / vmem->block_size;
first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, first_bit);
while (first_bit < vmem->bitmap_size) {
MemoryRegionSection tmp = *s;
@@ -267,7 +267,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
uint64_t offset, size;
int ret = 0;
- first_bit = s->offset_within_region / vmem->bitmap_size;
+ first_bit = s->offset_within_region / vmem->block_size;
first_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, first_bit);
while (first_bit < vmem->bitmap_size) {
MemoryRegionSection tmp = *s;
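
The index arithmetic that this fix corrects can be sketched in isolation. This is only an illustration: `struct toy_vmem` and both helpers are hypothetical stand-ins, not QEMU's `VirtIOMEM` or its API. It assumes, as the commit message states, that each bitmap bit covers `block_size` bytes of the region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two VirtIOMEM fields involved. */
struct toy_vmem {
    uint64_t block_size;   /* bytes covered by one bitmap bit */
    uint64_t bitmap_size;  /* number of bits in the bitmap */
};

/* Fixed behaviour: one bit per block, so divide the byte offset by block_size. */
static uint64_t first_bit_fixed(const struct toy_vmem *v, uint64_t offset)
{
    assert(v->block_size != 0);
    return offset / v->block_size;
}

/* Pre-fix behaviour: divides by the number of bits, which is unrelated to the offset. */
static uint64_t first_bit_buggy(const struct toy_vmem *v, uint64_t offset)
{
    assert(v->bitmap_size != 0);
    return offset / v->bitmap_size;
}
```

With a 2 MiB block size and a 512-bit bitmap, an offset of 4 MiB maps to bit 2 with the fixed helper, while the pre-fix division yields 8192, which is already past the end of the bitmap.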

View File

@ -0,0 +1,45 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Fri, 17 Nov 2023 11:18:06 +0100
Subject: [PATCH] Revert "x86: acpi: workaround Windows not handling name
references in Package properly"
This reverts commit 44d975ef340e2f21f236f9520c53e1b30d2213a4.
As reported in the community forum [0] and reproduced locally this
breaks VirtIO network adapters in (at least) the German ISO of Windows
Server 2022. The fix itself was for
> Issue is not fatal but as result acpi-index/"PCI Label ID" property
> is either not shown in device details page or shows incorrect value.
so revert and tolerate that as a stop-gap, rather than have the
devices not working at all.
[0]: https://forum.proxmox.com/threads/92094/post-605684
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/i386/acpi-build.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 53f804ac16..9b1b9f0412 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -347,13 +347,9 @@ Aml *aml_pci_device_dsm(void)
{
Aml *params = aml_local(0);
Aml *pkg = aml_package(2);
- aml_append(pkg, aml_int(0));
- aml_append(pkg, aml_int(0));
+ aml_append(pkg, aml_name("BSEL"));
+ aml_append(pkg, aml_name("ASUN"));
aml_append(method, aml_store(pkg, params));
- aml_append(method,
- aml_store(aml_name("BSEL"), aml_index(params, aml_int(0))));
- aml_append(method,
- aml_store(aml_name("ASUN"), aml_index(params, aml_int(1))));
aml_append(method,
aml_return(aml_call5("PDSM", aml_arg(0), aml_arg(1),
aml_arg(2), aml_arg(3), params))

View File

@ -1,36 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Chenyi Qiang <chenyi.qiang@intel.com>
Date: Wed, 28 Dec 2022 17:03:12 +0800
Subject: [PATCH] virtio-mem: Fix the iterator variable in a vmem->rdl_list
loop
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
It should be the variable rdl2 to revert the already-notified listeners.
Fixes: 2044969f0b ("virtio-mem: Implement RamDiscardManager interface")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20221228090312.17276-1-chenyi.qiang@intel.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
(cherry-picked from commit 29f1b328e3b767cba2661920a8470738469b9e36)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/virtio/virtio-mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index e19ee817fe..56db586c89 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -341,7 +341,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
if (ret) {
/* Notify all already-notified listeners. */
QLIST_FOREACH(rdl2, &vmem->rdl_list, next) {
- MemoryRegionSection tmp = *rdl->section;
+ MemoryRegionSection tmp = *rdl2->section;
if (rdl2 == rdl) {
break;

View File

@ -0,0 +1,35 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 29 Apr 2024 15:41:11 +0200
Subject: [PATCH] block/copy-before-write: use uint64_t for timeout in
nanoseconds
rather than the uint32_t for which the maximum is slightly more than 4
seconds and larger values would overflow. The QAPI interface allows
specifying the number of seconds, so only values 0 to 4 are safe right
now, other values lead to a much lower timeout than a user expects.
The block_copy() call where this is used already takes a uint64_t for
the timeout, so no change required there.
Fixes: 6db7fd1ca9 ("block/copy-before-write: implement cbw-timeout option")
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
---
block/copy-before-write.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 8aba27a71d..026fa9840f 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -43,7 +43,7 @@ typedef struct BDRVCopyBeforeWriteState {
BlockCopyState *bcs;
BdrvChild *target;
OnCbwError on_cbw_error;
- uint32_t cbw_timeout_ns;
+ uint64_t cbw_timeout_ns;
/*
* @lock: protects access to @access_bitmap, @done_bitmap and

View File

@ -1,141 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Fri, 16 Dec 2022 11:35:52 +0800
Subject: [PATCH] vhost: fix vq dirty bitmap syncing when vIOMMU is enabled
When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
GPA. So we need to translate it to GPA before the syncing otherwise we
may hit the following crash since IOVA could be out of the scope of
the GPA log size. This could be noted when using virtio-IOMMU with
vhost using 1G memory.
Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
Cc: qemu-stable@nongnu.org
Tested-by: Lei Yang <leiyang@redhat.com>
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221216033552.77087-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry-picked from commit 345cc1cbcbce2bab00abc2b88338d7d89c702d6b)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/virtio/vhost.c | 84 ++++++++++++++++++++++++++++++++++++-----------
1 file changed, 64 insertions(+), 20 deletions(-)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 7fb008bc9e..fdcd1a8fdf 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -20,6 +20,7 @@
#include "qemu/range.h"
#include "qemu/error-report.h"
#include "qemu/memfd.h"
+#include "qemu/log.h"
#include "standard-headers/linux/vhost_types.h"
#include "hw/virtio/virtio-bus.h"
#include "hw/virtio/virtio-access.h"
@@ -106,6 +107,24 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
}
}
+static bool vhost_dev_has_iommu(struct vhost_dev *dev)
+{
+ VirtIODevice *vdev = dev->vdev;
+
+ /*
+ * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
+ * incremental memory mapping API via IOTLB API. For platform that
+ * does not have IOMMU, there's no need to enable this feature
+ * which may cause unnecessary IOTLB miss/update transactions.
+ */
+ if (vdev) {
+ return virtio_bus_device_iommu_enabled(vdev) &&
+ virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+ } else {
+ return false;
+ }
+}
+
static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
MemoryRegionSection *section,
hwaddr first,
@@ -137,8 +156,51 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
continue;
}
- vhost_dev_sync_region(dev, section, start_addr, end_addr, vq->used_phys,
- range_get_last(vq->used_phys, vq->used_size));
+ if (vhost_dev_has_iommu(dev)) {
+ IOMMUTLBEntry iotlb;
+ hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
+ hwaddr phys, s, offset;
+
+ while (used_size) {
+ rcu_read_lock();
+ iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
+ used_phys,
+ true,
+ MEMTXATTRS_UNSPECIFIED);
+ rcu_read_unlock();
+
+ if (!iotlb.target_as) {
+ qemu_log_mask(LOG_GUEST_ERROR, "translation "
+ "failure for used_iova %"PRIx64"\n",
+ used_phys);
+ return -EINVAL;
+ }
+
+ offset = used_phys & iotlb.addr_mask;
+ phys = iotlb.translated_addr + offset;
+
+ /*
+ * Distance from start of used ring until last byte of
+ * IOMMU page.
+ */
+ s = iotlb.addr_mask - offset;
+ /*
+ * Size of used ring, or of the part of it until end
+ * of IOMMU page. To avoid zero result, do the adding
+ * outside of MIN().
+ */
+ s = MIN(s, used_size - 1) + 1;
+
+ vhost_dev_sync_region(dev, section, start_addr, end_addr, phys,
+ range_get_last(phys, s));
+ used_size -= s;
+ used_phys += s;
+ }
+ } else {
+ vhost_dev_sync_region(dev, section, start_addr,
+ end_addr, vq->used_phys,
+ range_get_last(vq->used_phys, vq->used_size));
+ }
}
return 0;
}
@@ -306,24 +368,6 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
dev->log_size = size;
}
-static bool vhost_dev_has_iommu(struct vhost_dev *dev)
-{
- VirtIODevice *vdev = dev->vdev;
-
- /*
- * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
- * incremental memory mapping API via IOTLB API. For platform that
- * does not have IOMMU, there's no need to enable this feature
- * which may cause unnecessary IOTLB miss/update transactions.
- */
- if (vdev) {
- return virtio_bus_device_iommu_enabled(vdev) &&
- virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
- } else {
- return false;
- }
-}
-
static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
hwaddr *plen, bool is_write)
{

View File

@ -0,0 +1,98 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc-Andr=C3=A9=20Lureau?= <marcandre.lureau@redhat.com>
Date: Thu, 16 May 2024 12:40:22 +0400
Subject: [PATCH] virtio-gpu: fix v2 migration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit dfcf74fa ("virtio-gpu: fix scanout migration post-load") broke
forward/backward version migration. Versioning of nested VMSD structures
is not straightforward, as the wire format doesn't have nested
structures versions. Introduce x-scanout-vmstate-version and a field
test to save/load appropriately according to the machine version.
Fixes: dfcf74fa ("virtio-gpu: fix scanout migration post-load")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
hw/core/machine.c | 1 +
hw/display/virtio-gpu.c | 24 ++++++++++++++++--------
include/hw/virtio/virtio-gpu.h | 1 +
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 37ede0e7d4..d33a37a6f6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -37,6 +37,7 @@ GlobalProperty hw_compat_8_2[] = {
{ "migration", "zero-page-detection", "legacy"},
{ TYPE_VIRTIO_IOMMU_PCI, "granule", "4k" },
{ TYPE_VIRTIO_IOMMU_PCI, "aw-bits", "64" },
+ { "virtio-gpu-device", "x-scanout-vmstate-version", "1" },
};
const size_t hw_compat_8_2_len = G_N_ELEMENTS(hw_compat_8_2);
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index ae831b6b3e..85323daf99 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1166,10 +1166,17 @@ static void virtio_gpu_cursor_bh(void *opaque)
virtio_gpu_handle_cursor(&g->parent_obj.parent_obj, g->cursor_vq);
}
+static bool scanout_vmstate_after_v2(void *opaque, int version)
+{
+ struct VirtIOGPUBase *base = container_of(opaque, VirtIOGPUBase, scanout);
+ struct VirtIOGPU *gpu = container_of(base, VirtIOGPU, parent_obj);
+
+ return gpu->scanout_vmstate_version >= 2;
+}
+
static const VMStateDescription vmstate_virtio_gpu_scanout = {
.name = "virtio-gpu-one-scanout",
- .version_id = 2,
- .minimum_version_id = 1,
+ .version_id = 1,
.fields = (const VMStateField[]) {
VMSTATE_UINT32(resource_id, struct virtio_gpu_scanout),
VMSTATE_UINT32(width, struct virtio_gpu_scanout),
@@ -1181,12 +1188,12 @@ static const VMStateDescription vmstate_virtio_gpu_scanout = {
VMSTATE_UINT32(cursor.hot_y, struct virtio_gpu_scanout),
VMSTATE_UINT32(cursor.pos.x, struct virtio_gpu_scanout),
VMSTATE_UINT32(cursor.pos.y, struct virtio_gpu_scanout),
- VMSTATE_UINT32_V(fb.format, struct virtio_gpu_scanout, 2),
- VMSTATE_UINT32_V(fb.bytes_pp, struct virtio_gpu_scanout, 2),
- VMSTATE_UINT32_V(fb.width, struct virtio_gpu_scanout, 2),
- VMSTATE_UINT32_V(fb.height, struct virtio_gpu_scanout, 2),
- VMSTATE_UINT32_V(fb.stride, struct virtio_gpu_scanout, 2),
- VMSTATE_UINT32_V(fb.offset, struct virtio_gpu_scanout, 2),
+ VMSTATE_UINT32_TEST(fb.format, struct virtio_gpu_scanout, scanout_vmstate_after_v2),
+ VMSTATE_UINT32_TEST(fb.bytes_pp, struct virtio_gpu_scanout, scanout_vmstate_after_v2),
+ VMSTATE_UINT32_TEST(fb.width, struct virtio_gpu_scanout, scanout_vmstate_after_v2),
+ VMSTATE_UINT32_TEST(fb.height, struct virtio_gpu_scanout, scanout_vmstate_after_v2),
+ VMSTATE_UINT32_TEST(fb.stride, struct virtio_gpu_scanout, scanout_vmstate_after_v2),
+ VMSTATE_UINT32_TEST(fb.offset, struct virtio_gpu_scanout, scanout_vmstate_after_v2),
VMSTATE_END_OF_LIST()
},
};
@@ -1659,6 +1666,7 @@ static Property virtio_gpu_properties[] = {
DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
+ DEFINE_PROP_UINT8("x-scanout-vmstate-version", VirtIOGPU, scanout_vmstate_version, 2),
DEFINE_PROP_END_OF_LIST(),
};
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index ed44cdad6b..842315d51d 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -177,6 +177,7 @@ typedef struct VGPUDMABuf {
struct VirtIOGPU {
VirtIOGPUBase parent_obj;
+ uint8_t scanout_vmstate_version;
uint64_t conf_max_hostmem;
VirtQueue *ctrl_vq;

View File

@ -0,0 +1,59 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Thu, 16 May 2024 10:46:34 +0200
Subject: [PATCH] hw/pflash: fix block write start
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Move the pflash_blk_write_start() call. We need the offset of the
first data write, not the offset for the setup (number-of-bytes)
write. Without this fix u-boot can do block writes to the first
flash block only.
While being at it drop a leftover FIXME.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2343
Fixes: fcc79f2e0955 ("hw/pflash: implement update buffer for block writes")
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(picked up from https://lists.nongnu.org/archive/html/qemu-stable/2024-05/msg00091.html)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/block/pflash_cfi01.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 1bda8424b9..c8f1cf5a87 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -518,10 +518,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
break;
case 0xe8: /* Write to buffer */
trace_pflash_write(pfl->name, "write to buffer");
- /* FIXME should save @offset, @width for case 1+ */
- qemu_log_mask(LOG_UNIMP,
- "%s: Write to buffer emulation is flawed\n",
- __func__);
pfl->status |= 0x80; /* Ready! */
break;
case 0xf0: /* Probe for AMD flash */
@@ -574,7 +570,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
}
pfl->counter = value;
pfl->wcycle++;
- pflash_blk_write_start(pfl, offset);
break;
case 0x60:
if (cmd == 0xd0) {
@@ -605,6 +600,9 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
switch (pfl->cmd) {
case 0xe8: /* Block write */
/* FIXME check @offset, @width */
+ if (pfl->blk_offset == -1 && pfl->counter) {
+ pflash_blk_write_start(pfl, offset);
+ }
if (!pfl->ro && (pfl->blk_offset != -1)) {
pflash_data_write(pfl, offset, value, width, be);
} else {
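
The reasoning in the commit message above can be sketched with a toy state machine. This is not the QEMU device model: `ToyFlash`, `toy_flash_reset` and `toy_flash_write` are hypothetical names, and the command handling is reduced to the bare minimum. It only shows why the block start has to be taken from the first data write rather than from the setup (count) write.

```c
#include <stdint.h>

/* Toy model only; names and fields are hypothetical, not QEMU's. */
typedef struct {
    int wcycle;         /* 0: idle, 1: saw 0xe8, 2: expecting data */
    int64_t blk_offset; /* -1 while no block write has started */
    uint32_t counter;   /* data writes still expected */
} ToyFlash;

static void toy_flash_reset(ToyFlash *f)
{
    f->wcycle = 0;
    f->blk_offset = -1;
    f->counter = 0;
}

static void toy_flash_write(ToyFlash *f, int64_t offset, uint32_t value)
{
    switch (f->wcycle) {
    case 0:
        if (value == 0xe8) {    /* "write to buffer" command */
            f->wcycle = 1;
        }
        break;
    case 1:
        /* Setup write: carries the count. Its offset is NOT recorded. */
        f->counter = value;
        f->wcycle = 2;
        break;
    case 2:
        /* First data write: this offset marks the start of the block. */
        if (f->blk_offset == -1 && f->counter) {
            f->blk_offset = offset;
        }
        if (f->counter == 0 || --f->counter == 0) {
            f->wcycle = 0;      /* all data received */
        }
        break;
    }
}
```

With a setup write at offset 0 followed by data writes at 0x20000 and 0x20004, the toy model records 0x20000 as the block start, which mirrors the behaviour the patch is after.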

@ -0,0 +1,51 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 9 May 2024 12:38:10 +0200
Subject: [PATCH] target/i386: fix operand size for DATA16 REX.W POPCNT
According to the manual, 32-bit vs 64-bit is governed by REX.W
and REX ignores the 0x66 prefix. This can be confirmed with this
program:
#include <stdio.h>
int main()
{
int x = 0x12340000;
int y;
asm("popcntl %1, %0" : "=r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntl %1, %0" : "+r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntq %q1, %q0" : "+r" (y) : "r" (x)); printf("%x\n", y);
}
which prints 5/ffff0000/5 on real hardware and 5/ffff0000/ffff0000
on QEMU.
Cc: qemu-stable@nongnu.org
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 41c685dc59bb611096f3bb6a663cfa82e4cba97b)
[FE: keep mo_64_32 helper which still has other users in 9.0.0]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/i386/tcg/translate.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 76a42c679c..b60f3bd642 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -6799,12 +6799,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
modrm = x86_ldub_code(env, s);
reg = ((modrm >> 3) & 7) | REX_R(s);
- if (s->prefix & PREFIX_DATA) {
- ot = MO_16;
- } else {
- ot = mo_64_32(dflag);
- }
-
+ ot = dflag;
gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
gen_extu(ot, s->T0);
tcg_gen_mov_tl(cpu_cc_src, s->T0);
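
The operand-size rule quoted in the commit message above can be written down as a small helper. `toy_popcnt_operand_bits` is a hypothetical name used only for illustration. It covers 64-bit mode only and is not how the translator computes `dflag`.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: effective operand size, in bits, of POPCNT in
 * 64-bit mode. REX.W takes precedence over the 0x66 prefix.
 */
static int toy_popcnt_operand_bits(bool rex_w, bool prefix_66)
{
    if (rex_w) {
        return 64;
    }
    return prefix_66 ? 16 : 32;
}
```

The third line of the test program in the commit message corresponds to `rex_w = true, prefix_66 = true`, the case the old code decoded as a 16-bit operation.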

@ -0,0 +1,40 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 9 May 2024 15:55:47 +0200
Subject: [PATCH] target/i386: rdpkru/wrpkru are no-prefix instructions
Reject 0x66/0xf3/0xf2 in front of them.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 40a3ec7b5ffde500789d016660a171057d6b467c)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/i386/tcg/translate.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b60f3bd642..3e949fe964 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -6083,7 +6083,8 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 1);
break;
case 0xee: /* rdpkru */
- if (prefixes & PREFIX_LOCK) {
+ if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
+ | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
@@ -6091,7 +6092,8 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], s->tmp1_i64);
break;
case 0xef: /* wrpkru */
- if (prefixes & PREFIX_LOCK) {
+ if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
+ | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
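
A minimal sketch of the prefix check introduced above, assuming a bitmask representation of the decoded prefixes. The `TOY_PREFIX_*` values and `toy_pkru_insn_is_legal` are hypothetical and only illustrate the mask test. They are not taken from the QEMU headers.

```c
#include <stdbool.h>

/* Illustrative bit values; treat them as assumptions. */
enum {
    TOY_PREFIX_REPZ  = 0x01,
    TOY_PREFIX_REPNZ = 0x02,
    TOY_PREFIX_LOCK  = 0x04,
    TOY_PREFIX_DATA  = 0x08,
};

/* rdpkru/wrpkru are accepted only when none of these prefixes is present. */
static bool toy_pkru_insn_is_legal(unsigned prefixes)
{
    const unsigned forbidden = TOY_PREFIX_LOCK | TOY_PREFIX_DATA |
                               TOY_PREFIX_REPZ | TOY_PREFIX_REPNZ;
    return (prefixes & forbidden) == 0;
}
```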

@ -0,0 +1,33 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Wed, 8 May 2024 11:10:54 +0200
Subject: [PATCH] target/i386: fix feature dependency for WAITPKG
The VMX feature bit depends on general availability of WAITPKG,
not the other way round.
Fixes: 33cc88261c3 ("target/i386: add support for VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE", 2023-08-28)
Cc: qemu-stable@nongnu.org
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit fe01af5d47d4cf7fdf90c54d43f784e5068c8d72)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/i386/cpu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 33760a2ee1..e693f8ca9a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1550,8 +1550,8 @@ static FeatureDep feature_dependencies[] = {
.to = { FEAT_SVM, ~0ull },
},
{
- .from = { FEAT_VMX_SECONDARY_CTLS, VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE },
- .to = { FEAT_7_0_ECX, CPUID_7_0_ECX_WAITPKG },
+ .from = { FEAT_7_0_ECX, CPUID_7_0_ECX_WAITPKG },
+ .to = { FEAT_VMX_SECONDARY_CTLS, VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE },
},
};
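
To see why the direction of the dependency matters, here is a toy version of a "from implies to" table. It assumes the usual reading of such a table: when the `from` feature is absent, the `to` feature gets cleared. All `Toy*`/`toy_*` names and the bit positions are illustrative stand-ins, not QEMU definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; the names and bit positions are illustrative only. */
enum { TOY_FEAT_7_0_ECX, TOY_FEAT_VMX_SECONDARY_CTLS, TOY_FEAT_MAX };
#define TOY_WAITPKG             (1ull << 5)
#define TOY_VMX_USER_WAIT_PAUSE (1ull << 26)

typedef struct {
    int word;
    uint64_t mask;
} ToyFeatureMask;

typedef struct {
    ToyFeatureMask from; /* prerequisite */
    ToyFeatureMask to;   /* dependent feature, cleared if prerequisite is off */
} ToyFeatureDep;

/* Direction after the patch: WAITPKG is the prerequisite. */
static const ToyFeatureDep toy_deps[] = {
    {
        .from = { TOY_FEAT_7_0_ECX, TOY_WAITPKG },
        .to   = { TOY_FEAT_VMX_SECONDARY_CTLS, TOY_VMX_USER_WAIT_PAUSE },
    },
};

static void toy_apply_deps(uint64_t features[TOY_FEAT_MAX])
{
    for (size_t i = 0; i < sizeof(toy_deps) / sizeof(toy_deps[0]); i++) {
        const ToyFeatureDep *d = &toy_deps[i];
        if (!(features[d->from.word] & d->from.mask)) {
            features[d->to.word] &= ~d->to.mask;
        }
    }
}
```

With the table in this direction, a CPU model that has WAITPKG but no VMX control keeps WAITPKG. With the pre-patch direction, WAITPKG would have been cleared in that situation.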

@ -0,0 +1,87 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 16 May 2024 12:59:52 +0200
Subject: [PATCH] Revert "virtio-pci: fix use of a released vector"
This reverts commit 2ce6cff94df2650c460f809e5ad263f1d22507c0.
The fix causes some issues:
https://gitlab.com/qemu-project/qemu/-/issues/2321
https://gitlab.com/qemu-project/qemu/-/issues/2334
The CVE fixed by commit 2ce6cff94d ("virtio-pci: fix use of a released
vector") is CVE-2024-4693 [0] and allows a malicious guest that
controls the boot process in the guest to crash its QEMU process.
The issues sound worse than the CVE, so revert until there is a proper
fix.
[0]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-4693
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/virtio/virtio-pci.c | 37 ++-----------------------------------
1 file changed, 2 insertions(+), 35 deletions(-)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb159fd078..cb6940fc0e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1424,38 +1424,6 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
return offset;
}
-static void virtio_pci_set_vector(VirtIODevice *vdev,
- VirtIOPCIProxy *proxy,
- int queue_no, uint16_t old_vector,
- uint16_t new_vector)
-{
- bool kvm_irqfd = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
- msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled();
-
- if (new_vector == old_vector) {
- return;
- }
-
- /*
- * If the device uses irqfd and the vector changes after DRIVER_OK is
- * set, we need to release the old vector and set up the new one.
- * Otherwise just need to set the new vector on the device.
- */
- if (kvm_irqfd && old_vector != VIRTIO_NO_VECTOR) {
- kvm_virtio_pci_vector_release_one(proxy, queue_no);
- }
- /* Set the new vector on the device. */
- if (queue_no == VIRTIO_CONFIG_IRQ_IDX) {
- vdev->config_vector = new_vector;
- } else {
- virtio_queue_set_vector(vdev, queue_no, new_vector);
- }
- /* If the new vector changed need to set it up. */
- if (kvm_irqfd && new_vector != VIRTIO_NO_VECTOR) {
- kvm_virtio_pci_vector_use_one(proxy, queue_no);
- }
-}
-
int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
uint8_t bar, uint64_t offset, uint64_t length,
uint8_t id)
@@ -1602,8 +1570,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
} else {
val = VIRTIO_NO_VECTOR;
}
- virtio_pci_set_vector(vdev, proxy, VIRTIO_CONFIG_IRQ_IDX,
- vdev->config_vector, val);
+ vdev->config_vector = val;
break;
case VIRTIO_PCI_COMMON_STATUS:
if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
@@ -1643,7 +1610,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
} else {
val = VIRTIO_NO_VECTOR;
}
- virtio_pci_set_vector(vdev, proxy, vdev->queue_sel, vector, val);
+ virtio_queue_set_vector(vdev, vdev->queue_sel, val);
break;
case VIRTIO_PCI_COMMON_Q_ENABLE:
if (val == 1) {

@ -0,0 +1,57 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 16 May 2024 15:21:07 +0200
Subject: [PATCH] hw/core/machine: move compatibility flags for VirtIO-net USO
to machine 8.1
Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
version 8.1 can fail with:
> kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
> kvm: Failed to load virtio-net:virtio
> kvm: error while loading state for instance 0x0 of device '0000:00:12.0/virtio-net'
> kvm: load of migration failed: Operation not permitted
The series
53da8b5a99 virtio-net: Add support for USO features
9da1684954 virtio-net: Add USO flags to vhost support.
f03e0cf63b tap: Add check for USO features
2ab0ec3121 tap: Add USO support to tap device.
only landed in QEMU 8.2, so the compatibility flags should be part of
machine version 8.1.
Moving the flags unfortunately breaks forward migration with machine
version 8.1 from a binary without this patch to a binary with this
patch when the feature is enabled by the guest.
Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/core/machine.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index d33a37a6f6..4273de16a0 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -46,15 +46,15 @@ GlobalProperty hw_compat_8_1[] = {
{ "ramfb", "x-migrate", "off" },
{ "vfio-pci-nohotplug", "x-ramfb-migrate", "off" },
{ "igb", "x-pcie-flr-init", "off" },
+ { TYPE_VIRTIO_NET, "host_uso", "off"},
+ { TYPE_VIRTIO_NET, "guest_uso4", "off"},
+ { TYPE_VIRTIO_NET, "guest_uso6", "off"},
};
const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
GlobalProperty hw_compat_8_0[] = {
{ "migration", "multifd-flush-after-each-section", "on"},
{ TYPE_PCI_DEVICE, "x-pcie-ari-nextfn-1", "on" },
- { TYPE_VIRTIO_NET, "host_uso", "off"},
- { TYPE_VIRTIO_NET, "guest_uso4", "off"},
- { TYPE_VIRTIO_NET, "guest_uso6", "off"},
};
const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
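
The effect of moving the three entries can be illustrated with a deliberately simplified rule. It assumes that a versioned machine type applies the compat table of its own version plus those of all newer versions. The function names are hypothetical and the minor version numbers stand for machine versions 8.0, 8.1 and 8.2.

```c
#include <stdbool.h>

/*
 * Toy rule (assumption, simplified): a machine of version 8.<machine_minor>
 * applies the compat table of its own version and of every newer version.
 */
static bool toy_compat_table_applies(int table_minor, int machine_minor)
{
    return table_minor >= machine_minor;
}

/* Would a guest on machine 8.<machine_minor> see the USO features forced off? */
static bool toy_uso_forced_off(int uso_table_minor, int machine_minor)
{
    return toy_compat_table_applies(uso_table_minor, machine_minor);
}
```

Under this rule, keeping the flags in the 8.0 table leaves USO enabled for machine version 8.1, which matches the migration failure quoted in the commit message. Placing them in the 8.1 table turns USO off for both 8.0 and 8.1 and leaves 8.2 untouched.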

1274
debian/patches/pve-qemu-9.0-vitastor.patch vendored Normal file

File diff suppressed because it is too large

@ -14,10 +14,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index b9647c5ffc..9a16d86344 100644
index 35684f7e21..43bc0bd520 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -552,7 +552,7 @@ static QemuOptsList raw_runtime_opts = {
@@ -563,7 +563,7 @@ static QemuOptsList raw_runtime_opts = {
{
.name = "locking",
.type = QEMU_OPT_STRING,
@ -26,7 +26,7 @@ index b9647c5ffc..9a16d86344 100644
},
{
.name = "pr-manager",
@@ -652,7 +652,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
@@ -663,7 +663,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
s->use_lock = false;
break;
case ON_OFF_AUTO_AUTO:

@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/net/net.h b/include/net/net.h
index dc20b31e9f..5ae04a8693 100644
index b1f9b35fcc..096c0d52e4 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -236,8 +236,8 @@ void netdev_add(QemuOpts *opts, Error **errp);
@@ -317,8 +317,8 @@ void netdev_add(QemuOpts *opts, Error **errp);
int net_hub_id_for_client(NetClientState *nc, int *id);
NetClientState *net_hub_port_find(int hub_id);

@ -10,10 +10,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4bc19577a..be7da64f38 100644
index 6b05738079..d82869900a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2174,9 +2174,9 @@ uint64_t cpu_get_tsc(CPUX86State *env);
@@ -2291,9 +2291,9 @@ uint64_t cpu_get_tsc(CPUX86State *env);
#define CPU_RESOLVING_TYPE TYPE_X86_CPU
#ifdef TARGET_X86_64

@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/ui/spice-core.c b/ui/spice-core.c
index c3ac20ad43..37774f1c0a 100644
index 15be640286..ea20e6153c 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -689,32 +689,35 @@ static void qemu_spice_init(void)
@@ -690,32 +690,35 @@ static void qemu_spice_init(void)
if (tls_port) {
x509_dir = qemu_opt_get(opts, "x509-dir");

@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/block/gluster.c b/block/gluster.c
index 7c90f7ba4b..2e03102f00 100644
index cc74af06dc..3ba9bbfa5e 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -42,7 +42,7 @@
@@ -43,7 +43,7 @@
#define GLUSTER_DEBUG_DEFAULT 4
#define GLUSTER_DEBUG_MAX 9
#define GLUSTER_OPT_LOGFILE "logfile"
@ -21,15 +21,15 @@ index 7c90f7ba4b..2e03102f00 100644
/*
* Several versions of GlusterFS (3.12? -> 6.0.1) fail when the transfer size
* is greater or equal to 1024 MiB, so we are limiting the transfer size to 512
@@ -424,6 +424,7 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
@@ -425,6 +425,7 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
int old_errno;
SocketAddressList *server;
unsigned long long port;
uint64_t port;
+ const char *logfile;
glfs = glfs_find_preopened(gconf->volume);
if (glfs) {
@@ -466,9 +467,15 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
@@ -467,9 +468,15 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
}
}

@ -18,10 +18,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+)
diff --git a/block/rbd.c b/block/rbd.c
index f826410f40..64a8d7d48b 100644
index 84bb2fa5d7..63f60d41be 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -820,6 +820,8 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
@@ -963,6 +963,8 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
rados_conf_set(*cluster, "rbd_cache", "false");
}

@ -16,10 +16,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/block/gluster.c b/block/gluster.c
index 2e03102f00..7886c5fe8c 100644
index 3ba9bbfa5e..34936eb855 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -57,6 +57,7 @@ typedef struct GlusterAIOCB {
@@ -58,6 +58,7 @@ typedef struct GlusterAIOCB {
int ret;
Coroutine *coroutine;
AioContext *aio_context;
@ -27,7 +27,7 @@ index 2e03102f00..7886c5fe8c 100644
} GlusterAIOCB;
typedef struct BDRVGlusterState {
@@ -752,8 +753,10 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret,
@@ -753,8 +754,10 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret,
acb->ret = 0; /* Success */
} else if (ret < 0) {
acb->ret = -errno; /* Read/Write failed */
@ -39,7 +39,7 @@ index 2e03102f00..7886c5fe8c 100644
}
aio_co_schedule(acb->aio_context, acb->coroutine);
@@ -1022,6 +1025,7 @@ static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
@@ -1023,6 +1026,7 @@ static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);

@ -1,98 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:37 +0200
Subject: [PATCH] PVE: [Up] qmp: add get_link_status
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: add get_link_status to command name exceptions]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
net/net.c | 27 +++++++++++++++++++++++++++
qapi/net.json | 15 +++++++++++++++
qapi/pragma.json | 2 ++
3 files changed, 44 insertions(+)
diff --git a/net/net.c b/net/net.c
index 840ad9dca5..28e97c5d85 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1372,6 +1372,33 @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
}
}
+int64_t qmp_get_link_status(const char *name, Error **errp)
+{
+ NetClientState *ncs[MAX_QUEUE_NUM];
+ NetClientState *nc;
+ int queues;
+ bool ret;
+
+ queues = qemu_find_net_clients_except(name, ncs,
+ NET_CLIENT_DRIVER__MAX,
+ MAX_QUEUE_NUM);
+
+ if (queues == 0) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", name);
+ return (int64_t) -1;
+ }
+
+ nc = ncs[0];
+ ret = ncs[0]->link_down;
+
+ if (nc->peer->info->type == NET_CLIENT_DRIVER_NIC) {
+ ret = ncs[0]->peer->link_down;
+ }
+
+ return (int64_t) ret ? 0 : 1;
+}
+
void colo_notify_filters_event(int event, Error **errp)
{
NetClientState *nc;
diff --git a/qapi/net.json b/qapi/net.json
index 522ac582ed..327d7c5a37 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -36,6 +36,21 @@
##
{ 'command': 'set_link', 'data': {'name': 'str', 'up': 'bool'} }
+##
+# @get_link_status:
+#
+# Get the current link state of the nics or nic.
+#
+# @name: name of the nic you get the state of
+#
+# Return: If link is up 1
+# If link is down 0
+# If an error occure an empty string.
+#
+# Notes: this is an Proxmox VE extension and not offical part of Qemu.
+##
+{ 'command': 'get_link_status', 'data': {'name': 'str'} , 'returns': 'int' }
+
##
# @netdev_add:
#
diff --git a/qapi/pragma.json b/qapi/pragma.json
index 7f810b0e97..29233db825 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -15,6 +15,7 @@
'device_add',
'device_del',
'expire_password',
+ 'get_link_status',
'migrate_cancel',
'netdev_add',
'netdev_del',
@@ -26,6 +27,7 @@
'system_wakeup' ],
# Commands allowed to return a non-dictionary
'command-returns-exceptions': [
+ 'get_link_status',
'human-monitor-command',
'qom-get',
'query-tpm-models',

@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/qemu-img.c b/qemu-img.c
index a9b3a8103c..0bc9f1af59 100644
index 7668f86769..2575e97b43 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3013,7 +3013,8 @@ static int img_info(int argc, char **argv)
@@ -3075,7 +3075,8 @@ static int img_info(int argc, char **argv)
list = collect_image_info_list(image_opts, filename, fmt, chain,
force_share);
if (!list) {

@ -38,10 +38,10 @@ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2 files changed, 133 insertions(+), 73 deletions(-)
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..d1616c045a 100644
index c9dd70a892..048788b23d 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -58,9 +58,9 @@ SRST
@@ -60,9 +60,9 @@ SRST
ERST
DEF("dd", img_dd,
@ -54,10 +54,10 @@ index 1b1dab5b17..d1616c045a 100644
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 0bc9f1af59..221b9d6a16 100644
index 2575e97b43..8ec68b346f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4829,10 +4829,12 @@ static int img_bitmap(int argc, char **argv)
@@ -4993,10 +4993,12 @@ static int img_bitmap(int argc, char **argv)
#define C_IF 04
#define C_OF 010
#define C_SKIP 020
@ -70,7 +70,7 @@ index 0bc9f1af59..221b9d6a16 100644
};
struct DdIo {
@@ -4908,6 +4910,19 @@ static int img_dd_skip(const char *arg,
@@ -5072,6 +5074,19 @@ static int img_dd_skip(const char *arg,
return 0;
}
@ -90,7 +90,7 @@ index 0bc9f1af59..221b9d6a16 100644
static int img_dd(int argc, char **argv)
{
int ret = 0;
@@ -4948,6 +4963,7 @@ static int img_dd(int argc, char **argv)
@@ -5112,6 +5127,7 @@ static int img_dd(int argc, char **argv)
{ "if", img_dd_if, C_IF },
{ "of", img_dd_of, C_OF },
{ "skip", img_dd_skip, C_SKIP },
@ -98,7 +98,7 @@ index 0bc9f1af59..221b9d6a16 100644
{ NULL, NULL, 0 }
};
const struct option long_options[] = {
@@ -5023,91 +5039,112 @@ static int img_dd(int argc, char **argv)
@@ -5187,91 +5203,112 @@ static int img_dd(int argc, char **argv)
arg = NULL;
}
@ -275,7 +275,7 @@ index 0bc9f1af59..221b9d6a16 100644
}
if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
@@ -5124,20 +5161,43 @@ static int img_dd(int argc, char **argv)
@@ -5288,20 +5325,43 @@ static int img_dd(int argc, char **argv)
in.buf = g_new(uint8_t, in.bsz);
for (out_pos = 0; in_pos < size; ) {

@ -16,10 +16,10 @@ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/qemu-img.c b/qemu-img.c
index 221b9d6a16..c1306385a8 100644
index 8ec68b346f..b98184bba1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4830,11 +4830,13 @@ static int img_bitmap(int argc, char **argv)
@@ -4994,11 +4994,13 @@ static int img_bitmap(int argc, char **argv)
#define C_OF 010
#define C_SKIP 020
#define C_OSIZE 040
@ -33,7 +33,7 @@ index 221b9d6a16..c1306385a8 100644
};
struct DdIo {
@@ -4923,6 +4925,19 @@ static int img_dd_osize(const char *arg,
@@ -5087,6 +5089,19 @@ static int img_dd_osize(const char *arg,
return 0;
}
@ -53,7 +53,7 @@ index 221b9d6a16..c1306385a8 100644
static int img_dd(int argc, char **argv)
{
int ret = 0;
@@ -4937,12 +4952,14 @@ static int img_dd(int argc, char **argv)
@@ -5101,12 +5116,14 @@ static int img_dd(int argc, char **argv)
int c, i;
const char *out_fmt = "raw";
const char *fmt = NULL;
@ -69,7 +69,7 @@ index 221b9d6a16..c1306385a8 100644
};
struct DdIo in = {
.bsz = 512, /* Block size is by default 512 bytes */
@@ -4964,6 +4981,7 @@ static int img_dd(int argc, char **argv)
@@ -5128,6 +5145,7 @@ static int img_dd(int argc, char **argv)
{ "of", img_dd_of, C_OF },
{ "skip", img_dd_skip, C_SKIP },
{ "osize", img_dd_osize, C_OSIZE },
@ -77,7 +77,7 @@ index 221b9d6a16..c1306385a8 100644
{ NULL, NULL, 0 }
};
const struct option long_options[] = {
@@ -5160,9 +5178,10 @@ static int img_dd(int argc, char **argv)
@@ -5324,9 +5342,10 @@ static int img_dd(int argc, char **argv)
in.buf = g_new(uint8_t, in.bsz);
@ -90,7 +90,7 @@ index 221b9d6a16..c1306385a8 100644
if (blk1) {
in_ret = blk_pread(blk1, in_pos, bytes, in.buf, 0);
if (in_ret == 0) {
@@ -5171,6 +5190,9 @@ static int img_dd(int argc, char **argv)
@@ -5335,6 +5354,9 @@ static int img_dd(int argc, char **argv)
} else {
in_ret = read(STDIN_FILENO, in.buf, bytes);
if (in_ret == 0) {

@ -13,10 +13,10 @@ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
3 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 15aeddc6d8..5e713e231d 100644
index 3653adb963..d83e8fb3c0 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -208,6 +208,10 @@ Parameters to convert subcommand:
@@ -212,6 +212,10 @@ Parameters to convert subcommand:
Parameters to dd subcommand:
@ -27,7 +27,7 @@ index 15aeddc6d8..5e713e231d 100644
.. program:: qemu-img-dd
.. option:: bs=BLOCK_SIZE
@@ -488,7 +492,7 @@ Command description:
@@ -492,7 +496,7 @@ Command description:
it doesn't need to be specified separately in this case.
@ -36,7 +36,7 @@ index 15aeddc6d8..5e713e231d 100644
dd copies from *INPUT* file to *OUTPUT* file converting it from
*FMT* format to *OUTPUT_FMT* format.
@@ -499,6 +503,11 @@ Command description:
@@ -503,6 +507,11 @@ Command description:
The size syntax is similar to :manpage:`dd(1)`'s size syntax.
@ -49,10 +49,10 @@ index 15aeddc6d8..5e713e231d 100644
Give information about the disk image *FILENAME*. Use it in
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index d1616c045a..b5b0bb4467 100644
index 048788b23d..0b29a67a06 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -58,9 +58,9 @@ SRST
@@ -60,9 +60,9 @@ SRST
ERST
DEF("dd", img_dd,
@ -65,10 +65,10 @@ index d1616c045a..b5b0bb4467 100644
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index c1306385a8..59c403373b 100644
index b98184bba1..6fc8384f64 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4954,7 +4954,7 @@ static int img_dd(int argc, char **argv)
@@ -5118,7 +5118,7 @@ static int img_dd(int argc, char **argv)
const char *fmt = NULL;
int64_t size = 0, readsize = 0;
int64_t out_pos, in_pos;
@ -77,7 +77,7 @@ index c1306385a8..59c403373b 100644
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -4992,7 +4992,7 @@ static int img_dd(int argc, char **argv)
@@ -5156,7 +5156,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
@ -86,7 +86,7 @@ index c1306385a8..59c403373b 100644
if (c == EOF) {
break;
}
@@ -5012,6 +5012,9 @@ static int img_dd(int argc, char **argv)
@@ -5176,6 +5176,9 @@ static int img_dd(int argc, char **argv)
case 'h':
help();
break;
@ -96,7 +96,7 @@ index c1306385a8..59c403373b 100644
case 'U':
force_share = true;
break;
@@ -5142,13 +5145,15 @@ static int img_dd(int argc, char **argv)
@@ -5306,13 +5309,15 @@ static int img_dd(int argc, char **argv)
size - in.bsz * in.offset, &error_abort);
}

@ -1,9 +1,9 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Mon, 7 Feb 2022 14:21:01 +0100
Subject: [PATCH] qemu-img: dd: add -l option for loading a snapshot
Subject: [PATCH] qemu-img dd: add -l option for loading a snapshot
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
docs/tools/qemu-img.rst | 6 +++---
@ -12,10 +12,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
3 files changed, 36 insertions(+), 7 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 5e713e231d..9390d5e5cf 100644
index d83e8fb3c0..61c6b21859 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -492,10 +492,10 @@ Command description:
@@ -496,10 +496,10 @@ Command description:
it doesn't need to be specified separately in this case.
@ -30,10 +30,10 @@ index 5e713e231d..9390d5e5cf 100644
The data is by default read and written using blocks of 512 bytes but can be
modified by specifying *BLOCK_SIZE*. If count=\ *BLOCKS* is specified
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index b5b0bb4467..36f97e1f19 100644
index 0b29a67a06..758f397232 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -58,9 +58,9 @@ SRST
@@ -60,9 +60,9 @@ SRST
ERST
DEF("dd", img_dd,
@ -46,10 +46,10 @@ index b5b0bb4467..36f97e1f19 100644
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 59c403373b..065a54cc42 100644
index 6fc8384f64..a6c88e0860 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4946,6 +4946,7 @@ static int img_dd(int argc, char **argv)
@@ -5110,6 +5110,7 @@ static int img_dd(int argc, char **argv)
BlockDriver *drv = NULL, *proto_drv = NULL;
BlockBackend *blk1 = NULL, *blk2 = NULL;
QemuOpts *opts = NULL;
@ -57,7 +57,7 @@ index 59c403373b..065a54cc42 100644
QemuOptsList *create_opts = NULL;
Error *local_err = NULL;
bool image_opts = false;
@@ -4955,6 +4956,7 @@ static int img_dd(int argc, char **argv)
@@ -5119,6 +5120,7 @@ static int img_dd(int argc, char **argv)
int64_t size = 0, readsize = 0;
int64_t out_pos, in_pos;
bool force_share = false, skip_create = false;
@ -65,7 +65,7 @@ index 59c403373b..065a54cc42 100644
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -4992,7 +4994,7 @@ static int img_dd(int argc, char **argv)
@@ -5156,7 +5158,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
@ -74,7 +74,7 @@ index 59c403373b..065a54cc42 100644
if (c == EOF) {
break;
}
@@ -5015,6 +5017,19 @@ static int img_dd(int argc, char **argv)
@@ -5179,6 +5181,19 @@ static int img_dd(int argc, char **argv)
case 'n':
skip_create = true;
break;
@ -94,7 +94,7 @@ index 59c403373b..065a54cc42 100644
case 'U':
force_share = true;
break;
@@ -5074,11 +5089,24 @@ static int img_dd(int argc, char **argv)
@@ -5238,11 +5253,24 @@ static int img_dd(int argc, char **argv)
if (dd.flags & C_IF) {
blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
force_share);
@ -120,7 +120,7 @@ index 59c403373b..065a54cc42 100644
}
if (dd.flags & C_OSIZE) {
@@ -5233,6 +5261,7 @@ static int img_dd(int argc, char **argv)
@@ -5397,6 +5425,7 @@ static int img_dd(int argc, char **argv)
out:
g_free(arg);
qemu_opts_del(opts);

@ -7,20 +7,62 @@ Actually provide memory information via the query-balloon
command.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: add BalloonInfo to member name exceptions list]
[FE: add BalloonInfo to member name exceptions list
rebase for 8.0 - moved to hw/core/machine-hmp-cmds.c]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/core/machine-hmp-cmds.c | 30 +++++++++++++++++++++++++++++-
hw/virtio/virtio-balloon.c | 33 +++++++++++++++++++++++++++++++--
monitor/hmp-cmds.c | 30 +++++++++++++++++++++++++++++-
qapi/machine.json | 22 +++++++++++++++++++++-
qapi/pragma.json | 1 +
4 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index a6ff6a4875..e7f74d1c63 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -175,7 +175,35 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict)
return;
}
- monitor_printf(mon, "balloon: actual=%" PRId64 "\n", info->actual >> 20);
+ monitor_printf(mon, "balloon: actual=%" PRId64, info->actual >> 20);
+ monitor_printf(mon, " max_mem=%" PRId64, info->max_mem >> 20);
+ if (info->has_total_mem) {
+ monitor_printf(mon, " total_mem=%" PRId64, info->total_mem >> 20);
+ }
+ if (info->has_free_mem) {
+ monitor_printf(mon, " free_mem=%" PRId64, info->free_mem >> 20);
+ }
+
+ if (info->has_mem_swapped_in) {
+ monitor_printf(mon, " mem_swapped_in=%" PRId64, info->mem_swapped_in);
+ }
+ if (info->has_mem_swapped_out) {
+ monitor_printf(mon, " mem_swapped_out=%" PRId64, info->mem_swapped_out);
+ }
+ if (info->has_major_page_faults) {
+ monitor_printf(mon, " major_page_faults=%" PRId64,
+ info->major_page_faults);
+ }
+ if (info->has_minor_page_faults) {
+ monitor_printf(mon, " minor_page_faults=%" PRId64,
+ info->minor_page_faults);
+ }
+ if (info->has_last_update) {
+ monitor_printf(mon, " last_update=%" PRId64,
+ info->last_update);
+ }
+
+ monitor_printf(mon, "\n");
qapi_free_BalloonInfo(info);
}
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 73ac5eb675..bbfe7eca62 100644
index 609e39a821..8cb6dfcac3 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -806,8 +806,37 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
@@ -781,8 +781,37 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
{
VirtIOBalloon *dev = opaque;
@ -60,54 +102,13 @@ index 73ac5eb675..bbfe7eca62 100644
}
static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 01b789a79e..480b798963 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -696,7 +696,35 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict)
return;
}
- monitor_printf(mon, "balloon: actual=%" PRId64 "\n", info->actual >> 20);
+ monitor_printf(mon, "balloon: actual=%" PRId64, info->actual >> 20);
+ monitor_printf(mon, " max_mem=%" PRId64, info->max_mem >> 20);
+ if (info->has_total_mem) {
+ monitor_printf(mon, " total_mem=%" PRId64, info->total_mem >> 20);
+ }
+ if (info->has_free_mem) {
+ monitor_printf(mon, " free_mem=%" PRId64, info->free_mem >> 20);
+ }
+
+ if (info->has_mem_swapped_in) {
+ monitor_printf(mon, " mem_swapped_in=%" PRId64, info->mem_swapped_in);
+ }
+ if (info->has_mem_swapped_out) {
+ monitor_printf(mon, " mem_swapped_out=%" PRId64, info->mem_swapped_out);
+ }
+ if (info->has_major_page_faults) {
+ monitor_printf(mon, " major_page_faults=%" PRId64,
+ info->major_page_faults);
+ }
+ if (info->has_minor_page_faults) {
+ monitor_printf(mon, " minor_page_faults=%" PRId64,
+ info->minor_page_faults);
+ }
+ if (info->has_last_update) {
+ monitor_printf(mon, " last_update=%" PRId64,
+ info->last_update);
+ }
+
+ monitor_printf(mon, "\n");
qapi_free_BalloonInfo(info);
}
diff --git a/qapi/machine.json b/qapi/machine.json
index b9228a5e46..10e77a9af3 100644
index e8b60641f2..2054cdc70d 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1054,9 +1054,29 @@
# @actual: the logical size of the VM in bytes
# Formula used: logical_vm_size = vm_ram_size - balloon_size
@@ -1079,9 +1079,29 @@
# @actual: the logical size of the VM in bytes Formula used:
# logical_vm_size = vm_ram_size - balloon_size
#
+# @last_update: time when stats got updated from guest
+#
@ -137,10 +138,10 @@ index b9228a5e46..10e77a9af3 100644
##
# @query-balloon:
diff --git a/qapi/pragma.json b/qapi/pragma.json
index 29233db825..f2097b9020 100644
index 59fbe74b8c..be8fa304c5 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -37,6 +37,7 @@
@@ -90,6 +90,7 @@
'member-name-exceptions': [ # visible in:
'ACPISlotType', # query-acpi-ospm-status
'AcpiTableOptions', # -acpitable
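
Note on the `hmp_info_balloon` hunk above: the extended output always prints the mandatory fields, prints each optional guest statistic only when its `has_*` flag is set, and shifts the memory sizes right by 20 bits so they are reported in MiB. A minimal standalone sketch of that pattern follows. The `DemoBalloonInfo` struct and the `demo_*` helpers are hypothetical stand-ins for illustration, not QEMU APIs, and only a subset of the fields is modeled.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the QAPI-generated BalloonInfo (subset of fields). */
typedef struct {
    int64_t actual;        /* bytes, always present */
    int64_t max_mem;       /* bytes, always present */
    bool has_free_mem;
    int64_t free_mem;      /* bytes, only valid if has_free_mem is set */
    bool has_last_update;
    int64_t last_update;   /* printed unscaled, as in the hunk above */
} DemoBalloonInfo;

/* Append " key=value" to buf (cap bytes in total); returns the new length. */
static size_t demo_append(char *buf, size_t cap, size_t len,
                          const char *key, int64_t value)
{
    int n;

    if (len + 1 >= cap) {
        return len; /* no room left, drop the field */
    }
    n = snprintf(buf + len, cap - len, " %s=%" PRId64, key, value);
    if (n < 0) {
        return len;
    }
    len += (size_t)n;
    return len < cap ? len : cap - 1; /* clamp if snprintf truncated */
}

/* Build the one-line summary; optional fields appear only when flagged. */
static void demo_format_balloon(const DemoBalloonInfo *info,
                                char *buf, size_t cap)
{
    size_t len;

    assert(cap >= sizeof("balloon:"));
    len = (size_t)snprintf(buf, cap, "balloon:");

    len = demo_append(buf, cap, len, "actual", info->actual >> 20);
    len = demo_append(buf, cap, len, "max_mem", info->max_mem >> 20);
    if (info->has_free_mem) {
        len = demo_append(buf, cap, len, "free_mem", info->free_mem >> 20);
    }
    if (info->has_last_update) {
        len = demo_append(buf, cap, len, "last_update", info->last_update);
    }
    (void)len;
}
```

Building the line in a buffer rather than printing piecewise is only a convenience for the sketch; the patch itself emits each field with a separate `monitor_printf()` call and terminates the line at the end.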


@ -13,13 +13,13 @@ Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 4f4ab30f8c..76fff60a6b 100644
index 4b72009cd3..314351cdff 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -99,6 +99,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
info->hotpluggable_cpus = mc->has_hotpluggable_cpus;
@@ -90,6 +90,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
info->numa_mem_supported = mc->numa_mem_supported;
info->deprecated = !!mc->deprecation_reason;
info->acpi = !!object_class_property_find(OBJECT_CLASS(mc), "acpi");
+
+ if (strcmp(mc->name, MACHINE_GET_CLASS(current_machine)->name) == 0) {
+ info->has_is_current = true;
@ -28,21 +28,21 @@ index 4f4ab30f8c..76fff60a6b 100644
+
if (mc->default_cpu_type) {
info->default_cpu_type = g_strdup(mc->default_cpu_type);
info->has_default_cpu_type = true;
}
diff --git a/qapi/machine.json b/qapi/machine.json
index 10e77a9af3..9156103c8f 100644
index 2054cdc70d..a024d5b05d 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -138,6 +138,8 @@
@@ -146,6 +146,8 @@
#
# @is-default: whether the machine is default
#
+# @is-current: whether this machine is currently used
+#
# @cpu-max: maximum number of CPUs supported by the machine type
# (since 1.5)
# (since 1.5)
#
@@ -159,7 +161,7 @@
@@ -170,7 +172,7 @@
##
{ 'struct': 'MachineInfo',
'data': { 'name': 'str', '*alias': 'str',
@ -50,4 +50,4 @@ index 10e77a9af3..9156103c8f 100644
+ '*is-default': 'bool', '*is-current': 'bool', 'cpu-max': 'int',
'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
'deprecated': 'bool', '*default-cpu-type': 'str',
'*default-ram-id': 'str' } }
'*default-ram-id': 'str', 'acpi': 'bool' } }
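
Note on the `is-current` patch above: `qmp_query_machines()` compares each machine class name against the name of the running machine and flags the match through the optional `is-current` member. A minimal standalone sketch of that matching step follows. `DemoMachineInfo` and `demo_mark_current()` are hypothetical names, the machine names in the usage are sample values, and setting `is_current = true` next to `has_is_current` is an assumption because that line is elided in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the QAPI-generated MachineInfo (subset). */
typedef struct {
    const char *name;
    bool has_is_current;   /* optional member present? */
    bool is_current;
} DemoMachineInfo;

/*
 * Flag every entry whose name equals current_name.
 * Returns the number of entries that were flagged.
 */
static size_t demo_mark_current(DemoMachineInfo *list, size_t n,
                                const char *current_name)
{
    size_t marked = 0;

    assert(current_name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].name, current_name) == 0) {
            list[i].has_is_current = true;
            list[i].is_current = true; /* assumption: elided in the hunk */
            marked++;
        }
    }
    return marked;
}
```

Entries that do not match keep `has_is_current` unset, which mirrors how an optional QAPI member is simply omitted from the reply.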


@ -6,16 +6,18 @@ Subject: [PATCH] PVE: qapi: modify spice query
Provide the last ticket in the SpiceInfo struct optionally.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to QAPI change]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
qapi/ui.json | 3 +++
ui/spice-core.c | 5 +++++
2 files changed, 8 insertions(+)
ui/spice-core.c | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/qapi/ui.json b/qapi/ui.json
index 0abba3e930..bf8f441227 100644
index f610bce118..6ea26a9acb 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -310,11 +310,14 @@
@@ -314,11 +314,14 @@
#
# @channels: a list of @SpiceChannel for each active spice channel
#
@ -31,15 +33,14 @@ index 0abba3e930..bf8f441227 100644
'if': 'CONFIG_SPICE' }
diff --git a/ui/spice-core.c b/ui/spice-core.c
index 37774f1c0a..367f77f2b4 100644
index ea20e6153c..55a15fba8b 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -534,6 +534,11 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
@@ -548,6 +548,10 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
micro = SPICE_SERVER_VERSION & 0xff;
info->compiled_version = g_strdup_printf("%d.%d.%d", major, minor, micro);
+ if (auth_passwd) {
+ info->has_ticket = true;
+ info->ticket = g_strdup(auth_passwd);
+ }
+


@ -14,20 +14,21 @@ Additionally, allows tracking the current position from the outside
(intended to be used for progress tracking).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
migration/channel-savevm-async.c | 182 +++++++++++++++++++++++++++++++
migration/channel-savevm-async.c | 184 +++++++++++++++++++++++++++++++
migration/channel-savevm-async.h | 51 +++++++++
migration/meson.build | 1 +
3 files changed, 234 insertions(+)
3 files changed, 236 insertions(+)
create mode 100644 migration/channel-savevm-async.c
create mode 100644 migration/channel-savevm-async.h
diff --git a/migration/channel-savevm-async.c b/migration/channel-savevm-async.c
new file mode 100644
index 0000000000..06d5484778
index 0000000000..081a192f49
--- /dev/null
+++ b/migration/channel-savevm-async.c
@@ -0,0 +1,182 @@
@@ -0,0 +1,184 @@
+/*
+ * QIO Channel implementation to be used by savevm-async QMP calls
+ */
@ -71,6 +72,7 @@ index 0000000000..06d5484778
+ size_t niov,
+ int **fds,
+ size_t *nfds,
+ int flags,
+ Error **errp)
+{
+ QIOChannelSavevmAsync *saioc = QIO_CHANNEL_SAVEVM_ASYNC(ioc);
@ -173,8 +175,9 @@ index 0000000000..06d5484778
+
+static void
+qio_channel_savevm_async_set_aio_fd_handler(QIOChannel *ioc,
+ AioContext *ctx,
+ AioContext *read_ctx,
+ IOHandler *io_read,
+ AioContext *write_ctx,
+ IOHandler *io_write,
+ void *opaque)
+{
@ -268,14 +271,14 @@ index 0000000000..17ae2cb261
+
+#endif /* QIO_CHANNEL_SAVEVM_ASYNC_H */
diff --git a/migration/meson.build b/migration/meson.build
index 690487cf1a..8cac83c06c 100644
index 1eeb915ff6..95d1cf2250 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -13,6 +13,7 @@ softmmu_ss.add(files(
@@ -13,6 +13,7 @@ system_ss.add(files(
'block-dirty-bitmap.c',
'channel.c',
'channel-block.c',
+ 'channel-savevm-async.c',
'colo-failover.c',
'colo.c',
'dirtyrate.c',
'exec.c',
'fd.c',
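
Note on the channel patch above: the commit message says the channel allows tracking the current position from the outside, and `qmp_savevm_start()` passes `&snap_state.bs_pos` when it creates the channel. A minimal standalone sketch of that idea follows, assuming an in-memory byte array in place of the block backend. All `Demo*` types and `demo_channel_writev()` are hypothetical and do not reproduce the QIOChannel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scatter/gather element, similar in spirit to struct iovec. */
typedef struct {
    const void *base;
    size_t len;
} DemoIov;

typedef struct {
    uint8_t *target;     /* stands in for the block backend */
    size_t target_len;
    size_t *bs_pos;      /* owned by the caller, advanced on every write */
} DemoChannel;

/*
 * Write all iov elements sequentially at *bs_pos.
 * Returns the number of bytes written, or -1 if an element does not fit.
 * Elements written before the failing one stay written.
 */
static long demo_channel_writev(DemoChannel *c, const DemoIov *iov,
                                size_t niov)
{
    size_t written = 0;

    assert(*c->bs_pos <= c->target_len);
    for (size_t i = 0; i < niov; i++) {
        if (iov[i].len > c->target_len - *c->bs_pos) {
            return -1;
        }
        memcpy(c->target + *c->bs_pos, iov[i].base, iov[i].len);
        *c->bs_pos += iov[i].len;   /* visible to the owner of bs_pos */
        written += iov[i].len;
    }
    return (long)written;
}
```

Because the channel only holds a pointer to the position, a status query can read the caller's variable directly to report progress without asking the channel.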


@ -21,31 +21,34 @@ still opened by QEMU.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[improve aborting]
[SR: improve aborting
register yank before migration_incoming_state_destroy]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[FE: further improve aborting
adapt to removal of QEMUFileOps]
adapt to removal of QEMUFileOps
improve condition for entering final stage
adapt to QAPI and other changes for 8.2]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hmp-commands-info.hx | 13 +
hmp-commands.hx | 33 +++
hmp-commands.hx | 17 ++
include/migration/snapshot.h | 2 +
include/monitor/hmp.h | 5 +
include/monitor/hmp.h | 3 +
migration/meson.build | 1 +
migration/savevm-async.c | 531 +++++++++++++++++++++++++++++++++++
monitor/hmp-cmds.c | 57 ++++
monitor/hmp-cmds.c | 38 +++
qapi/migration.json | 34 +++
qapi/misc.json | 32 +++
qapi/misc.json | 18 ++
qemu-options.hx | 12 +
softmmu/vl.c | 10 +
11 files changed, 730 insertions(+)
system/vl.c | 10 +
11 files changed, 679 insertions(+)
create mode 100644 migration/savevm-async.c
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 754b1e8408..489c524e9e 100644
index ad1b1306e3..d5ab880492 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -540,6 +540,19 @@ SRST
@@ -525,6 +525,19 @@ SRST
Show current migration parameters.
ERST
@ -66,11 +69,11 @@ index 754b1e8408..489c524e9e 100644
.name = "balloon",
.args_type = "",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 673e39a697..039be0033d 100644
index 2e2a3bcf98..7506de251c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1815,3 +1815,36 @@ SRST
Dump the FDT in dtb format to *filename*.
@@ -1862,3 +1862,20 @@ SRST
List event channels in the guest
ERST
#endif
+
@ -83,22 +86,6 @@ index 673e39a697..039be0033d 100644
+ },
+
+ {
+ .name = "snapshot-drive",
+ .args_type = "device:s,name:s",
+ .params = "device name",
+ .help = "Create internal snapshot.",
+ .cmd = hmp_snapshot_drive,
+ },
+
+ {
+ .name = "delete-drive-snapshot",
+ .args_type = "device:s,name:s",
+ .params = "device name",
+ .help = "Delete internal snapshot.",
+ .cmd = hmp_delete_drive_snapshot,
+ },
+
+ {
+ .name = "savevm-end",
+ .args_type = "",
+ .params = "",
@ -107,21 +94,21 @@ index 673e39a697..039be0033d 100644
+ .coroutine = true,
+ },
diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index e72083b117..c846d37806 100644
index 9e4dcaaa75..2581730d74 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -61,4 +61,6 @@ bool delete_snapshot(const char *name,
bool has_devices, strList *devices,
Error **errp);
@@ -68,4 +68,6 @@ bool delete_snapshot(const char *name,
*/
void load_snapshot_resume(RunState state);
+int load_snapshot_from_blockdev(const char *filename, Error **errp);
+
#endif
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index dfbc0c9a2f..440f86aba8 100644
index 13f9a2dedb..7a7def7530 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -27,6 +27,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict);
@@ -28,6 +28,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict);
void hmp_info_uuid(Monitor *mon, const QDict *qdict);
void hmp_info_chardev(Monitor *mon, const QDict *qdict);
void hmp_info_mice(Monitor *mon, const QDict *qdict);
@ -129,38 +116,38 @@ index dfbc0c9a2f..440f86aba8 100644
void hmp_info_migrate(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
@@ -81,6 +82,10 @@ void hmp_netdev_add(Monitor *mon, const QDict *qdict);
void hmp_netdev_del(Monitor *mon, const QDict *qdict);
void hmp_getfd(Monitor *mon, const QDict *qdict);
void hmp_closefd(Monitor *mon, const QDict *qdict);
@@ -94,6 +95,8 @@ void hmp_closefd(Monitor *mon, const QDict *qdict);
void hmp_mouse_move(Monitor *mon, const QDict *qdict);
void hmp_mouse_button(Monitor *mon, const QDict *qdict);
void hmp_mouse_set(Monitor *mon, const QDict *qdict);
+void hmp_savevm_start(Monitor *mon, const QDict *qdict);
+void hmp_snapshot_drive(Monitor *mon, const QDict *qdict);
+void hmp_delete_drive_snapshot(Monitor *mon, const QDict *qdict);
+void hmp_savevm_end(Monitor *mon, const QDict *qdict);
void hmp_sendkey(Monitor *mon, const QDict *qdict);
void coroutine_fn hmp_screendump(Monitor *mon, const QDict *qdict);
void hmp_chardev_add(Monitor *mon, const QDict *qdict);
diff --git a/migration/meson.build b/migration/meson.build
index 8cac83c06c..0842d00cd2 100644
index 95d1cf2250..800f12a60d 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -24,6 +24,7 @@ softmmu_ss.add(files(
'multifd-zlib.c',
@@ -28,6 +28,7 @@ system_ss.add(files(
'options.c',
'postcopy-ram.c',
'savevm.c',
+ 'savevm-async.c',
'socket.c',
'tls.c',
), gnutls)
'threadinfo.c',
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
new file mode 100644
index 0000000000..05d394c0e2
index 0000000000..779e4e2a78
--- /dev/null
+++ b/migration/savevm-async.c
@@ -0,0 +1,531 @@
+#include "qemu/osdep.h"
+#include "migration/channel-savevm-async.h"
+#include "migration/migration.h"
+#include "migration/migration-stats.h"
+#include "migration/options.h"
+#include "migration/savevm.h"
+#include "migration/snapshot.h"
+#include "migration/global_state.h"
@ -180,6 +167,7 @@ index 0000000000..05d394c0e2
+#include "qemu/timer.h"
+#include "qemu/main-loop.h"
+#include "qemu/rcu.h"
+#include "qemu/yank.h"
+
+/* #define DEBUG_SAVEVM_STATE */
+
@ -230,24 +218,20 @@ index 0000000000..05d394c0e2
+ info->bytes = s->bs_pos;
+ switch (s->state) {
+ case SAVE_STATE_ERROR:
+ info->has_status = true;
+ info->status = g_strdup("failed");
+ info->has_total_time = true;
+ info->total_time = s->total_time;
+ if (s->error) {
+ info->has_error = true;
+ info->error = g_strdup(error_get_pretty(s->error));
+ }
+ break;
+ case SAVE_STATE_ACTIVE:
+ info->has_status = true;
+ info->status = g_strdup("active");
+ info->has_total_time = true;
+ info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+ - s->total_time;
+ break;
+ case SAVE_STATE_COMPLETED:
+ info->has_status = true;
+ info->status = g_strdup("completed");
+ info->has_total_time = true;
+ info->total_time = s->total_time;
@ -293,7 +277,7 @@ index 0000000000..05d394c0e2
+ return ret;
+}
+
+static void save_snapshot_error(const char *fmt, ...)
+static void G_GNUC_PRINTF(1, 2) save_snapshot_error(const char *fmt, ...)
+{
+ va_list ap;
+ char *msg;
@ -316,7 +300,6 @@ index 0000000000..05d394c0e2
+static void process_savevm_finalize(void *opaque)
+{
+ int ret;
+ AioContext *iohandler_ctx = iohandler_get_aio_context();
+ MigrationState *ms = migrate_get_current();
+
+ bool aborted = savevm_aborted();
@ -333,9 +316,7 @@ index 0000000000..05d394c0e2
+ * so move it back. It can stay in the main context and live out its life
+ * there, since we're done with it after this method ends anyway.
+ */
+ aio_context_acquire(iohandler_ctx);
+ blk_set_aio_context(snap_state.target, qemu_get_aio_context(), NULL);
+ aio_context_release(iohandler_ctx);
+
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ if (ret < 0) {
@ -347,7 +328,7 @@ index 0000000000..05d394c0e2
+ (void)qemu_savevm_state_complete_precopy(snap_state.file, false, false);
+ ret = qemu_file_get_error(snap_state.file);
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
+ save_snapshot_error("qemu_savevm_state_complete_precopy error %d", ret);
+ }
+ }
+
@ -404,18 +385,32 @@ index 0000000000..05d394c0e2
+ }
+
+ while (snap_state.state == SAVE_STATE_ACTIVE) {
+ uint64_t pending_size, pend_precopy, pend_compatible, pend_postcopy;
+ uint64_t pending_size, pend_precopy, pend_postcopy;
+ uint64_t threshold = 400 * 1000;
+
+ /* pending is expected to be called without iothread lock */
+ qemu_mutex_unlock_iothread();
+ qemu_savevm_state_pending(snap_state.file, 0, &pend_precopy, &pend_compatible, &pend_postcopy);
+ qemu_mutex_lock_iothread();
+ /*
+ * pending_{estimate,exact} are expected to be called without iothread
+ * lock. Similar to what is done in migration.c, call the exact variant
+ * only once pend_precopy in the estimate is below the threshold.
+ */
+ bql_unlock();
+ qemu_savevm_state_pending_estimate(&pend_precopy, &pend_postcopy);
+ if (pend_precopy <= threshold) {
+ qemu_savevm_state_pending_exact(&pend_precopy, &pend_postcopy);
+ }
+ bql_lock();
+ pending_size = pend_precopy + pend_postcopy;
+
+ pending_size = pend_precopy + pend_compatible + pend_postcopy;
+ /*
+ * A guest reaching this cutoff is dirtying lots of RAM. It should be
+ * large enough so that the guest can't dirty this much between the
+ * check and the guest actually being stopped, but it should be small
+ * enough to avoid long downtimes for non-hibernation snapshots.
+ */
+ maxlen = blk_getlength(snap_state.target) - 100*1024*1024;
+
+ maxlen = blk_getlength(snap_state.target) - 30*1024*1024;
+
+ if (pending_size > 400000 && snap_state.bs_pos + pending_size < maxlen) {
+ /* Note that there is no progress for pend_postcopy when iterating */
+ if (pend_precopy > threshold && snap_state.bs_pos + pending_size < maxlen) {
+ ret = qemu_savevm_state_iterate(snap_state.file, false);
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
@ -424,11 +419,7 @@ index 0000000000..05d394c0e2
+ DPRINTF("savevm iterate pending size %lu ret %d\n", pending_size, ret);
+ } else {
+ qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+ ret = global_state_store();
+ if (ret) {
+ save_snapshot_error("global_state_store error %d", ret);
+ break;
+ }
+ global_state_store();
+
+ DPRINTF("savevm iterate complete\n");
+ break;
@ -447,19 +438,25 @@ index 0000000000..05d394c0e2
+ * so move there now and after every flush.
+ */
+ aio_co_reschedule_self(qemu_get_aio_context());
+ for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+ bdrv_graph_co_rdlock();
+ bs = bdrv_first(&it);
+ bdrv_graph_co_rdunlock();
+ while (bs) {
+ /* target has BDRV_O_NO_FLUSH, no sense calling bdrv_flush on it */
+ if (bs == blk_bs(snap_state.target)) {
+ continue;
+ }
+
+ AioContext *bs_ctx = bdrv_get_aio_context(bs);
+ if (bs_ctx != qemu_get_aio_context()) {
+ DPRINTF("savevm: async flushing drive %s\n", bs->filename);
+ aio_co_reschedule_self(bs_ctx);
+ bdrv_flush(bs);
+ aio_co_reschedule_self(qemu_get_aio_context());
+ if (bs != blk_bs(snap_state.target)) {
+ AioContext *bs_ctx = bdrv_get_aio_context(bs);
+ if (bs_ctx != qemu_get_aio_context()) {
+ DPRINTF("savevm: async flushing drive %s\n", bs->filename);
+ aio_co_reschedule_self(bs_ctx);
+ bdrv_graph_co_rdlock();
+ bdrv_flush(bs);
+ bdrv_graph_co_rdunlock();
+ aio_co_reschedule_self(qemu_get_aio_context());
+ }
+ }
+ bdrv_graph_co_rdlock();
+ bs = bdrv_next(&it);
+ bdrv_graph_co_rdunlock();
+ }
+
+ DPRINTF("timing: async flushing took %ld ms\n",
@ -468,7 +465,7 @@ index 0000000000..05d394c0e2
+ qemu_bh_schedule(snap_state.finalize_bh);
+}
+
+void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
+void qmp_savevm_start(const char *statefile, Error **errp)
+{
+ Error *local_err = NULL;
+ MigrationState *ms = migrate_get_current();
@ -482,12 +479,12 @@ index 0000000000..05d394c0e2
+ return;
+ }
+
+ if (migration_is_running(ms->state)) {
+ if (migration_is_running()) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, QERR_MIGRATION_ACTIVE);
+ return;
+ }
+
+ if (migrate_use_block()) {
+ if (migrate_block()) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "Block migration and snapshots are incompatible");
+ return;
@ -505,7 +502,7 @@ index 0000000000..05d394c0e2
+ snap_state.error = NULL;
+ }
+
+ if (!has_statefile) {
+ if (!statefile) {
+ vm_stop(RUN_STATE_SAVE_VM);
+ snap_state.state = SAVE_STATE_COMPLETED;
+ return;
@ -539,8 +536,10 @@ index 0000000000..05d394c0e2
+ * State is cleared in process_savevm_co, but has to be initialized
+ * here (blocking main thread, from QMP) to avoid race conditions.
+ */
+ migrate_init(ms);
+ memset(&ram_counters, 0, sizeof(ram_counters));
+ if (migrate_init(ms, errp)) {
+ return;
+ }
+ memset(&mig_stats, 0, sizeof(mig_stats));
+ ms->to_dst_file = snap_state.file;
+
+ error_setg(&snap_state.blocker, "block device is in use by savevm");
@ -549,10 +548,8 @@ index 0000000000..05d394c0e2
+ snap_state.state = SAVE_STATE_ACTIVE;
+ snap_state.finalize_bh = qemu_bh_new(process_savevm_finalize, &snap_state);
+ snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
+ qemu_mutex_unlock_iothread();
+ qemu_savevm_state_header(snap_state.file);
+ qemu_savevm_state_setup(snap_state.file);
+ qemu_mutex_lock_iothread();
+
+ /* Async processing from here on out happens in iohandler context, so let
+ * the target bdrv have its home there.
@ -623,22 +620,6 @@ index 0000000000..05d394c0e2
+ DPRINTF("savevm-end: cleanup done\n");
+}
+
+// FIXME: Deprecated
+void qmp_snapshot_drive(const char *device, const char *name, Error **errp)
+{
+ // Compatibility to older qemu-server.
+ qmp_blockdev_snapshot_internal_sync(device, name, errp);
+}
+
+// FIXME: Deprecated
+void qmp_delete_drive_snapshot(const char *device, const char *name,
+ Error **errp)
+{
+ // Compatibility to older qemu-server.
+ (void)qmp_blockdev_snapshot_delete_internal_sync(device, false, NULL,
+ true, name, errp);
+}
+
+int load_snapshot_from_blockdev(const char *filename, Error **errp)
+{
+ BlockBackend *be;
@ -673,6 +654,10 @@ index 0000000000..05d394c0e2
+ dirty_bitmap_mig_before_vm_start();
+
+ qemu_fclose(f);
+
+ /* state_destroy assumes a real migration which would have added a yank */
+ yank_register_instance(MIGRATION_YANK_INSTANCE, &error_abort);
+
+ migration_incoming_state_destroy();
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Error while loading VM state");
@ -690,39 +675,28 @@ index 0000000000..05d394c0e2
+ return ret;
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 480b798963..cfebfd1db5 100644
index 871898ac46..ef4634e5c1 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1906,6 +1906,63 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, err);
}
@@ -22,6 +22,7 @@
#include "monitor/monitor-internal.h"
#include "qapi/error.h"
#include "qapi/qapi-commands-control.h"
+#include "qapi/qapi-commands-migration.h"
#include "qapi/qapi-commands-misc.h"
#include "qapi/qmp/qdict.h"
#include "qemu/cutils.h"
@@ -443,3 +444,40 @@ void hmp_info_mtree(Monitor *mon, const QDict *qdict)
mtree_info(flatview, dispatch_tree, owner, disabled);
}
+
+void hmp_savevm_start(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *statefile = qdict_get_try_str(qdict, "statefile");
+
+ qmp_savevm_start(statefile != NULL, statefile, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_snapshot_drive(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_snapshot_drive(device, name, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_delete_drive_snapshot(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_delete_drive_snapshot(device, name, &errp);
+ qmp_savevm_start(statefile, &errp);
+ hmp_handle_error(mon, errp);
+}
+
@ -739,7 +713,7 @@ index 480b798963..cfebfd1db5 100644
+ SaveVMInfo *info;
+ info = qmp_query_savevm(NULL);
+
+ if (info->has_status) {
+ if (info->status) {
+ monitor_printf(mon, "savevm status: %s\n", info->status);
+ monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
+ info->total_time);
@ -749,21 +723,17 @@ index 480b798963..cfebfd1db5 100644
+ if (info->has_bytes) {
+ monitor_printf(mon, "Bytes saved: %"PRIu64"\n", info->bytes);
+ }
+ if (info->has_error) {
+ if (info->error) {
+ monitor_printf(mon, "Error: %s\n", info->error);
+ }
+}
+
void hmp_info_iothreads(Monitor *mon, const QDict *qdict)
{
IOThreadInfoList *info_list = qmp_query_iothreads(NULL);
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..4435866379 100644
index 8c65b90328..ed20d066cd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -261,6 +261,40 @@
'*compression': 'CompressionStats',
'*socket-address': ['SocketAddress'] } }
@@ -297,6 +297,40 @@
'*dirty-limit-throttle-time-per-round': 'uint64',
'*dirty-limit-ring-full-time': 'uint64'} }
+##
+# @SaveVMInfo:
@ -803,10 +773,10 @@ index 88ecf86ac8..4435866379 100644
# @query-migrate:
#
diff --git a/qapi/misc.json b/qapi/misc.json
index 27ef5a2b20..b3ce75dcae 100644
index ec30e5c570..7147199a12 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -435,6 +435,38 @@
@@ -454,6 +454,24 @@
##
{ 'command': 'query-fdsets', 'returns': ['FdsetInfo'] }
@ -815,26 +785,12 @@ index 27ef5a2b20..b3ce75dcae 100644
+#
+# Prepare for snapshot and halt VM. Save VM state to statefile.
+#
+# @statefile: target file that state should be written to.
+#
+##
+{ 'command': 'savevm-start', 'data': { '*statefile': 'str' } }
+
+##
+# @snapshot-drive:
+#
+# Create an internal drive snapshot.
+#
+##
+{ 'command': 'snapshot-drive', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @delete-drive-snapshot:
+#
+# Delete a drive snapshot.
+#
+##
+{ 'command': 'delete-drive-snapshot', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @savevm-end:
+#
+# Resume VM after a snapshot.
@ -846,10 +802,10 @@ index 27ef5a2b20..b3ce75dcae 100644
# @CommandLineParameterType:
#
diff --git a/qemu-options.hx b/qemu-options.hx
index 7f99d15b23..54efb127c4 100644
index 8ce85d4559..511ab9415e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4391,6 +4391,18 @@ SRST
@@ -4610,6 +4610,18 @@ SRST
Start right away with a saved state (``loadvm`` in monitor)
ERST
@ -868,11 +824,11 @@ index 7f99d15b23..54efb127c4 100644
#ifndef _WIN32
DEF("daemonize", 0, QEMU_OPTION_daemonize, \
"-daemonize daemonize QEMU after initializing\n", QEMU_ARCH_ALL)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5f7f6ca981..21f067d115 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -164,6 +164,7 @@ static const char *accelerators;
diff --git a/system/vl.c b/system/vl.c
index c644222982..2738ab7c91 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -163,6 +163,7 @@ static const char *accelerators;
static bool have_custom_ram_size;
static const char *ram_memdev_id;
static QDict *machine_opts_dict;
@ -880,10 +836,10 @@ index 5f7f6ca981..21f067d115 100644
static QTAILQ_HEAD(, ObjectOption) object_opts = QTAILQ_HEAD_INITIALIZER(object_opts);
static QTAILQ_HEAD(, DeviceOption) device_opts = QTAILQ_HEAD_INITIALIZER(device_opts);
static int display_remote;
@@ -2607,6 +2608,12 @@ void qmp_x_exit_preconfig(Error **errp)
if (loadvm) {
@@ -2712,6 +2713,12 @@ void qmp_x_exit_preconfig(Error **errp)
RunState state = autostart ? RUN_STATE_RUNNING : runstate_get();
load_snapshot(loadvm, NULL, false, NULL, &error_fatal);
load_snapshot_resume(state);
+ } else if (loadstate) {
+ Error *local_err = NULL;
+ if (load_snapshot_from_blockdev(loadstate, &local_err) < 0) {
@ -893,7 +849,7 @@ index 5f7f6ca981..21f067d115 100644
}
if (replay_mode != REPLAY_MODE_NONE) {
replay_vmstate_init();
@@ -3151,6 +3158,9 @@ void qemu_init(int argc, char **argv)
@@ -3259,6 +3266,9 @@ void qemu_init(int argc, char **argv)
case QEMU_OPTION_loadvm:
loadvm = optarg;
break;
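
Note on the `process_savevm_co` loop in the savevm-async patch above: each round takes the cheap pending estimate, asks for the exact value only once the estimate has dropped to the threshold, and keeps iterating only while precopy data above the threshold remains and the state file still has room. A minimal standalone sketch of that decision follows. The function and enum names are hypothetical, the 100 MiB reserve is taken from one of the two `maxlen` variants shown in the hunk, the negative-length guard is an addition for the sketch, and the postcopy value is passed through unchanged although the real exact query may update it as well.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_ITERATE,    /* run another qemu_savevm_state_iterate()-style round */
    DEMO_FINALIZE    /* stop the guest and write the remaining state */
} DemoStep;

/*
 * est_precopy:   cheap estimate of pending precopy bytes
 * exact_precopy: exact value, consulted only if the estimate is small
 * pend_postcopy: pending postcopy bytes (no progress while iterating)
 * bs_pos:        bytes already written to the state file
 * target_len:    size of the state file target in bytes
 */
static DemoStep demo_next_step(uint64_t est_precopy, uint64_t exact_precopy,
                               uint64_t pend_postcopy, uint64_t bs_pos,
                               int64_t target_len)
{
    const uint64_t threshold = 400 * 1000;
    const int64_t reserve = 100 * 1024 * 1024;
    uint64_t pend_precopy = est_precopy;
    uint64_t pending_size;
    int64_t maxlen = target_len - reserve;

    if (pend_precopy <= threshold) {
        pend_precopy = exact_precopy;   /* refine only near the end */
    }
    pending_size = pend_precopy + pend_postcopy;

    if (maxlen <= 0) {
        return DEMO_FINALIZE;           /* guard added for this sketch */
    }
    if (pend_precopy > threshold &&
        bs_pos + pending_size < (uint64_t)maxlen) {
        return DEMO_ITERATE;
    }
    return DEMO_FINALIZE;
}
```

Comparing `pend_precopy` rather than the combined size against the threshold matches the comment in the hunk that iterating makes no progress on the postcopy portion.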


@ -13,18 +13,18 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to removal of QEMUFileOps]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
migration/qemu-file.c | 49 +++++++++++++++++++++++++++-------------
migration/qemu-file.c | 50 +++++++++++++++++++++++++++-------------
migration/qemu-file.h | 2 ++
migration/savevm-async.c | 5 ++--
3 files changed, 38 insertions(+), 18 deletions(-)
3 files changed, 39 insertions(+), 18 deletions(-)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2d5f74ffc2..9fd97e6fe1 100644
index a10882d47f..19c1de0472 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -31,8 +31,8 @@
#include "trace.h"
#include "qapi/error.h"
@@ -35,8 +35,8 @@
#include "rdma.h"
#include "io/channel-file.h"
-#define IO_BUF_SIZE 32768
-#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@ -32,8 +32,8 @@ index 2d5f74ffc2..9fd97e6fe1 100644
+#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 256)
struct QEMUFile {
const QEMUFileHooks *hooks;
@@ -55,7 +55,8 @@ struct QEMUFile {
QIOChannel *ioc;
@@ -44,7 +44,8 @@ struct QEMUFile {
int buf_index;
int buf_size; /* 0 when writing */
@ -43,8 +43,8 @@ index 2d5f74ffc2..9fd97e6fe1 100644
DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
struct iovec iov[MAX_IOV_SIZE];
@@ -127,7 +128,9 @@ bool qemu_file_mode_is_not_valid(const char *mode)
return false;
@@ -101,7 +102,9 @@ int qemu_file_shutdown(QEMUFile *f)
return 0;
}
-static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
@ -54,7 +54,7 @@ index 2d5f74ffc2..9fd97e6fe1 100644
{
QEMUFile *f;
@@ -136,6 +139,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
@@ -110,6 +113,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
object_ref(ioc);
f->ioc = ioc;
f->is_writable = is_writable;
@ -63,7 +63,7 @@ index 2d5f74ffc2..9fd97e6fe1 100644
return f;
}
@@ -146,17 +151,27 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
@@ -120,17 +125,27 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
*/
QEMUFile *qemu_file_get_return_path(QEMUFile *f)
{
@ -93,8 +93,8 @@ index 2d5f74ffc2..9fd97e6fe1 100644
+ return qemu_file_new_impl(ioc, false, buffer_size);
}
void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
@@ -414,7 +429,7 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
/*
@@ -328,7 +343,7 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
do {
len = qio_channel_read(f->ioc,
(char *)f->buf + pending,
@ -103,16 +103,17 @@ index 2d5f74ffc2..9fd97e6fe1 100644
&local_error);
if (len == QIO_CHANNEL_ERR_BLOCK) {
if (qemu_in_coroutine()) {
@@ -464,6 +479,8 @@ int qemu_fclose(QEMUFile *f)
@@ -368,6 +383,9 @@ int qemu_fclose(QEMUFile *f)
ret = ret2;
}
g_clear_pointer(&f->ioc, object_unref);
+
+ free(f->buf);
+
/* If any error was spotted before closing, we should report it
* instead of the close() return value.
*/
@@ -518,7 +535,7 @@ static void add_buf_to_iovec(QEMUFile *f, size_t len)
error_free(f->last_error_obj);
g_free(f);
trace_qemu_file_fclose();
@@ -416,7 +434,7 @@ static void add_buf_to_iovec(QEMUFile *f, size_t len)
{
if (!add_to_iovec(f, f->buf + f->buf_index, len, false)) {
f->buf_index += len;
@ -121,7 +122,7 @@ index 2d5f74ffc2..9fd97e6fe1 100644
qemu_fflush(f);
}
}
@@ -544,7 +561,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
@@ -441,7 +459,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
}
while (size > 0) {
@ -130,7 +131,7 @@ index 2d5f74ffc2..9fd97e6fe1 100644
if (l > size) {
l = size;
}
@@ -591,8 +608,8 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset)
@@ -587,8 +605,8 @@ size_t coroutine_mixed_fn qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t si
size_t index;
assert(!qemu_file_is_writable(f));
@ -141,7 +142,7 @@ index 2d5f74ffc2..9fd97e6fe1 100644
/* The 1st byte to read from */
index = f->buf_index + offset;
@@ -642,7 +659,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
@@ -638,7 +656,7 @@ size_t coroutine_mixed_fn qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size
size_t res;
uint8_t *src;
@ -150,16 +151,16 @@ index 2d5f74ffc2..9fd97e6fe1 100644
if (res == 0) {
return done;
}
@@ -676,7 +693,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
@@ -672,7 +690,7 @@ size_t coroutine_mixed_fn qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size
*/
size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
size_t coroutine_mixed_fn qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
{
- if (size < IO_BUF_SIZE) {
+ if (size < f->buf_allocated_size) {
size_t res;
uint8_t *src = NULL;
@@ -701,7 +718,7 @@ int qemu_peek_byte(QEMUFile *f, int offset)
@@ -697,7 +715,7 @@ int coroutine_mixed_fn qemu_peek_byte(QEMUFile *f, int offset)
int index = f->buf_index + offset;
assert(!qemu_file_is_writable(f));
@ -168,7 +169,7 @@ index 2d5f74ffc2..9fd97e6fe1 100644
if (index >= f->buf_size) {
qemu_fill_buffer(f);
@@ -853,7 +870,7 @@ static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
@@ -811,7 +829,7 @@ static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
const uint8_t *p, size_t size)
{
@ -178,24 +179,24 @@ index 2d5f74ffc2..9fd97e6fe1 100644
if (blen < compressBound(size)) {
return -1;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index fa13d04d78..914f1a63a8 100644
index 32fd4a34fd..36a0cd8cc8 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -63,7 +63,9 @@ typedef struct QEMUFileHooks {
} QEMUFileHooks;
@@ -30,7 +30,9 @@
#include "io/channel.h"
QEMUFile *qemu_file_new_input(QIOChannel *ioc);
+QEMUFile *qemu_file_new_input_sized(QIOChannel *ioc, size_t buffer_size);
QEMUFile *qemu_file_new_output(QIOChannel *ioc);
+QEMUFile *qemu_file_new_output_sized(QIOChannel *ioc, size_t buffer_size);
void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
int qemu_fclose(QEMUFile *f);
/*
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 05d394c0e2..bafe6ae5eb 100644
index 779e4e2a78..bf36fc06d2 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -367,7 +367,7 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
@@ -379,7 +379,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
&snap_state.bs_pos));
@ -204,7 +205,7 @@ index 05d394c0e2..bafe6ae5eb 100644
if (!snap_state.file) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
@@ -500,7 +500,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
@@ -496,7 +496,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
blk_op_block_all(be, blocker);
/* restore the VM state */
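
Note on the qemu-file.c patch above: the fixed `IO_BUF_SIZE` array is replaced by a heap buffer whose size is chosen at creation time, every former `IO_BUF_SIZE` check reads `f->buf_allocated_size` instead, and `qemu_fclose()` frees the buffer. A minimal standalone sketch of that structure follows. `DemoFile` and the `demo_file_*` functions are hypothetical, the flush is reduced to a counter, and the default of 32768 is the `IO_BUF_SIZE` value shown on the removed line of the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_BUF_SIZE 32768

typedef struct {
    uint8_t *buf;               /* heap buffer instead of a fixed array */
    size_t buf_allocated_size;  /* replaces the compile-time constant */
    size_t buf_index;
    size_t flushes;             /* counts simulated flushes */
} DemoFile;

static DemoFile *demo_file_new_sized(size_t buffer_size)
{
    DemoFile *f;

    assert(buffer_size > 0);
    f = calloc(1, sizeof(*f));
    if (!f) {
        return NULL;
    }
    f->buf = malloc(buffer_size);
    if (!f->buf) {
        free(f);
        return NULL;
    }
    f->buf_allocated_size = buffer_size;
    return f;
}

/* Unsized constructor keeps the old default, like the existing callers. */
static DemoFile *demo_file_new(void)
{
    return demo_file_new_sized(DEMO_DEFAULT_BUF_SIZE);
}

/* Copy data into the buffer, "flushing" whenever the buffer fills up. */
static void demo_file_put(DemoFile *f, const uint8_t *data, size_t size)
{
    while (size > 0) {
        size_t l = f->buf_allocated_size - f->buf_index;

        if (l > size) {
            l = size;
        }
        memcpy(f->buf + f->buf_index, data, l);
        f->buf_index += l;
        data += l;
        size -= l;
        if (f->buf_index == f->buf_allocated_size) {
            f->flushes++;       /* a real file would write buf out here */
            f->buf_index = 0;
        }
    }
}

static void demo_file_close(DemoFile *f)
{
    free(f->buf);   /* the buffer is now owned separately from the struct */
    free(f);
}
```

Keeping an unsized constructor that forwards to the sized one lets existing callers stay untouched while the savevm-async code can request a larger buffer.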


@ -4,32 +4,33 @@ Date: Mon, 6 Apr 2020 12:16:47 +0200
Subject: [PATCH] PVE: block: add the zeroinit block driver filter
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[adapt to changed function signatures]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
[FE: adapt to changed function signatures
adhere to block graph lock requirements]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 1 +
block/zeroinit.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 199 insertions(+)
block/zeroinit.c | 214 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 215 insertions(+)
create mode 100644 block/zeroinit.c
diff --git a/block/meson.build b/block/meson.build
index b7c68b83a3..020a89ae07 100644
index e1f03fd773..b530e117b5 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -43,6 +43,7 @@ block_ss.add(files(
'vmdk.c',
'vpc.c',
@@ -39,6 +39,7 @@ block_ss.add(files(
'throttle.c',
'throttle-groups.c',
'write-threshold.c',
+ 'zeroinit.c',
), zstd, zlib, gnutls)
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
diff --git a/block/zeroinit.c b/block/zeroinit.c
new file mode 100644
index 0000000000..b60e1b84dc
index 0000000000..696558d8d6
--- /dev/null
+++ b/block/zeroinit.c
@@ -0,0 +1,198 @@
@@ -0,0 +1,214 @@
+/*
+ * Filter to fake a zero-initialized block device.
+ *
@ -43,6 +44,8 @@ index 0000000000..b60e1b84dc
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/block-io.h"
+#include "block/graph-lock.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/cutils.h"
@ -93,6 +96,7 @@ index 0000000000..b60e1b84dc
+ Error **errp)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ BdrvChild *file = NULL;
+ QemuOpts *opts;
+ Error *local_err = NULL;
+ int ret;
@ -108,10 +112,13 @@ index 0000000000..b60e1b84dc
+ }
+
+ /* Open the raw file */
+ bs->file = bdrv_open_child(qemu_opt_get(opts, "x-next"), options, "next",
+ bs, &child_of_bds,
+ BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+ false, &local_err);
+ file = bdrv_open_child(qemu_opt_get(opts, "x-next"), options, "next", bs,
+ &child_of_bds,
+ BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, false,
+ &local_err);
+ bdrv_graph_wrlock();
+ bs->file = file;
+ bdrv_graph_wrunlock();
+ if (local_err) {
+ ret = -EINVAL;
+ error_propagate(errp, local_err);
@ -124,7 +131,9 @@ index 0000000000..b60e1b84dc
+ ret = 0;
+fail:
+ if (ret < 0) {
+ bdrv_graph_wrlock();
+ bdrv_unref_child(bs, bs->file);
+ bdrv_graph_wrunlock();
+ }
+ qemu_opts_del(opts);
+ return ret;
@ -136,19 +145,22 @@ index 0000000000..b60e1b84dc
+ (void)s;
+}
+
+static int64_t zeroinit_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t GRAPH_RDLOCK
+zeroinit_co_getlength(BlockDriverState *bs)
+{
+ return bdrv_getlength(bs->file->bs);
+ return bdrv_co_getlength(bs->file->bs);
+}
+
+static int coroutine_fn zeroinit_co_preadv(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn zeroinit_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+ int64_t bytes, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ BdrvRequestFlags flags)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ if (offset >= s->extents)
@ -156,8 +168,9 @@ index 0000000000..b60e1b84dc
+ return bdrv_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn zeroinit_co_pwritev(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ int64_t extents = offset + bytes;
@ -166,33 +179,37 @@ index 0000000000..b60e1b84dc
+ return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static coroutine_fn int zeroinit_co_flush(BlockDriverState *bs)
+static coroutine_fn int GRAPH_RDLOCK
+zeroinit_co_flush(BlockDriverState *bs)
+{
+ return bdrv_co_flush(bs->file->bs);
+}
+
+static int zeroinit_has_zero_init(BlockDriverState *bs)
+static int GRAPH_RDLOCK
+zeroinit_has_zero_init(BlockDriverState *bs)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ return s->has_zero_init;
+}
+
+static int coroutine_fn zeroinit_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int64_t bytes)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+ return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static int zeroinit_co_truncate(BlockDriverState *bs, int64_t offset,
+ _Bool exact, PreallocMode prealloc,
+ BdrvRequestFlags req_flags, Error **errp)
+static int GRAPH_RDLOCK
+zeroinit_co_truncate(BlockDriverState *bs, int64_t offset, _Bool exact,
+ PreallocMode prealloc, BdrvRequestFlags req_flags,
+ Error **errp)
+{
+ return bdrv_co_truncate(bs->file, offset, exact, prealloc, req_flags, errp);
+}
+
+static int zeroinit_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+static coroutine_fn int GRAPH_RDLOCK
+zeroinit_co_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+ return bdrv_get_info(bs->file->bs, bdi);
+ return bdrv_co_get_info(bs->file->bs, bdi);
+}
+
+static BlockDriver bdrv_zeroinit = {
@ -203,7 +220,7 @@ index 0000000000..b60e1b84dc
+ .bdrv_parse_filename = zeroinit_parse_filename,
+ .bdrv_file_open = zeroinit_open,
+ .bdrv_close = zeroinit_close,
+ .bdrv_getlength = zeroinit_getlength,
+ .bdrv_co_getlength = zeroinit_co_getlength,
+ .bdrv_child_perm = bdrv_default_perms,
+ .bdrv_co_flush_to_disk = zeroinit_co_flush,
+
@ -219,7 +236,7 @@ index 0000000000..b60e1b84dc
+ .bdrv_co_pdiscard = zeroinit_co_pdiscard,
+
+ .bdrv_co_truncate = zeroinit_co_truncate,
+ .bdrv_get_info = zeroinit_get_info,
+ .bdrv_co_get_info = zeroinit_co_get_info,
+};
+
+static void bdrv_zeroinit_init(void)
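A rough illustration of the bookkeeping the zeroinit filter above appears to rely on: it remembers the end of the furthest data write and drops zero-writes that start at or beyond that point, on the assumption that the underlying device already reads back zeros there. The sketch models only that rule in plain C. `ZeroinitModel`, `model_write` and `model_write_zeroes` are hypothetical names, not QEMU APIs, and the parts of the driver hidden by the diff context are assumed rather than quoted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BDRVZeroinitState: tracks the written extent. */
typedef struct {
    int64_t extents; /* end offset of the furthest data write seen so far */
} ZeroinitModel;

/*
 * Record a data write. Data writes are always passed on to the child,
 * so this returns true; the only side effect is growing the extent.
 */
static bool model_write(ZeroinitModel *s, int64_t offset, int64_t bytes)
{
    assert(offset >= 0 && bytes >= 0);
    int64_t end = offset + bytes;
    if (end > s->extents) {
        s->extents = end;
    }
    return true;
}

/*
 * Decide whether a zero-write starting at 'offset' has to be forwarded.
 * At or beyond the written extent the device is assumed to be zero
 * already, so the request can be completed without touching the child.
 */
static bool model_write_zeroes(const ZeroinitModel *s, int64_t offset)
{
    assert(offset >= 0);
    return offset < s->extents;
}
```

With a fresh model, a zero-write at offset 0 is skipped; after a 4096-byte data write at offset 0, a zero-write at offset 1024 is forwarded while one at offset 4096 is still skipped.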


@ -10,14 +10,14 @@ Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
qemu-options.hx | 3 +++
softmmu/vl.c | 8 ++++++++
system/vl.c | 8 ++++++++
2 files changed, 11 insertions(+)
diff --git a/qemu-options.hx b/qemu-options.hx
index 54efb127c4..ef456d03ec 100644
index 511ab9415e..92e301d545 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1147,6 +1147,9 @@ backend describes how QEMU handles the data.
@@ -1237,6 +1237,9 @@ legacy PC, they are not recommended for modern configurations.
ERST
@ -27,11 +27,11 @@ index 54efb127c4..ef456d03ec 100644
DEF("fda", HAS_ARG, QEMU_OPTION_fda,
"-fda/-fdb file use 'file' as floppy disk 0/1 image\n", QEMU_ARCH_ALL)
DEF("fdb", HAS_ARG, QEMU_OPTION_fdb, "", QEMU_ARCH_ALL)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 21f067d115..9d737e7914 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2643,6 +2643,7 @@ void qemu_init(int argc, char **argv)
diff --git a/system/vl.c b/system/vl.c
index 2738ab7c91..20ebf2c920 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2748,6 +2748,7 @@ void qemu_init(int argc, char **argv)
MachineClass *machine_class;
bool userconfig = true;
FILE *vmstate_dump_file = NULL;
@ -39,7 +39,7 @@ index 21f067d115..9d737e7914 100644
qemu_add_opts(&qemu_drive_opts);
qemu_add_drive_opts(&qemu_legacy_drive_opts);
@@ -3263,6 +3264,13 @@ void qemu_init(int argc, char **argv)
@@ -3371,6 +3372,13 @@ void qemu_init(int argc, char **argv)
machine_parse_property_opt(qemu_find_opts("smp-opts"),
"smp", optarg);
break;
@ -50,6 +50,6 @@ index 21f067d115..9d737e7914 100644
+ exit(1);
+ }
+ break;
#ifdef CONFIG_VNC
case QEMU_OPTION_vnc:
vnc_parse(optarg);
break;


@ -11,10 +11,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 2a20982066..7968ad5a93 100644
index d8fc1e2815..789694b8b3 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -278,6 +278,15 @@ static void apic_reset_common(DeviceState *dev)
@@ -263,6 +263,15 @@ static void apic_reset_common(DeviceState *dev)
info->vapic_base_update(s);
apic_init_reset(dev);


@ -9,14 +9,14 @@ Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/file-posix.c | 59 ++++++++++++++++++++++++++++++--------------
qapi/block-core.json | 3 ++-
2 files changed, 42 insertions(+), 20 deletions(-)
qapi/block-core.json | 7 +++++-
2 files changed, 46 insertions(+), 20 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index 9a16d86344..bd68df57ad 100644
index 43bc0bd520..60e98c87f1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2487,6 +2487,7 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2876,6 +2876,7 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
int fd;
uint64_t perm, shared;
int result = 0;
@ -24,7 +24,7 @@ index 9a16d86344..bd68df57ad 100644
/* Validate options and set default values */
assert(options->driver == BLOCKDEV_DRIVER_FILE);
@@ -2527,19 +2528,22 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2916,19 +2917,22 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
perm = BLK_PERM_WRITE | BLK_PERM_RESIZE;
shared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
@ -59,7 +59,7 @@ index 9a16d86344..bd68df57ad 100644
}
/* Clear the file by truncating it to 0 */
@@ -2593,13 +2597,15 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2982,13 +2986,15 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
}
out_unlock:
@ -82,7 +82,7 @@ index 9a16d86344..bd68df57ad 100644
}
out_close:
@@ -2624,6 +2630,7 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -3012,6 +3018,7 @@ raw_co_create_opts(BlockDriver *drv, const char *filename,
PreallocMode prealloc;
char *buf = NULL;
Error *local_err = NULL;
@ -90,7 +90,7 @@ index 9a16d86344..bd68df57ad 100644
/* Skip file: protocol prefix */
strstart(filename, "file:", &filename);
@@ -2646,6 +2653,18 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -3034,6 +3041,18 @@ raw_co_create_opts(BlockDriver *drv, const char *filename,
return -EINVAL;
}
@ -109,7 +109,7 @@ index 9a16d86344..bd68df57ad 100644
options = (BlockdevCreateOptions) {
.driver = BLOCKDEV_DRIVER_FILE,
.u.file = {
@@ -2657,6 +2676,8 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -3045,6 +3064,8 @@ raw_co_create_opts(BlockDriver *drv, const char *filename,
.nocow = nocow,
.has_extent_size_hint = has_extent_size_hint,
.extent_size_hint = extent_size_hint,
@ -119,10 +119,21 @@ index 9a16d86344..bd68df57ad 100644
};
return raw_co_create(&options, errp);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7daaf545be..9e902b96bb 100644
index 45ab548dfe..f7c2b63c5d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4624,7 +4624,8 @@
@@ -4956,6 +4956,10 @@
# @extent-size-hint: Extent size hint to add to the image file; 0 for
# not adding an extent size hint (default: 1 MB, since 5.1)
#
+# @locking: whether to enable file locking. If set to 'auto', only
+# enable when Open File Descriptor (OFD) locking API is available
+# (default: auto).
+#
# Since: 2.12
##
{ 'struct': 'BlockdevCreateOptionsFile',
@@ -4963,7 +4967,8 @@
'size': 'size',
'*preallocation': 'PreallocMode',
'*nocow': 'bool',


@ -18,10 +18,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 6b8cfcf6d8..3ec67e32d3 100644
index 589c9524f8..2505dd658a 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -519,8 +519,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
@@ -536,8 +536,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
qemu_chr_fe_set_echo(&mon->common.chr, true);
/* Note: we run QMP monitor in I/O thread when @chr supports that */


@ -26,10 +26,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 8d34caa31d..2df9037c4e 100644
index 4273de16a0..83f1fc0293 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -132,7 +132,8 @@ GlobalProperty hw_compat_4_0[] = {
@@ -162,7 +162,8 @@ GlobalProperty hw_compat_4_0[] = {
{ "virtio-vga", "edid", "false" },
{ "virtio-gpu-device", "edid", "false" },
{ "virtio-device", "use-started", "false" },


@ -11,35 +11,36 @@ and only if 'is-current').
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to QAPI changes]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/core/machine-qmp-cmds.c | 6 ++++++
hw/core/machine-qmp-cmds.c | 5 +++++
include/hw/boards.h | 2 ++
qapi/machine.json | 4 +++-
softmmu/vl.c | 25 +++++++++++++++++++++++++
4 files changed, 36 insertions(+), 1 deletion(-)
system/vl.c | 25 +++++++++++++++++++++++++
4 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 76fff60a6b..ec9201fb9a 100644
index 314351cdff..628a3537c5 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -103,6 +103,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
@@ -94,6 +94,11 @@ MachineInfoList *qmp_query_machines(Error **errp)
if (strcmp(mc->name, MACHINE_GET_CLASS(current_machine)->name) == 0) {
info->has_is_current = true;
info->is_current = true;
+
+ // PVE version string only exists for current machine
+ if (mc->pve_version) {
+ info->has_pve_version = true;
+ info->pve_version = g_strdup(mc->pve_version);
+ }
}
if (mc->default_cpu_type) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 90f1dd3aeb..14d60520d9 100644
index 8b8f6d5c00..dd6d0a1447 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -230,6 +230,8 @@ struct MachineClass {
@@ -246,6 +246,8 @@ struct MachineClass {
const char *desc;
const char *deprecation_reason;
@ -49,40 +50,40 @@ index 90f1dd3aeb..14d60520d9 100644
void (*reset)(MachineState *state, ShutdownCause reason);
void (*wakeup)(MachineState *state);
diff --git a/qapi/machine.json b/qapi/machine.json
index 9156103c8f..f4fb1b2c9c 100644
index a024d5b05d..1d69bffaa0 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -157,6 +157,8 @@
@@ -168,6 +168,8 @@
#
# @default-ram-id: the default ID of initial RAM memory backend (since 5.2)
# @acpi: machine type supports ACPI (since 8.0)
#
+# @pve-version: custom PVE version suffix specified as 'machine+pveN'
+#
# Since: 1.2
##
{ 'struct': 'MachineInfo',
@@ -164,7 +166,7 @@
@@ -175,7 +177,7 @@
'*is-default': 'bool', '*is-current': 'bool', 'cpu-max': 'int',
'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
'deprecated': 'bool', '*default-cpu-type': 'str',
- '*default-ram-id': 'str' } }
+ '*default-ram-id': 'str', '*pve-version': 'str' } }
- '*default-ram-id': 'str', 'acpi': 'bool' } }
+ '*default-ram-id': 'str', 'acpi': 'bool', '*pve-version': 'str' } }
##
# @query-machines:
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 9d737e7914..a64eee2fad 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1578,6 +1578,7 @@ static const QEMUOption *lookup_opt(int argc, char **argv,
diff --git a/system/vl.c b/system/vl.c
index 20ebf2c920..4d39e32097 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1659,6 +1659,7 @@ static const QEMUOption *lookup_opt(int argc, char **argv,
static MachineClass *select_machine(QDict *qdict, Error **errp)
{
const char *optarg = qdict_get_try_str(qdict, "type");
const char *machine_type = qdict_get_try_str(qdict, "type");
+ const char *pvever = qdict_get_try_str(qdict, "pvever");
GSList *machines = object_class_get_list(TYPE_MACHINE, false);
MachineClass *machine_class;
Error *local_err = NULL;
@@ -1595,6 +1596,11 @@ static MachineClass *select_machine(QDict *qdict, Error **errp)
@@ -1676,6 +1677,11 @@ static MachineClass *select_machine(QDict *qdict, Error **errp)
}
}
@ -94,7 +95,7 @@ index 9d737e7914..a64eee2fad 100644
g_slist_free(machines);
if (local_err) {
error_append_hint(&local_err, "Use -machine help to list supported machines\n");
@@ -3205,12 +3211,31 @@ void qemu_init(int argc, char **argv)
@@ -3313,12 +3319,31 @@ void qemu_init(int argc, char **argv)
case QEMU_OPTION_machine:
{
bool help;


@ -25,7 +25,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index 6a9ad97a53..9b0151c5be 100644
index ec29d6b810..270957c0cd 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -237,8 +237,8 @@ static void backup_init_bcs_bitmap(BackupBlockJob *job)
@ -48,9 +48,9 @@ index 6a9ad97a53..9b0151c5be 100644
if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
int64_t offset = 0;
int64_t count;
@@ -492,6 +490,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
@@ -501,6 +499,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
&error_abort);
bdrv_graph_wrunlock();
+ backup_init_bcs_bitmap(job);
+


@ -3,40 +3,46 @@ From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:57 +0200
Subject: [PATCH] PVE-Backup: add vma backup format code
Notes about partial restoring: skipping a certain drive is done via a
map line of the form skip=drive-scsi0. Since in PVE, most archives are
compressed and piped to vma for restore, it's not easily possible to
skip reads.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: create: register all streams before entering coroutines]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
[FE: improvements during create
allow partial restore]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 2 +
meson.build | 5 +
vma-reader.c | 859 ++++++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 791 ++++++++++++++++++++++++++++++++++++++++++
vma.c | 849 +++++++++++++++++++++++++++++++++++++++++++++
vma-reader.c | 870 ++++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 818 +++++++++++++++++++++++++++++++++++++++++
vma.c | 901 ++++++++++++++++++++++++++++++++++++++++++++++
vma.h | 150 ++++++++
6 files changed, 2656 insertions(+)
6 files changed, 2746 insertions(+)
create mode 100644 vma-reader.c
create mode 100644 vma-writer.c
create mode 100644 vma.c
create mode 100644 vma.h
diff --git a/block/meson.build b/block/meson.build
index 020a89ae07..4feae20e37 100644
index b530e117b5..b245daa98e 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -46,6 +46,8 @@ block_ss.add(files(
@@ -42,6 +42,8 @@ block_ss.add(files(
'zeroinit.c',
), zstd, zlib, gnutls)
+block_ss.add(files('../vma-writer.c'), libuuid)
+
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
softmmu_ss.add(files('block-ram-registrar.c'))
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(files('block-ram-registrar.c'))
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..e8cf7e3d78 100644
index 91a0aa64c6..620cc594b2 100644
--- a/meson.build
+++ b/meson.build
@@ -1525,6 +1525,8 @@ keyutils = dependency('libkeyutils', required: false,
@@ -1922,6 +1922,8 @@ endif
has_gettid = cc.has_function('gettid')
@ -45,7 +51,7 @@ index 5c6b5a1c75..e8cf7e3d78 100644
# libselinux
selinux = dependency('libselinux',
required: get_option('selinux'),
@@ -3596,6 +3598,9 @@ if have_tools
@@ -4023,6 +4025,9 @@ if have_tools
dependencies: [blockdev, qemuutil, gnutls, selinux],
install: true)
@ -53,14 +59,14 @@ index 5c6b5a1c75..e8cf7e3d78 100644
+ dependencies: [authz, block, crypto, io, qom], install: true)
+
subdir('storage-daemon')
subdir('contrib/rdmacm-mux')
subdir('contrib/elf2dmp')
foreach exe: [ 'qemu-img', 'qemu-io', 'qemu-nbd', 'qemu-storage-daemon']
diff --git a/vma-reader.c b/vma-reader.c
new file mode 100644
index 0000000000..e65f1e8415
index 0000000000..d0b6721812
--- /dev/null
+++ b/vma-reader.c
@@ -0,0 +1,859 @@
@@ -0,0 +1,870 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@ -82,6 +88,7 @@ index 0000000000..e65f1e8415
+#include "qemu/ratelimit.h"
+#include "vma.h"
+#include "block/block.h"
+#include "block/graph-lock.h"
+#include "sysemu/block-backend.h"
+
+static unsigned char zero_vma_block[VMA_BLOCK_SIZE];
@ -91,6 +98,7 @@ index 0000000000..e65f1e8415
+ bool write_zeroes;
+ unsigned long *bitmap;
+ int bitmap_size;
+ bool skip;
+} VmaRestoreState;
+
+struct VmaReader {
@ -488,13 +496,14 @@ index 0000000000..e65f1e8415
+}
+
+static void allocate_rstate(VmaReader *vmar, guint8 dev_id,
+ BlockBackend *target, bool write_zeroes)
+ BlockBackend *target, bool write_zeroes, bool skip)
+{
+ assert(vmar);
+ assert(dev_id);
+
+ vmar->rstate[dev_id].target = target;
+ vmar->rstate[dev_id].write_zeroes = write_zeroes;
+ vmar->rstate[dev_id].skip = skip;
+
+ int64_t size = vmar->devinfo[dev_id].size;
+
@ -509,28 +518,30 @@ index 0000000000..e65f1e8415
+}
+
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id, BlockBackend *target,
+ bool write_zeroes, Error **errp)
+ bool write_zeroes, bool skip, Error **errp)
+{
+ assert(vmar);
+ assert(target != NULL);
+ assert(target != NULL || skip);
+ assert(dev_id);
+ assert(vmar->rstate[dev_id].target == NULL);
+ assert(vmar->rstate[dev_id].target == NULL && !vmar->rstate[dev_id].skip);
+
+ int64_t size = blk_getlength(target);
+ int64_t size_diff = size - vmar->devinfo[dev_id].size;
+ if (target != NULL) {
+ int64_t size = blk_getlength(target);
+ int64_t size_diff = size - vmar->devinfo[dev_id].size;
+
+ /* storage types can have different size restrictions, so it
+ * is not always possible to create an image with exact size.
+ * So we tolerate a size difference up to 4MB.
+ */
+ if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ /* storage types can have different size restrictions, so it
+ * is not always possible to create an image with exact size.
+ * So we tolerate a size difference up to 4MB.
+ */
+ if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ }
+ }
+
+ allocate_rstate(vmar, dev_id, target, write_zeroes);
+ allocate_rstate(vmar, dev_id, target, write_zeroes, skip);
+
+ return 0;
+}
@ -590,8 +601,10 @@ index 0000000000..e65f1e8415
+ } else {
+ int res = blk_pwrite(target, sector_num * BDRV_SECTOR_SIZE, nb_sectors * BDRV_SECTOR_SIZE, buf, 0);
+ if (res < 0) {
+ bdrv_graph_rdlock_main_loop();
+ error_setg(errp, "blk_pwrite to %s failed (%d)",
+ bdrv_get_device_name(blk_bs(target)), res);
+ bdrv_graph_rdunlock_main_loop();
+ return -1;
+ }
+ }
@ -623,19 +636,23 @@ index 0000000000..e65f1e8415
+ VmaRestoreState *rstate = &vmar->rstate[dev_id];
+ BlockBackend *target = NULL;
+
+ bool skip = rstate->skip;
+
+ if (dev_id != vmar->vmstate_stream) {
+ target = rstate->target;
+ if (!verify && !target) {
+ if (!verify && !target && !skip) {
+ error_setg(errp, "got wrong dev id %d", dev_id);
+ return -1;
+ }
+
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %zd for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ if (!skip) {
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %zd for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
+
+ max_sector = vmar->devinfo[dev_id].size/BDRV_SECTOR_SIZE;
+ } else {
@ -681,7 +698,7 @@ index 0000000000..e65f1e8415
+ return -1;
+ }
+
+ if (!verify) {
+ if (!verify && !skip) {
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ buf + start, sector_num, nb_sectors,
@ -717,7 +734,7 @@ index 0000000000..e65f1e8415
+ return -1;
+ }
+
+ if (!verify) {
+ if (!verify && !skip) {
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ buf + start, sector_num,
@ -742,7 +759,7 @@ index 0000000000..e65f1e8415
+ vmar->partial_zero_cluster_data += zero_size;
+ }
+
+ if (rstate->write_zeroes && !verify) {
+ if (rstate->write_zeroes && !verify && !skip) {
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ zero_vma_block, sector_num,
+ nb_sectors, errp) < 0) {
@ -913,7 +930,7 @@ index 0000000000..e65f1e8415
+
+ for (dev_id = 1; dev_id < 255; dev_id++) {
+ if (vma_reader_get_device_info(vmar, dev_id)) {
+ allocate_rstate(vmar, dev_id, NULL, false);
+ allocate_rstate(vmar, dev_id, NULL, false, false);
+ }
+ }
+
@ -922,10 +939,10 @@ index 0000000000..e65f1e8415
+
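The acceptance rule that the new `vma_reader_register_bs` applies can be condensed into a small predicate. This is a sketch only: `target_size_ok` and `SIZE_TOLERANCE` are hypothetical names introduced here, and where the patch uses `assert()` for the "no target and not skipped" case, the sketch simply returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4 MiB tolerance, taken from the comment in vma_reader_register_bs. */
#define SIZE_TOLERANCE (4 * 1024 * 1024)

/*
 * Hypothetical helper modelling the checks in vma_reader_register_bs:
 *  - a drive without a target is only acceptable when it is skipped;
 *  - a drive with a target must be at least as large as the archived
 *    device and at most SIZE_TOLERANCE bytes larger.
 */
static bool target_size_ok(bool skip, bool have_target,
                           int64_t target_size, int64_t archived_size)
{
    if (!have_target) {
        return skip;
    }
    int64_t size_diff = target_size - archived_size;
    return size_diff >= 0 && size_diff <= SIZE_TOLERANCE;
}
```

Note that, as in the patch, the size check still runs for a skipped drive if a target happens to be supplied; skipping only removes the requirement that a target exists.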
diff --git a/vma-writer.c b/vma-writer.c
new file mode 100644
index 0000000000..df4b20793d
index 0000000000..126b296647
--- /dev/null
+++ b/vma-writer.c
@@ -0,0 +1,791 @@
@@ -0,0 +1,818 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@ -941,6 +958,8 @@ index 0000000000..df4b20793d
+
+#include "qemu/osdep.h"
+#include <glib.h>
+#include <linux/magic.h>
+#include <sys/vfs.h>
+#include <uuid/uuid.h>
+
+#include "vma.h"
@ -949,6 +968,7 @@ index 0000000000..df4b20793d
+#include "qemu/main-loop.h"
+#include "qemu/coroutine.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/memalign.h"
+
+#define DEBUG_VMA 0
@ -1132,10 +1152,10 @@ index 0000000000..df4b20793d
+{
+ assert(qemu_in_coroutine());
+ AioContext *ctx = qemu_get_current_aio_context();
+ aio_set_fd_handler(ctx, fd, false, NULL, (IOHandler *)qemu_coroutine_enter,
+ NULL, NULL, qemu_coroutine_self());
+ aio_set_fd_handler(ctx, fd, NULL, (IOHandler *)qemu_coroutine_enter, NULL,
+ NULL, qemu_coroutine_self());
+ qemu_coroutine_yield();
+ aio_set_fd_handler(ctx, fd, false, NULL, NULL, NULL, NULL, NULL);
+ aio_set_fd_handler(ctx, fd, NULL, NULL, NULL, NULL, NULL);
+}
+
+static ssize_t coroutine_fn
@ -1184,6 +1204,23 @@ index 0000000000..df4b20793d
+ return (done == bytes) ? bytes : -1;
+}
+
+static bool is_path_tmpfs(const char *path) {
+ struct statfs fs;
+ int ret;
+
+ do {
+ ret = statfs(path, &fs);
+ } while (ret != 0 && errno == EINTR);
+
+ if (ret != 0) {
+ warn_report("statfs call for %s failed, assuming not tmpfs - %s\n",
+ path, strerror(errno));
+ return false;
+ }
+
+ return fs.f_type == TMPFS_MAGIC;
+}
+
+VmaWriter *vma_writer_create(const char *filename, uuid_t uuid, Error **errp)
+{
+ const char *p;
@ -1233,12 +1270,19 @@ index 0000000000..df4b20793d
+ }
+ /* try to use O_NONBLOCK */
+ fcntl(vmaw->fd, F_SETFL, fcntl(vmaw->fd, F_GETFL)|O_NONBLOCK);
+ } else {
+ oflags = O_NONBLOCK|O_DIRECT|O_WRONLY|O_EXCL;
+ } else {
+ gchar *dirname = g_path_get_dirname(filename);
+ oflags = O_NONBLOCK|O_WRONLY|O_EXCL;
+ if (!is_path_tmpfs(dirname)) {
+ oflags |= O_DIRECT;
+ }
+ g_free(dirname);
+ vmaw->fd = qemu_create(filename, oflags, 0644, errp);
+ }
+
+ if (vmaw->fd < 0) {
+ error_free(*errp);
+ *errp = NULL;
+ error_setg(errp, "can't open file %s - %s\n", filename,
+ g_strerror(errno));
+ goto err;
@ -1719,10 +1763,10 @@ index 0000000000..df4b20793d
+}
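The tmpfs detection added to `vma_writer_create` can be tried in isolation. The following is a minimal, Linux-only sketch under stated assumptions: `path_is_tmpfs` and `should_use_direct_io` are hypothetical names, and unlike the patch (which calls `warn_report`), the sketch silently treats a failed `statfs` as "not tmpfs". The motivation, as far as the patch shows, is that `O_DIRECT` is only added to the open flags when the target directory is not on tmpfs.

```c
#include <errno.h>
#include <stdbool.h>
#include <linux/magic.h> /* TMPFS_MAGIC */
#include <sys/vfs.h>     /* statfs(), struct statfs */

/*
 * Return true if 'path' lives on a tmpfs mount.
 * statfs() is retried on EINTR; any other failure is treated as
 * "not tmpfs", mirroring the fallback chosen in the patch.
 */
static bool path_is_tmpfs(const char *path)
{
    struct statfs fs;
    int ret;

    do {
        ret = statfs(path, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        return false;
    }
    return fs.f_type == TMPFS_MAGIC;
}

/* Hypothetical wrapper: direct I/O is requested everywhere except on tmpfs. */
static bool should_use_direct_io(const char *dirname)
{
    return !path_is_tmpfs(dirname);
}
```

A caller would pass the directory of the output file (the patch obtains it with `g_path_get_dirname`) and add `O_DIRECT` to its open flags only when `should_use_direct_io` returns true.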
diff --git a/vma.c b/vma.c
new file mode 100644
index 0000000000..e8dffb43e0
index 0000000000..bb715e9061
--- /dev/null
+++ b/vma.c
@@ -0,0 +1,849 @@
@@ -0,0 +1,901 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@ -1756,7 +1800,7 @@ index 0000000000..e8dffb43e0
+ "vma list <filename>\n"
+ "vma config <filename> [-c config]\n"
+ "vma create <filename> [-c config] pathname ...\n"
+ "vma extract <filename> [-r <fifo>] <targetdir>\n"
+ "vma extract <filename> [-d <drive-list>] [-r <fifo>] <targetdir>\n"
+ "vma verify <filename> [-v]\n"
+ ;
+
@ -1863,6 +1907,7 @@ index 0000000000..e8dffb43e0
+ char *throttling_group;
+ char *cache;
+ bool write_zero;
+ bool skip;
+} RestoreMap;
+
+static bool try_parse_option(char **line, const char *optname, char **out, const char *inbuf) {
@ -1900,9 +1945,10 @@ index 0000000000..e8dffb43e0
+ const char *filename;
+ const char *dirname;
+ const char *readmap = NULL;
+ gchar **drive_list = NULL;
+
+ for (;;) {
+ c = getopt(argc, argv, "hvr:");
+ c = getopt(argc, argv, "hvd:r:");
+ if (c == -1) {
+ break;
+ }
@ -1911,6 +1957,9 @@ index 0000000000..e8dffb43e0
+ case 'h':
+ help();
+ break;
+ case 'd':
+ drive_list = g_strsplit(optarg, ",", 254);
+ break;
+ case 'r':
+ readmap = optarg;
+ break;
@ -1970,74 +2019,89 @@ index 0000000000..e8dffb43e0
+ char *bps = NULL;
+ char *group = NULL;
+ char *cache = NULL;
+ char *devname = NULL;
+ bool skip = false;
+ uint64_t bps_value = 0;
+ const char *path = NULL;
+ bool write_zero = true;
+
+ if (!line || line[0] == '\0' || !strcmp(line, "done\n")) {
+ break;
+ }
+ int len = strlen(line);
+ if (line[len - 1] == '\n') {
+ line[len - 1] = '\0';
+ if (len == 1) {
+ len = len - 1;
+ if (len == 0) {
+ break;
+ }
+ }
+
+ while (1) {
+ if (!try_parse_option(&line, "format", &format, inbuf) &&
+ !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
+ !try_parse_option(&line, "throttling.group", &group, inbuf) &&
+ !try_parse_option(&line, "cache", &cache, inbuf))
+ {
+ break;
+ if (strncmp(line, "skip", 4) == 0) {
+ if (len < 6 || line[4] != '=') {
+ g_error("read map failed - option 'skip' has no value ('%s')",
+ inbuf);
+ } else {
+ devname = line + 5;
+ skip = true;
+ }
+ }
+
+ uint64_t bps_value = 0;
+ if (bps) {
+ bps_value = verify_u64(bps);
+ g_free(bps);
+ }
+
+ const char *path;
+ bool write_zero;
+ if (line[0] == '0' && line[1] == ':') {
+ path = line + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = line + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ while (1) {
+ if (!try_parse_option(&line, "format", &format, inbuf) &&
+ !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
+ !try_parse_option(&line, "throttling.group", &group, inbuf) &&
+ !try_parse_option(&line, "cache", &cache, inbuf))
+ {
+ break;
+ }
+ }
+
+ if (bps) {
+ bps_value = verify_u64(bps);
+ g_free(bps);
+ }
+
+ if (line[0] == '0' && line[1] == ':') {
+ path = line + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = line + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ }
+
+ path = extract_devname(path, &devname, -1);
+ }
+
+ char *devname = NULL;
+ path = extract_devname(path, &devname, -1);
+ if (!devname) {
+ g_error("read map failed - no dev name specified ('%s')",
+ inbuf);
+ }
+
+ RestoreMap *map = g_new0(RestoreMap, 1);
+ map->devname = g_strdup(devname);
+ map->path = g_strdup(path);
+ map->format = format;
+ map->throttling_bps = bps_value;
+ map->throttling_group = group;
+ map->cache = cache;
+ map->write_zero = write_zero;
+ RestoreMap *restore_map = g_new0(RestoreMap, 1);
+ restore_map->devname = g_strdup(devname);
+ restore_map->path = g_strdup(path);
+ restore_map->format = format;
+ restore_map->throttling_bps = bps_value;
+ restore_map->throttling_group = group;
+ restore_map->cache = cache;
+ restore_map->write_zero = write_zero;
+ restore_map->skip = skip;
+
+ g_hash_table_insert(devmap, map->devname, map);
+ g_hash_table_insert(devmap, restore_map->devname, restore_map);
+
+ };
+ }
+
+ int i;
+ int vmstate_fd = -1;
+ guint8 vmstate_stream = 0;
+ bool drive_rename_bitmap[255];
+ memset(drive_rename_bitmap, 0, sizeof(drive_rename_bitmap));
+
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (strcmp(di->devname, "vmstate") == 0)) {
+ vmstate_stream = i;
+ char *statefn = g_strdup_printf("%s/vmstate.bin", dirname);
+ vmstate_fd = open(statefn, O_WRONLY|O_CREAT|O_EXCL, 0644);
+ if (vmstate_fd < 0) {
@ -2053,10 +2117,25 @@ index 0000000000..e8dffb43e0
+ const char *cache = NULL;
+ int flags = BDRV_O_RDWR;
+ bool write_zero = true;
+ bool skip = false;
+
+ BlockBackend *blk = NULL;
+
+ if (readmap) {
+ if (drive_list) {
+ skip = true;
+ int j;
+ for (j = 0; drive_list[j]; j++) {
+ if (strcmp(drive_list[j], di->devname) == 0) {
+ skip = false;
+ drive_rename_bitmap[i] = true;
+ break;
+ }
+ }
+ } else {
+ drive_rename_bitmap[i] = true;
+ }
+
+ if (!skip && readmap) {
+ RestoreMap *map;
+ map = (RestoreMap *)g_hash_table_lookup(devmap, di->devname);
+ if (map == NULL) {
@ -2068,7 +2147,8 @@ index 0000000000..e8dffb43e0
+ throttling_group = map->throttling_group;
+ cache = map->cache;
+ write_zero = map->write_zero;
+ } else {
+ skip = map->skip;
+ } else if (!skip) {
+ devfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ printf("DEVINFO %s %zd\n", devfn, di->size);
@ -2086,57 +2166,60 @@ index 0000000000..e8dffb43e0
+ write_zero = false;
+ }
+
+ size_t devlen = strlen(devfn);
+ QDict *options = NULL;
+ bool writethrough;
+ if (format) {
+ /* explicit format from commandline */
+ options = qdict_new();
+ qdict_put_str(options, "driver", format);
+ } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
+ strncmp(devfn, "/dev/", 5) == 0)
+ {
+ /* This part is now deprecated for PVE as well (just as qemu
+ * deprecated not specifying an explicit raw format, too.
+ */
+ /* explicit raw format */
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+ }
+ if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
+ g_error("invalid cache option: %s\n", cache);
+ }
+
+ if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
+ g_error("can't open file %s - %s", devfn,
+ error_get_pretty(errp));
+ }
+
+ if (cache) {
+ blk_set_enable_write_cache(blk, !writethrough);
+ }
+
+ if (throttling_group) {
+ blk_io_limits_enable(blk, throttling_group);
+ }
+
+ if (throttling_bps) {
+ if (!throttling_group) {
+ blk_io_limits_enable(blk, devfn);
+ if (!skip) {
+ size_t devlen = strlen(devfn);
+ QDict *options = NULL;
+ bool writethrough;
+ if (format) {
+ /* explicit format from commandline */
+ options = qdict_new();
+ qdict_put_str(options, "driver", format);
+ } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
+ strncmp(devfn, "/dev/", 5) == 0)
+ {
+ /* This part is now deprecated for PVE as well (just as qemu
+ * deprecated not specifying an explicit raw format, too.
+ */
+ /* explicit raw format */
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+ }
+
+ ThrottleConfig cfg;
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
+ Error *err = NULL;
+ if (!throttle_is_valid(&cfg, &err)) {
+ error_report_err(err);
+ g_error("failed to apply throttling");
+ if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
+ g_error("invalid cache option: %s\n", cache);
+ }
+
+ if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
+ g_error("can't open file %s - %s", devfn,
+ error_get_pretty(errp));
+ }
+
+ if (cache) {
+ blk_set_enable_write_cache(blk, !writethrough);
+ }
+
+ if (throttling_group) {
+ blk_io_limits_enable(blk, throttling_group);
+ }
+
+ if (throttling_bps) {
+ if (!throttling_group) {
+ blk_io_limits_enable(blk, devfn);
+ }
+
+ ThrottleConfig cfg;
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
+ Error *err = NULL;
+ if (!throttle_is_valid(&cfg, &err)) {
+ error_report_err(err);
+ g_error("failed to apply throttling");
+ }
+ blk_set_io_limits(blk, &cfg);
+ }
+ blk_set_io_limits(blk, &cfg);
+ }
+
+ if (vma_reader_register_bs(vmar, i, blk, write_zero, &errp) < 0) {
+ if (vma_reader_register_bs(vmar, i, blk, write_zero, skip, &errp) < 0) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
@ -2146,6 +2229,10 @@ index 0000000000..e8dffb43e0
+ }
+ }
+
+ if (drive_list) {
+ g_strfreev(drive_list);
+ }
+
+ if (vma_reader_restore(vmar, vmstate_fd, verbose, &errp) < 0) {
+ g_error("restore failed - %s", error_get_pretty(errp));
+ }
@ -2153,7 +2240,7 @@ index 0000000000..e8dffb43e0
+ if (!readmap) {
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (i != vmstate_stream)) {
+ if (di && drive_rename_bitmap[i]) {
+ char *tmpfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ char *fn = g_strdup_printf("%s/disk-%s.raw",
@ -2252,7 +2339,7 @@ index 0000000000..e8dffb43e0
+ struct iovec iov;
+ QEMUIOVector qiov;
+
+ int64_t start, end;
+ int64_t start, end, readlen;
+ int ret = 0;
+
+ unsigned char *buf = blk_blockalign(job->target, VMA_CLUSTER_SIZE);
@ -2266,16 +2353,24 @@ index 0000000000..e8dffb43e0
+ iov.iov_len = VMA_CLUSTER_SIZE;
+ qemu_iovec_init_external(&qiov, &iov, 1);
+
+ if (start + 1 == end) {
+ memset(buf, 0, VMA_CLUSTER_SIZE);
+ readlen = job->len - start * VMA_CLUSTER_SIZE;
+ assert(readlen > 0 && readlen <= VMA_CLUSTER_SIZE);
+ } else {
+ readlen = VMA_CLUSTER_SIZE;
+ }
+
+ ret = blk_co_preadv(job->target, start * VMA_CLUSTER_SIZE,
+ VMA_CLUSTER_SIZE, &qiov, 0);
+ readlen, &qiov, 0);
+ if (ret < 0) {
+ vma_writer_set_error(job->vmaw, "read error", -1);
+ vma_writer_set_error(job->vmaw, "read error");
+ goto out;
+ }
+
+ size_t zb = 0;
+ if (vma_writer_write(job->vmaw, job->dev_id, start, buf, &zb) < 0) {
+ vma_writer_set_error(job->vmaw, "backup_dump_cb vma_writer_write failed", -1);
+ vma_writer_set_error(job->vmaw, "backup_dump_cb vma_writer_write failed");
+ goto out;
+ }
+ }
@ -2293,7 +2388,7 @@ index 0000000000..e8dffb43e0
+
+static int create_archive(int argc, char **argv)
+{
+ int i, c;
+ int c;
+ int verbose = 0;
+ const char *archivename;
+ GList *backup_coroutines = NULL;
@ -2451,6 +2546,7 @@ index 0000000000..e8dffb43e0
+ vma_writer_get_status(vmaw, &vmastat);
+
+ if (verbose) {
+ int i;
+ for (i = 0; i < 256; i++) {
+ VmaStreamInfo *si = &vmastat.stream_info[i];
+ if (si->size) {
@ -2574,7 +2670,7 @@ index 0000000000..e8dffb43e0
+}
diff --git a/vma.h b/vma.h
new file mode 100644
index 0000000000..c895c97f6d
index 0000000000..86d2873aa5
--- /dev/null
+++ b/vma.h
@@ -0,0 +1,150 @@
@ -2712,7 +2808,7 @@ index 0000000000..c895c97f6d
+int coroutine_fn vma_writer_flush_output(VmaWriter *vmaw);
+
+int vma_writer_get_status(VmaWriter *vmaw, VmaStatus *status);
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...);
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...) G_GNUC_PRINTF(2, 3);
+
+
+VmaReader *vma_reader_create(const char *filename, Error **errp);
@ -2722,7 +2818,7 @@ index 0000000000..c895c97f6d
+VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id);
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id,
+ BlockBackend *target, bool write_zeroes,
+ Error **errp);
+ bool skip, Error **errp);
+int vma_reader_restore(VmaReader *vmar, int vmstate_fd, bool verbose,
+ Error **errp);
+int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp);
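
Note on the backup coroutine hunk in `vma.c` above: the read for the final cluster is shortened to `job->len - start * VMA_CLUSTER_SIZE`, so the job never reads past the end of the device, and the buffer is zeroed first so the unread tail is well defined. A minimal sketch of that arithmetic follows. It assumes a 64 KiB cluster size, since the definition of `VMA_CLUSTER_SIZE` is not shown in this diff, and the helper names `cluster_count` and `cluster_read_len` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; this diff does not show how VMA_CLUSTER_SIZE is defined. */
#define CLUSTER_SIZE (INT64_C(1) << 16)

/* Number of clusters needed to cover len bytes (rounds up). */
static int64_t cluster_count(int64_t len)
{
    return (len + CLUSTER_SIZE - 1) / CLUSTER_SIZE;
}

/*
 * Hypothetical helper: bytes to read for cluster `start` of a device that is
 * `len` bytes long and spans clusters [0, end). Only the final cluster may be
 * short; the caller is expected to zero the buffer beforehand.
 */
static int64_t cluster_read_len(int64_t start, int64_t end, int64_t len)
{
    int64_t readlen = CLUSTER_SIZE;

    if (start + 1 == end) {
        readlen = len - start * CLUSTER_SIZE;
        assert(readlen > 0 && readlen <= CLUSTER_SIZE);
    }
    return readlen;
}
```

For a device one byte larger than a cluster, `cluster_count` yields two clusters and the second read is a single byte instead of a full `CLUSTER_SIZE`.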


@ -9,21 +9,23 @@ Subject: [PATCH] PVE-Backup: add backup-dump block driver
- job.c: make job_should_pause non-static
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to coroutine changes]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/backup-dump.c | 167 +++++++++++++++++++++++++++++++
block/backup-dump.c | 172 +++++++++++++++++++++++++++++++
block/backup.c | 30 ++----
block/meson.build | 1 +
include/block/block_int-common.h | 35 +++++++
job.c | 3 +-
5 files changed, 213 insertions(+), 23 deletions(-)
5 files changed, 218 insertions(+), 23 deletions(-)
create mode 100644 block/backup-dump.c
diff --git a/block/backup-dump.c b/block/backup-dump.c
new file mode 100644
index 0000000000..04718a94e2
index 0000000000..e46abf1070
--- /dev/null
+++ b/block/backup-dump.c
@@ -0,0 +1,167 @@
@@ -0,0 +1,172 @@
+/*
+ * BlockDriver to send backup data stream to a callback function
+ *
@ -35,6 +37,8 @@ index 0000000000..04718a94e2
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/qmp/qdict.h"
+#include "qom/object_interfaces.h"
+#include "block/block_int.h"
+
@ -45,7 +49,8 @@ index 0000000000..04718a94e2
+ void *dump_cb_data;
+} BDRVBackupDumpState;
+
+static int qemu_backup_dump_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+static coroutine_fn int qemu_backup_dump_co_get_info(BlockDriverState *bs,
+ BlockDriverInfo *bdi)
+{
+ BDRVBackupDumpState *s = bs->opaque;
+
@ -86,7 +91,7 @@ index 0000000000..04718a94e2
+ /* Nothing to do. */
+}
+
+static int64_t qemu_backup_dump_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t qemu_backup_dump_co_getlength(BlockDriverState *bs)
+{
+ BDRVBackupDumpState *s = bs->opaque;
+
@ -146,8 +151,8 @@ index 0000000000..04718a94e2
+
+ .bdrv_close = qemu_backup_dump_close,
+ .bdrv_has_zero_init = bdrv_has_zero_init_1,
+ .bdrv_getlength = qemu_backup_dump_getlength,
+ .bdrv_get_info = qemu_backup_dump_get_info,
+ .bdrv_co_getlength = qemu_backup_dump_co_getlength,
+ .bdrv_co_get_info = qemu_backup_dump_co_get_info,
+
+ .bdrv_co_writev = qemu_backup_dump_co_writev,
+
@ -166,7 +171,7 @@ index 0000000000..04718a94e2
+block_init(bdrv_backup_dump_init);
+
+
+BlockDriverState *bdrv_backup_dump_create(
+BlockDriverState *coroutine_fn bdrv_co_backup_dump_create(
+ int dump_cb_block_size,
+ uint64_t byte_size,
+ BackupDumpFunc *dump_cb,
@ -174,9 +179,11 @@ index 0000000000..04718a94e2
+ Error **errp)
+{
+ BDRVBackupDumpState *state;
+ BlockDriverState *bs = bdrv_new_open_driver(
+ &bdrv_backup_dump_drive, NULL, BDRV_O_RDWR, errp);
+
+ QDict *options = qdict_new();
+ qdict_put_str(options, "driver", "backup-dump-drive");
+
+ BlockDriverState *bs = bdrv_co_open(NULL, NULL, options, BDRV_O_RDWR, errp);
+ if (!bs) {
+ return NULL;
+ }
@ -192,7 +199,7 @@ index 0000000000..04718a94e2
+ return bs;
+}
diff --git a/block/backup.c b/block/backup.c
index 9b0151c5be..6e8f6e67b3 100644
index 270957c0cd..16d611c4ca 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -29,28 +29,6 @@
@ -224,7 +231,7 @@ index 9b0151c5be..6e8f6e67b3 100644
static const BlockJobDriver backup_job_driver;
static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -454,6 +432,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
@@ -461,6 +439,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
}
cluster_size = block_copy_cluster_size(bcs);
@ -240,7 +247,7 @@ index 9b0151c5be..6e8f6e67b3 100644
if (perf->max_chunk && perf->max_chunk < cluster_size) {
error_setg(errp, "Required max-chunk (%" PRIi64 ") is less than backup "
diff --git a/block/meson.build b/block/meson.build
index 4feae20e37..0d7023fc82 100644
index b245daa98e..e99914eaa4 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -4,6 +4,7 @@ block_ss.add(files(
@ -248,28 +255,28 @@ index 4feae20e37..0d7023fc82 100644
'amend.c',
'backup.c',
+ 'backup-dump.c',
'copy-before-write.c',
'blkdebug.c',
'blklogwrites.c',
'blkverify.c',
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 31ae91e56e..37b64bcd93 100644
index 761276127e..b3e6697613 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -26,6 +26,7 @@
#include "block/accounting.h"
#include "block/block.h"
#include "block/aio.h"
#include "block/block-common.h"
+#include "block/block-copy.h"
#include "block/aio-wait.h"
#include "qemu/queue.h"
#include "qemu/coroutine.h"
@@ -64,6 +65,40 @@
#include "block/block-global-state.h"
#include "block/snapshot.h"
#include "qemu/iov.h"
@@ -60,6 +61,40 @@
#define BLOCK_PROBE_BUF_SIZE 512
+typedef int BackupDumpFunc(void *opaque, uint64_t offset, uint64_t bytes, const void *buf);
+
+BlockDriverState *bdrv_backup_dump_create(
+BlockDriverState *coroutine_fn bdrv_co_backup_dump_create(
+ int dump_cb_block_size,
+ uint64_t byte_size,
+ BackupDumpFunc *dump_cb,
@ -305,10 +312,10 @@ index 31ae91e56e..37b64bcd93 100644
BDRV_TRACKED_READ,
BDRV_TRACKED_WRITE,
diff --git a/job.c b/job.c
index 72d57f0934..93e22d180b 100644
index 660ce22c56..baf54c8d60 100644
--- a/job.c
+++ b/job.c
@@ -330,7 +330,8 @@ static bool job_started_locked(Job *job)
@@ -331,7 +331,8 @@ static bool job_started_locked(Job *job)
}
/* Called with job_mutex held. */
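
Note on the `BackupDumpFunc` typedef added to `block_int-common.h` above: the backup-dump driver hands each written chunk to a callback of this shape. The sketch below shows one possible callback that only gathers statistics. The `DumpStats` struct and `count_bytes_cb` are hypothetical. Treating a NULL buffer as an all-zero chunk and returning the byte count on success are assumptions modelled on how `pve-backup.c` uses the callback elsewhere in this comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the typedef introduced by the patch. */
typedef int BackupDumpFunc(void *opaque, uint64_t offset, uint64_t bytes,
                           const void *buf);

/* Hypothetical per-device state, for illustration only. */
typedef struct DumpStats {
    uint64_t transferred;  /* all bytes seen by the callback */
    uint64_t zero_bytes;   /* bytes reported with a NULL buffer */
    uint64_t highest_end;  /* largest offset + bytes seen so far */
} DumpStats;

/* Returns the chunk size on success and a negative value on error. */
static int count_bytes_cb(void *opaque, uint64_t offset, uint64_t bytes,
                          const void *buf)
{
    DumpStats *stats = opaque;

    if (stats == NULL) {
        return -1;
    }
    stats->transferred += bytes;
    if (buf == NULL) {
        /* Assumption: a NULL buffer stands for a chunk of zeroes. */
        stats->zero_bytes += bytes;
    }
    if (offset + bytes > stats->highest_end) {
        stats->highest_end = offset + bytes;
    }
    return (int)bytes;
}
```

A function with this signature can be assigned to a `BackupDumpFunc *` and passed, together with its state pointer, wherever the patch expects `dump_cb` and `dump_cb_data`.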


@ -11,10 +11,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2 files changed, 46 insertions(+)
diff --git a/include/qemu/job.h b/include/qemu/job.h
index e502787dd8..963cf2bef5 100644
index 2b873f2576..528cd6acb9 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -381,6 +381,18 @@ void job_unlock(void);
@@ -362,6 +362,18 @@ void job_unlock(void);
*/
JobTxn *job_txn_new(void);
@ -34,10 +34,10 @@ index e502787dd8..963cf2bef5 100644
* Release a reference that was previously acquired with job_txn_add_job or
* job_txn_new. If it's the last reference to the object, it will be freed.
diff --git a/job.c b/job.c
index 93e22d180b..2b31f1e14f 100644
index baf54c8d60..3ac5e5cde2 100644
--- a/job.c
+++ b/job.c
@@ -93,6 +93,8 @@ struct JobTxn {
@@ -94,6 +94,8 @@ struct JobTxn {
/* Reference count */
int refcnt;
@ -46,7 +46,7 @@ index 93e22d180b..2b31f1e14f 100644
};
void job_lock(void)
@@ -118,6 +120,25 @@ JobTxn *job_txn_new(void)
@@ -119,6 +121,25 @@ JobTxn *job_txn_new(void)
return txn;
}
@ -72,7 +72,7 @@ index 93e22d180b..2b31f1e14f 100644
/* Called with job_mutex held. */
static void job_txn_ref_locked(JobTxn *txn)
{
@@ -1057,6 +1078,12 @@ static void job_completed_txn_success_locked(Job *job)
@@ -1042,6 +1063,12 @@ static void job_completed_txn_success_locked(Job *job)
*/
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
if (!job_is_completed_locked(other_job)) {
@ -85,7 +85,7 @@ index 93e22d180b..2b31f1e14f 100644
return;
}
assert(other_job->ret == 0);
@@ -1268,6 +1295,13 @@ int job_finish_sync_locked(Job *job,
@@ -1253,6 +1280,13 @@ int job_finish_sync_locked(Job *job,
return -EBUSY;
}

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,452 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 29 Jun 2020 11:06:03 +0200
Subject: [PATCH] PVE-Backup: Add dirty-bitmap tracking for incremental backups
Uses QEMU's existing MIRROR_SYNC_MODE_BITMAP and a dirty-bitmap on top
of all backed-up drives. This will only execute the data-write callback
for any changed chunks, the PBS rust code will reuse chunks from the
previous index for everything it doesn't receive if reuse_index is true.
On error or cancellation, remove all dirty bitmaps to ensure
consistency.
Add PBS/incremental specific information to query backup info QMP and
HMP commands.
Only supported for PBS backups.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
monitor/hmp-cmds.c | 45 ++++++++++----
proxmox-backup-client.c | 3 +-
proxmox-backup-client.h | 1 +
pve-backup.c | 103 ++++++++++++++++++++++++++++++---
qapi/block-core.json | 12 +++-
6 files changed, 142 insertions(+), 23 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 477044c54a..556af25861 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1042,6 +1042,7 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
+ false, false, // PBS incremental
true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
false, NULL, false, NULL, !!devlist,
devlist, qdict_haskey(qdict, "speed"), speed, &error);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index a40b25e906..670f783515 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -225,19 +225,42 @@ void hmp_info_backup(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "End time: %s", ctime(&info->end_time));
}
- int per = (info->has_total && info->total &&
- info->has_transferred && info->transferred) ?
- (info->transferred * 100)/info->total : 0;
- int zero_per = (info->has_total && info->total &&
- info->has_zero_bytes && info->zero_bytes) ?
- (info->zero_bytes * 100)/info->total : 0;
monitor_printf(mon, "Backup file: %s\n", info->backup_file);
monitor_printf(mon, "Backup uuid: %s\n", info->uuid);
- monitor_printf(mon, "Total size: %zd\n", info->total);
- monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
- info->transferred, per);
- monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
- info->zero_bytes, zero_per);
+
+ if (!(info->has_total && info->total)) {
+ // this should not happen normally
+ monitor_printf(mon, "Total size: %d\n", 0);
+ } else {
+ bool incremental = false;
+ size_t total_or_dirty = info->total;
+ if (info->has_transferred) {
+ if (info->has_dirty && info->dirty) {
+ if (info->dirty < info->total) {
+ total_or_dirty = info->dirty;
+ incremental = true;
+ }
+ }
+ }
+
+ int per = (info->transferred * 100)/total_or_dirty;
+
+ monitor_printf(mon, "Backup mode: %s\n", incremental ? "incremental" : "full");
+
+ int zero_per = (info->has_zero_bytes && info->zero_bytes) ?
+ (info->zero_bytes * 100)/info->total : 0;
+ monitor_printf(mon, "Total size: %zd\n", info->total);
+ monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
+ info->transferred, per);
+ monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
+ info->zero_bytes, zero_per);
+
+ if (info->has_reused) {
+ int reused_per = (info->reused * 100)/total_or_dirty;
+ monitor_printf(mon, "Reused bytes: %zd (%d%%)\n",
+ info->reused, reused_per);
+ }
+ }
}
qapi_free_BackupStatus(info);
diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
index a8f6653a81..4ce7bc0b5e 100644
--- a/proxmox-backup-client.c
+++ b/proxmox-backup-client.c
@@ -89,6 +89,7 @@ proxmox_backup_co_register_image(
ProxmoxBackupHandle *pbs,
const char *device_name,
uint64_t size,
+ bool incremental,
Error **errp)
{
Coroutine *co = qemu_coroutine_self();
@@ -98,7 +99,7 @@ proxmox_backup_co_register_image(
int pbs_res = -1;
proxmox_backup_register_image_async(
- pbs, device_name, size ,proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ pbs, device_name, size, incremental, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
qemu_coroutine_yield();
if (pbs_res < 0) {
if (errp) error_setg(errp, "backup register image failed: %s", pbs_err ? pbs_err : "unknown error");
diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
index 1dda8b7d8f..8cbf645b2c 100644
--- a/proxmox-backup-client.h
+++ b/proxmox-backup-client.h
@@ -32,6 +32,7 @@ proxmox_backup_co_register_image(
ProxmoxBackupHandle *pbs,
const char *device_name,
uint64_t size,
+ bool incremental,
Error **errp);
diff --git a/pve-backup.c b/pve-backup.c
index 3d28975eaa..abd7062afe 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -28,6 +28,8 @@
*
*/
+const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+
static struct PVEBackupState {
struct {
// Everithing accessed from qmp_backup_query command is protected using lock
@@ -39,7 +41,9 @@ static struct PVEBackupState {
uuid_t uuid;
char uuid_str[37];
size_t total;
+ size_t dirty;
size_t transferred;
+ size_t reused;
size_t zero_bytes;
} stat;
int64_t speed;
@@ -66,6 +70,7 @@ typedef struct PVEBackupDevInfo {
uint8_t dev_id;
bool completed;
char targetfile[PATH_MAX];
+ BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
} PVEBackupDevInfo;
@@ -107,11 +112,12 @@ static bool pvebackup_error_or_canceled(void)
return error_or_canceled;
}
-static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes)
+static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes, size_t reused)
{
qemu_mutex_lock(&backup_state.stat.lock);
backup_state.stat.zero_bytes += zero_bytes;
backup_state.stat.transferred += transferred;
+ backup_state.stat.reused += reused;
qemu_mutex_unlock(&backup_state.stat.lock);
}
@@ -150,7 +156,8 @@ pvebackup_co_dump_pbs_cb(
pvebackup_propagate_error(local_err);
return pbs_res;
} else {
- pvebackup_add_transfered_bytes(size, !buf ? size : 0);
+ size_t reused = (pbs_res == 0) ? size : 0;
+ pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
}
return size;
@@ -210,11 +217,11 @@ pvebackup_co_dump_vma_cb(
} else {
if (remaining >= VMA_CLUSTER_SIZE) {
assert(ret == VMA_CLUSTER_SIZE);
- pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes);
+ pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes, 0);
remaining -= VMA_CLUSTER_SIZE;
} else {
assert(ret == remaining);
- pvebackup_add_transfered_bytes(remaining, zero_bytes);
+ pvebackup_add_transfered_bytes(remaining, zero_bytes, 0);
remaining = 0;
}
}
@@ -250,6 +257,18 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
if (local_err != NULL) {
pvebackup_propagate_error(local_err);
}
+ } else {
+ // on error or cancel we cannot ensure synchronization of dirty
+ // bitmaps with backup server, so remove all and do full backup next
+ GList *l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (di->bitmap) {
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+ }
}
proxmox_backup_disconnect(backup_state.pbs);
@@ -305,6 +324,12 @@ static void pvebackup_complete_cb(void *opaque, int ret)
// remove self from job queue
backup_state.di_list = g_list_remove(backup_state.di_list, di);
+ if (di->bitmap && ret < 0) {
+ // on error or cancel we cannot ensure synchronization of dirty
+ // bitmaps with backup server, so remove all and do full backup next
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+
g_free(di);
qemu_mutex_unlock(&backup_state.backup_mutex);
@@ -469,12 +494,18 @@ static bool create_backup_jobs(void) {
assert(di->target != NULL);
+ MirrorSyncMode sync_mode = MIRROR_SYNC_MODE_FULL;
+ BitmapSyncMode bitmap_mode = BITMAP_SYNC_MODE_NEVER;
+ if (di->bitmap) {
+ sync_mode = MIRROR_SYNC_MODE_BITMAP;
+ bitmap_mode = BITMAP_SYNC_MODE_ON_SUCCESS;
+ }
AioContext *aio_context = bdrv_get_aio_context(di->bs);
aio_context_acquire(aio_context);
BlockJob *job = backup_job_create(
- NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
- BITMAP_SYNC_MODE_NEVER, false, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+ NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ bitmap_mode, false, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
JOB_DEFAULT, pvebackup_complete_cb, di, NULL, &local_err);
aio_context_release(aio_context);
@@ -525,6 +556,8 @@ typedef struct QmpBackupTask {
const char *fingerprint;
bool has_fingerprint;
int64_t backup_time;
+ bool has_use_dirty_bitmap;
+ bool use_dirty_bitmap;
bool has_format;
BackupFormat format;
bool has_config_file;
@@ -616,6 +649,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
size_t total = 0;
+ size_t dirty = 0;
l = di_list;
while (l) {
@@ -653,6 +687,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
firewall_name = "fw.conf";
+ bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
+
char *pbs_err = NULL;
pbs = proxmox_backup_new(
task->backup_file,
@@ -672,7 +708,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
goto err;
}
- if (proxmox_backup_co_connect(pbs, task->errp) < 0)
+ int connect_result = proxmox_backup_co_connect(pbs, task->errp);
+ if (connect_result < 0)
goto err;
/* register all devices */
@@ -683,9 +720,40 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *devname = bdrv_get_device_name(di->bs);
- int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, task->errp);
- if (dev_id < 0)
+ BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
+ bool expect_only_dirty = false;
+
+ if (use_dirty_bitmap) {
+ if (bitmap == NULL) {
+ bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
+ if (!bitmap) {
+ goto err;
+ }
+ } else {
+ expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
+ }
+
+ if (expect_only_dirty) {
+ dirty += bdrv_get_dirty_count(bitmap);
+ } else {
+ /* mark entire bitmap as dirty to make full backup */
+ bdrv_set_dirty_bitmap(bitmap, 0, di->size);
+ dirty += di->size;
+ }
+ di->bitmap = bitmap;
+ } else {
+ dirty += di->size;
+
+ /* after a full backup the old dirty bitmap is invalid anyway */
+ if (bitmap != NULL) {
+ bdrv_release_dirty_bitmap(bitmap);
+ }
+ }
+
+ int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
+ if (dev_id < 0) {
goto err;
+ }
if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
goto err;
@@ -694,6 +762,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->dev_id = dev_id;
}
} else if (format == BACKUP_FORMAT_VMA) {
+ dirty = total;
+
vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
@@ -721,6 +791,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
}
} else if (format == BACKUP_FORMAT_DIR) {
+ dirty = total;
+
if (mkdir(task->backup_file, 0640) != 0) {
error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
task->backup_file);
@@ -793,8 +865,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
backup_state.stat.total = total;
+ backup_state.stat.dirty = dirty;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
+ backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -818,6 +892,10 @@ err:
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
+ if (di->bitmap) {
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+
if (di->target) {
bdrv_unref(di->target);
}
@@ -859,6 +937,7 @@ UuidInfo *qmp_backup(
bool has_fingerprint, const char *fingerprint,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
+ bool has_use_dirty_bitmap, bool use_dirty_bitmap,
bool has_format, BackupFormat format,
bool has_config_file, const char *config_file,
bool has_firewall_file, const char *firewall_file,
@@ -877,6 +956,8 @@ UuidInfo *qmp_backup(
.backup_id = backup_id,
.has_backup_time = has_backup_time,
.backup_time = backup_time,
+ .has_use_dirty_bitmap = has_use_dirty_bitmap,
+ .use_dirty_bitmap = use_dirty_bitmap,
.has_format = has_format,
.format = format,
.has_config_file = has_config_file,
@@ -945,10 +1026,14 @@ BackupStatus *qmp_query_backup(Error **errp)
info->has_total = true;
info->total = backup_state.stat.total;
+ info->has_dirty = true;
+ info->dirty = backup_state.stat.dirty;
info->has_zero_bytes = true;
info->zero_bytes = backup_state.stat.zero_bytes;
info->has_transferred = true;
info->transferred = backup_state.stat.transferred;
+ info->has_reused = true;
+ info->reused = backup_state.stat.reused;
qemu_mutex_unlock(&backup_state.stat.lock);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c3b6b93472..992e6c1e3f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -753,8 +753,13 @@
#
# @total: total amount of bytes involved in the backup process
#
+# @dirty: with incremental mode (PBS) this is the amount of bytes involved
+# in the backup process which are marked dirty.
+#
# @transferred: amount of bytes already backed up.
#
+# @reused: amount of bytes reused due to deduplication.
+#
# @zero-bytes: amount of 'zero' bytes detected.
#
# @start-time: time (epoch) when backup job started.
@@ -767,8 +772,8 @@
#
##
{ 'struct': 'BackupStatus',
- 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int',
- '*transferred': 'int', '*zero-bytes': 'int',
+ 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int', '*dirty': 'int',
+ '*transferred': 'int', '*zero-bytes': 'int', '*reused': 'int',
'*start-time': 'int', '*end-time': 'int',
'*backup-file': 'str', '*uuid': 'str' } }
@@ -811,6 +816,8 @@
#
# @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
#
+# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
+#
# Returns: the uuid of the backup job
#
##
@@ -821,6 +828,7 @@
'*fingerprint': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
+ '*use-dirty-bitmap': 'bool',
'*format': 'BackupFormat',
'*config-file': 'str',
'*firewall-file': 'str',
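
Note on the statistics in the dirty-bitmap patch above: `hmp_info_backup` computes the progress percentage against the dirty byte count when the backup is incremental (a non-zero `dirty` smaller than `total`) and against `total` otherwise, while `pvebackup_co_prepare` seeds `reused` with `total - dirty`. The sketch below restates that arithmetic in isolation. The function names are hypothetical, and the PBS-only condition on the `reused` seed is left out because other formats set `dirty = total`, which already yields zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Incremental means: some bytes are dirty, but fewer than the whole image. */
static bool is_incremental(uint64_t total, uint64_t dirty)
{
    return dirty != 0 && dirty < total;
}

/* Percentage of the work done, relative to dirty bytes when incremental. */
static int progress_percent(uint64_t total, uint64_t dirty,
                            uint64_t transferred)
{
    uint64_t base;

    if (total == 0) {
        return 0; /* mirrors the "should not happen" branch in the patch */
    }
    base = is_incremental(total, dirty) ? dirty : total;
    assert(base != 0);
    return (int)(transferred * 100 / base);
}

/* Bytes counted as reused before any data is transferred. */
static uint64_t initial_reused(uint64_t total, uint64_t dirty)
{
    return dirty >= total ? 0 : total - dirty;
}
```

With `total = 1000`, `dirty = 200` and `transferred = 100`, the backup is reported as incremental at 50 percent, and 800 bytes start out as reused.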


@ -5,17 +5,19 @@ Subject: [PATCH] PVE-Backup: pbs-restore - new command to restore from proxmox
backup server
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[WB: add namespace support]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
meson.build | 4 +
pbs-restore.c | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 227 insertions(+)
pbs-restore.c | 236 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 240 insertions(+)
create mode 100644 pbs-restore.c
diff --git a/meson.build b/meson.build
index 782756162c..63ea813a9a 100644
index d16b97cf3c..6de51c34cb 100644
--- a/meson.build
+++ b/meson.build
@@ -3602,6 +3602,10 @@ if have_tools
@@ -4029,6 +4029,10 @@ if have_tools
vma = executable('vma', files('vma.c', 'vma-reader.c') + genh,
dependencies: [authz, block, crypto, io, qom], install: true)
@ -24,14 +26,14 @@ index 782756162c..63ea813a9a 100644
+ libproxmox_backup_qemu], install: true)
+
subdir('storage-daemon')
subdir('contrib/rdmacm-mux')
subdir('contrib/elf2dmp')
foreach exe: [ 'qemu-img', 'qemu-io', 'qemu-nbd', 'qemu-storage-daemon']
diff --git a/pbs-restore.c b/pbs-restore.c
new file mode 100644
index 0000000000..2f834cf42e
index 0000000000..f03d9bab8d
--- /dev/null
+++ b/pbs-restore.c
@@ -0,0 +1,223 @@
@@ -0,0 +1,236 @@
+/*
+ * Qemu image restore helper for Proxmox Backup
+ *
@ -63,7 +65,7 @@ index 0000000000..2f834cf42e
+static void help(void)
+{
+ const char *help_msg =
+ "usage: pbs-restore [--repository <repo>] snapshot archive-name target [command options]\n"
+ "usage: pbs-restore [--repository <repo>] [--ns namespace] snapshot archive-name target [command options]\n"
+ ;
+
+ printf("%s", help_msg);
@ -111,6 +113,7 @@ index 0000000000..2f834cf42e
+ Error *main_loop_err = NULL;
+ const char *format = "raw";
+ const char *repository = NULL;
+ const char *backup_ns = NULL;
+ const char *keyfile = NULL;
+ int verbose = false;
+ bool skip_zero = false;
@ -124,6 +127,7 @@ index 0000000000..2f834cf42e
+ {"verbose", no_argument, 0, 'v'},
+ {"format", required_argument, 0, 'f'},
+ {"repository", required_argument, 0, 'r'},
+ {"ns", required_argument, 0, 'n'},
+ {"keyfile", required_argument, 0, 'k'},
+ {0, 0, 0, 0}
+ };
@ -144,6 +148,9 @@ index 0000000000..2f834cf42e
+ case 'r':
+ repository = g_strdup(argv[optind - 1]);
+ break;
+ case 'n':
+ backup_ns = g_strdup(argv[optind - 1]);
+ break;
+ case 'k':
+ keyfile = g_strdup(argv[optind - 1]);
+ break;
@ -194,8 +201,16 @@ index 0000000000..2f834cf42e
+ fprintf(stderr, "connecting to repository '%s'\n", repository);
+ }
+ char *pbs_error = NULL;
+ ProxmoxRestoreHandle *conn = proxmox_restore_new(
+ repository, snapshot, password, keyfile, key_password, fingerprint, &pbs_error);
+ ProxmoxRestoreHandle *conn = proxmox_restore_new_ns(
+ repository,
+ snapshot,
+ backup_ns,
+ password,
+ keyfile,
+ key_password,
+ fingerprint,
+ &pbs_error
+ );
+ if (conn == NULL) {
+ fprintf(stderr, "restore failed: %s\n", pbs_error);
+ return -1;
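
Note on the `--ns` option added to `pbs-restore` above: the patch adds a `{"ns", required_argument, 0, 'n'}` entry to the long-option table and passes the value on to `proxmox_restore_new_ns`. The sketch below shows the option parsing on its own with POSIX/GNU `getopt_long`. The `RestoreArgs` struct, the function name, and the short-option string `"r:n:k:"` are assumptions, because the diff does not show the real short-option string. Unlike the patch, which copies `argv[optind - 1]`, the sketch reads `optarg`.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parsed options. */
typedef struct RestoreArgs {
    const char *repository;
    const char *backup_ns;   /* NULL when --ns is not given */
    const char *keyfile;
    int first_positional;    /* index of the first non-option argument */
} RestoreArgs;

/* Returns 0 on success and -1 on an unknown option or missing value. */
static int parse_restore_args(int argc, char **argv, RestoreArgs *out)
{
    static const struct option long_options[] = {
        {"repository", required_argument, 0, 'r'},
        {"ns",         required_argument, 0, 'n'},
        {"keyfile",    required_argument, 0, 'k'},
        {0, 0, 0, 0}
    };
    int c;

    memset(out, 0, sizeof(*out));
    optind = 1; /* start scanning at the first argument */

    while ((c = getopt_long(argc, argv, "r:n:k:", long_options, NULL)) != -1) {
        switch (c) {
        case 'r':
            out->repository = optarg;
            break;
        case 'n':
            out->backup_ns = optarg;
            break;
        case 'k':
            out->keyfile = optarg;
            break;
        default:
            return -1;
        }
    }
    out->first_positional = optind;
    return 0;
}
```

The positional arguments (snapshot, archive name, target) begin at `argv[first_positional]`, and a NULL `backup_ns` means that no namespace was requested.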


@ -7,39 +7,40 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[error cleanups, file_open implementation]
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[WB: add namespace support]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[FE: adapt to changed function signatures
make pbs_co_preadv return values consistent with QEMU]
make pbs_co_preadv return values consistent with QEMU
getlength is now a coroutine function]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 3 +
block/pbs.c | 276 +++++++++++++++++++++++++++++++++++++++++++
configure | 9 ++
block/meson.build | 2 +
block/pbs.c | 307 +++++++++++++++++++++++++++++++++++++++++++
meson.build | 2 +-
qapi/block-core.json | 13 ++
qapi/block-core.json | 29 ++++
qapi/pragma.json | 1 +
6 files changed, 303 insertions(+), 1 deletion(-)
5 files changed, 340 insertions(+), 1 deletion(-)
create mode 100644 block/pbs.c
diff --git a/block/meson.build b/block/meson.build
index e995ae72b9..7ef2fa72d5 100644
index 6bba803f94..1945e04eeb 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -53,6 +53,9 @@ block_ss.add(files(
@@ -49,6 +49,8 @@ block_ss.add(files(
'../pve-backup.c',
), libproxmox_backup_qemu)
+block_ss.add(when: 'CONFIG_PBS_BDRV', if_true: files('pbs.c'))
+block_ss.add(when: 'CONFIG_PBS_BDRV', if_true: libproxmox_backup_qemu)
+block_ss.add(files('pbs.c'), libproxmox_backup_qemu)
+
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
softmmu_ss.add(files('block-ram-registrar.c'))
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(files('block-ram-registrar.c'))
diff --git a/block/pbs.c b/block/pbs.c
new file mode 100644
index 0000000000..9d1f1f39d4
index 0000000000..dd72356bd3
--- /dev/null
+++ b/block/pbs.c
@@ -0,0 +1,276 @@
@@ -0,0 +1,307 @@
+/*
+ * Proxmox Backup Server read-only block driver
+ */
@ -52,10 +53,12 @@ index 0000000000..9d1f1f39d4
+#include "qemu/option.h"
+#include "qemu/cutils.h"
+#include "block/block_int.h"
+#include "block/block-io.h"
+
+#include <proxmox-backup-qemu.h>
+
+#define PBS_OPT_REPOSITORY "repository"
+#define PBS_OPT_NAMESPACE "namespace"
+#define PBS_OPT_SNAPSHOT "snapshot"
+#define PBS_OPT_ARCHIVE "archive"
+#define PBS_OPT_KEYFILE "keyfile"
@ -69,6 +72,7 @@ index 0000000000..9d1f1f39d4
+ int64_t length;
+
+ char *repository;
+ char *namespace;
+ char *snapshot;
+ char *archive;
+} BDRVPBSState;
@ -83,6 +87,11 @@ index 0000000000..9d1f1f39d4
+ .help = "The server address and repository to connect to.",
+ },
+ {
+ .name = PBS_OPT_NAMESPACE,
+ .type = QEMU_OPT_STRING,
+ .help = "Optional: The snapshot's namespace.",
+ },
+ {
+ .name = PBS_OPT_SNAPSHOT,
+ .type = QEMU_OPT_STRING,
+ .help = "The snapshot to read.",
@ -118,7 +127,7 @@ index 0000000000..9d1f1f39d4
+
+
+// filename format:
+// pbs:repository=<repo>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
+// pbs:repository=<repo>,namespace=<ns>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
+static void pbs_parse_filename(const char *filename, QDict *options,
+ Error **errp)
+{
@ -154,6 +163,7 @@ index 0000000000..9d1f1f39d4
+ s->archive = g_strdup(qemu_opt_get(opts, PBS_OPT_ARCHIVE));
+ const char *keyfile = qemu_opt_get(opts, PBS_OPT_KEYFILE);
+ const char *password = qemu_opt_get(opts, PBS_OPT_PASSWORD);
+ const char *namespace = qemu_opt_get(opts, PBS_OPT_NAMESPACE);
+ const char *fingerprint = qemu_opt_get(opts, PBS_OPT_FINGERPRINT);
+ const char *key_password = qemu_opt_get(opts, PBS_OPT_ENCRYPTION_PASSWORD);
+
@ -166,9 +176,12 @@ index 0000000000..9d1f1f39d4
+ if (!key_password) {
+ key_password = getenv("PBS_ENCRYPTION_PASSWORD");
+ }
+ if (namespace) {
+ s->namespace = g_strdup(namespace);
+ }
+
+ /* connect to PBS server in read mode */
+ s->conn = proxmox_restore_new(s->repository, s->snapshot, password,
+ s->conn = proxmox_restore_new_ns(s->repository, s->snapshot, s->namespace, password,
+ keyfile, key_password, fingerprint, &pbs_error);
+
+ /* invalidates qemu_opt_get char pointers from above */
@ -213,12 +226,14 @@ index 0000000000..9d1f1f39d4
+static void pbs_close(BlockDriverState *bs) {
+ BDRVPBSState *s = bs->opaque;
+ g_free(s->repository);
+ g_free(s->namespace);
+ g_free(s->snapshot);
+ g_free(s->archive);
+ proxmox_restore_disconnect(s->conn);
+}
+
+static int64_t pbs_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t GRAPH_RDLOCK
+pbs_co_getlength(BlockDriverState *bs)
+{
+ BDRVPBSState *s = bs->opaque;
+ return s->length;
@ -235,14 +250,23 @@ index 0000000000..9d1f1f39d4
+ aio_co_schedule(rcb->ctx, rcb->co);
+}
+
+static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
+ int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+static coroutine_fn int GRAPH_RDLOCK
+pbs_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVPBSState *s = bs->opaque;
+ int ret;
+ char *pbs_error = NULL;
+ uint8_t *buf = malloc(bytes);
+ uint8_t *buf;
+ bool inline_buf = true;
+
+ /* for single-buffer IO vectors we can fast-path the write directly to it */
+ if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
+ buf = qiov->iov->iov_base;
+ } else {
+ inline_buf = false;
+ buf = g_malloc(bytes);
+ }
+
+ if (offset < 0 || bytes < 0) {
+ fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
@ -265,26 +289,34 @@ index 0000000000..9d1f1f39d4
+ return -EIO;
+ }
+
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ free(buf);
+ if (!inline_buf) {
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ g_free(buf);
+ }
+
+ return 0;
+}
+
+static coroutine_fn int pbs_co_pwritev(BlockDriverState *bs,
+ int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+static coroutine_fn int GRAPH_RDLOCK
+pbs_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ fprintf(stderr, "pbs-bdrv: cannot write to backup file, make sure "
+ "any attached disk devices are set to read-only!\n");
+ return -EPERM;
+}
+
+static void pbs_refresh_filename(BlockDriverState *bs)
+static void GRAPH_RDLOCK
+pbs_refresh_filename(BlockDriverState *bs)
+{
+ BDRVPBSState *s = bs->opaque;
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
+ s->repository, s->snapshot, s->archive);
+ if (s->namespace) {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s:%s(%s)",
+ s->repository, s->namespace, s->snapshot, s->archive);
+ } else {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
+ s->repository, s->snapshot, s->archive);
+ }
+}
+
+static const char *const pbs_strong_runtime_opts[] = {
@ -301,7 +333,7 @@ index 0000000000..9d1f1f39d4
+ .bdrv_file_open = pbs_file_open,
+ .bdrv_open = pbs_open,
+ .bdrv_close = pbs_close,
+ .bdrv_getlength = pbs_getlength,
+ .bdrv_co_getlength = pbs_co_getlength,
+
+ .bdrv_co_preadv = pbs_co_preadv,
+ .bdrv_co_pwritev = pbs_co_pwritev,
@ -316,52 +348,11 @@ index 0000000000..9d1f1f39d4
+}
+
+block_init(bdrv_pbs_init);
diff --git a/configure b/configure
index 26c7bc5154..c587e986c7 100755
--- a/configure
+++ b/configure
@@ -285,6 +285,7 @@ linux_user=""
bsd_user=""
pie=""
coroutine=""
+pbs_bdrv="yes"
plugins="$default_feature"
meson=""
ninja=""
@@ -864,6 +865,10 @@ for opt do
--enable-uuid|--disable-uuid)
echo "$0: $opt is obsolete, UUID support is always built" >&2
;;
+ --disable-pbs-bdrv) pbs_bdrv="no"
+ ;;
+ --enable-pbs-bdrv) pbs_bdrv="yes"
+ ;;
--with-git=*) git="$optarg"
;;
--with-git-submodules=*)
@@ -1049,6 +1054,7 @@ cat << EOF
debug-info debugging information
safe-stack SafeStack Stack Smash Protection. Depends on
clang/llvm >= 3.7 and requires coroutine backend ucontext.
+ pbs-bdrv Proxmox backup server read-only block driver support
NOTE: The object files are built at the place where configure is launched
EOF
@@ -2372,6 +2378,9 @@ echo "TARGET_DIRS=$target_list" >> $config_host_mak
if test "$modules" = "yes"; then
echo "CONFIG_MODULES=y" >> $config_host_mak
fi
+if test "$pbs_bdrv" = "yes" ; then
+ echo "CONFIG_PBS_BDRV=y" >> $config_host_mak
+fi
# XXX: suppress that
if [ "$bsd" = "yes" ] ; then
diff --git a/meson.build b/meson.build
index 63ea813a9a..f7f5b3f253 100644
index 6de51c34cb..3bc039f60f 100644
--- a/meson.build
+++ b/meson.build
@@ -3978,7 +3978,7 @@ summary_info += {'bzip2 support': libbzip2}
@@ -4477,7 +4477,7 @@ summary_info += {'bzip2 support': libbzip2}
summary_info += {'lzfse support': liblzfse}
summary_info += {'zstd support': zstd}
summary_info += {'NUMA host support': numa}
@ -371,10 +362,10 @@ index 63ea813a9a..f7f5b3f253 100644
summary_info += {'libdaxctl support': libdaxctl}
summary_info += {'libudev': libudev}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5ac6276dc1..45b63dfe26 100644
index e49c7b5bc9..fc32ff9957 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3103,6 +3103,7 @@
@@ -3457,6 +3457,7 @@
'parallels', 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum',
'raw', 'rbd',
{ 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
@ -382,7 +373,7 @@ index 5ac6276dc1..45b63dfe26 100644
'ssh', 'throttle', 'vdi', 'vhdx',
{ 'name': 'virtio-blk-vfio-pci', 'if': 'CONFIG_BLKIO' },
{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
@@ -3179,6 +3180,17 @@
@@ -3543,6 +3544,33 @@
{ 'struct': 'BlockdevOptionsNull',
'data': { '*size': 'int', '*latency-ns': 'uint64', '*read-zeroes': 'bool' } }
@ -391,16 +382,32 @@ index 5ac6276dc1..45b63dfe26 100644
+#
+# Driver specific block device options for the PBS backend.
+#
+# @repository: Proxmox Backup Server repository.
+#
+# @snapshot: backup snapshot ID.
+#
+# @archive: archive name.
+#
+# @keyfile: keyfile to use for encryption.
+#
+# @password: password to use for connection.
+#
+# @fingerprint: backup server fingerprint.
+#
+# @key_password: password to unlock key.
+#
+# @namespace: namespace where backup snapshot lives.
+#
+##
+{ 'struct': 'BlockdevOptionsPbs',
+ 'data': { 'repository': 'str', 'snapshot': 'str', 'archive': 'str',
+ '*keyfile': 'str', '*password': 'str', '*fingerprint': 'str',
+ '*key_password': 'str' } }
+ '*key_password': 'str', '*namespace': 'str' } }
+
##
# @BlockdevOptionsNVMe:
#
@@ -4531,6 +4543,7 @@
@@ -4977,6 +5005,7 @@
'nfs': 'BlockdevOptionsNfs',
'null-aio': 'BlockdevOptionsNull',
'null-co': 'BlockdevOptionsNull',
@ -409,10 +416,10 @@ index 5ac6276dc1..45b63dfe26 100644
'nvme-io_uring': { 'type': 'BlockdevOptionsNvmeIoUring',
'if': 'CONFIG_BLKIO' },
diff --git a/qapi/pragma.json b/qapi/pragma.json
index f2097b9020..5ab1890519 100644
index be8fa304c5..7ff46bd128 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -47,6 +47,7 @@
@@ -100,6 +100,7 @@
'BlockInfo', # query-block
'BlockdevAioOptions', # blockdev-add, -blockdev
'BlockdevDriver', # blockdev-add, query-blockstats, ...

@ -1,219 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Thu, 9 Jul 2020 12:53:08 +0200
Subject: [PATCH] PVE: various PBS fixes
pbs: fix crypt and compress parameters
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
PVE: handle PBS write callback with big blocks correctly
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
PVE: add zero block handling to PBS dump callback
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 4 ++-
pve-backup.c | 57 +++++++++++++++++++++++++++-------
qapi/block-core.json | 6 ++++
3 files changed, 54 insertions(+), 13 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 556af25861..a09f722fea 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1042,7 +1042,9 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
- false, false, // PBS incremental
+ false, false, // PBS use-dirty-bitmap
+ false, false, // PBS compress
+ false, false, // PBS encrypt
true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
false, NULL, false, NULL, !!devlist,
devlist, qdict_haskey(qdict, "speed"), speed, &error);
diff --git a/pve-backup.c b/pve-backup.c
index abd7062afe..e113ab61b9 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -8,6 +8,7 @@
#include "block/blockjob.h"
#include "qapi/qapi-commands-block.h"
#include "qapi/qmp/qerror.h"
+#include "qemu/cutils.h"
/* PVE backup state and related function */
@@ -67,6 +68,7 @@ opts_init(pvebackup_init);
typedef struct PVEBackupDevInfo {
BlockDriverState *bs;
size_t size;
+ uint64_t block_size;
uint8_t dev_id;
bool completed;
char targetfile[PATH_MAX];
@@ -137,10 +139,13 @@ pvebackup_co_dump_pbs_cb(
PVEBackupDevInfo *di = opaque;
assert(backup_state.pbs);
+ assert(buf);
Error *local_err = NULL;
int pbs_res = -1;
+ bool is_zero_block = size == di->block_size && buffer_is_zero(buf, size);
+
qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
// avoid deadlock if job is cancelled
@@ -149,17 +154,29 @@ pvebackup_co_dump_pbs_cb(
return -1;
}
- pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
- qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ uint64_t transferred = 0;
+ uint64_t reused = 0;
+ while (transferred < size) {
+ uint64_t left = size - transferred;
+ uint64_t to_transfer = left < di->block_size ? left : di->block_size;
- if (pbs_res < 0) {
- pvebackup_propagate_error(local_err);
- return pbs_res;
- } else {
- size_t reused = (pbs_res == 0) ? size : 0;
- pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
+ pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id,
+ is_zero_block ? NULL : buf + transferred, start + transferred,
+ to_transfer, &local_err);
+ transferred += to_transfer;
+
+ if (pbs_res < 0) {
+ pvebackup_propagate_error(local_err);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ return pbs_res;
+ }
+
+ reused += pbs_res == 0 ? to_transfer : 0;
}
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ pvebackup_add_transfered_bytes(size, is_zero_block ? size : 0, reused);
+
return size;
}
@@ -180,6 +197,7 @@ pvebackup_co_dump_vma_cb(
int ret = -1;
assert(backup_state.vmaw);
+ assert(buf);
uint64_t remaining = size;
@@ -206,9 +224,7 @@ pvebackup_co_dump_vma_cb(
qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++cluster_num;
- if (buf) {
- buf += VMA_CLUSTER_SIZE;
- }
+ buf += VMA_CLUSTER_SIZE;
if (ret < 0) {
Error *local_err = NULL;
vma_writer_error_propagate(backup_state.vmaw, &local_err);
@@ -566,6 +582,10 @@ typedef struct QmpBackupTask {
const char *firewall_file;
bool has_devlist;
const char *devlist;
+ bool has_compress;
+ bool compress;
+ bool has_encrypt;
+ bool encrypt;
bool has_speed;
int64_t speed;
Error **errp;
@@ -689,6 +709,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
+
char *pbs_err = NULL;
pbs = proxmox_backup_new(
task->backup_file,
@@ -698,8 +719,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
task->has_password ? task->password : NULL,
task->has_keyfile ? task->keyfile : NULL,
task->has_key_password ? task->key_password : NULL,
+ task->has_compress ? task->compress : true,
+ task->has_encrypt ? task->encrypt : task->has_keyfile,
task->has_fingerprint ? task->fingerprint : NULL,
- &pbs_err);
+ &pbs_err);
if (!pbs) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
@@ -718,6 +741,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
+ di->block_size = dump_cb_block_size;
+
const char *devname = bdrv_get_device_name(di->bs);
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
@@ -938,6 +963,8 @@ UuidInfo *qmp_backup(
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+ bool has_compress, bool compress,
+ bool has_encrypt, bool encrypt,
bool has_format, BackupFormat format,
bool has_config_file, const char *config_file,
bool has_firewall_file, const char *firewall_file,
@@ -948,6 +975,8 @@ UuidInfo *qmp_backup(
.backup_file = backup_file,
.has_password = has_password,
.password = password,
+ .has_keyfile = has_keyfile,
+ .keyfile = keyfile,
.has_key_password = has_key_password,
.key_password = key_password,
.has_fingerprint = has_fingerprint,
@@ -958,6 +987,10 @@ UuidInfo *qmp_backup(
.backup_time = backup_time,
.has_use_dirty_bitmap = has_use_dirty_bitmap,
.use_dirty_bitmap = use_dirty_bitmap,
+ .has_compress = has_compress,
+ .compress = compress,
+ .has_encrypt = has_encrypt,
+ .encrypt = encrypt,
.has_format = has_format,
.format = format,
.has_config_file = has_config_file,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 992e6c1e3f..5ac6276dc1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -818,6 +818,10 @@
#
# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
#
+# @compress: use compression (optional for format 'pbs', defaults to true)
+#
+# @encrypt: use encryption (optional for format 'pbs', defaults to true if there is a keyfile)
+#
# Returns: the uuid of the backup job
#
##
@@ -829,6 +833,8 @@
'*backup-id': 'str',
'*backup-time': 'int',
'*use-dirty-bitmap': 'bool',
+ '*compress': 'bool',
+ '*encrypt': 'bool',
'*format': 'BackupFormat',
'*config-file': 'str',
'*firewall-file': 'str',

View File

@ -9,15 +9,15 @@ fitting.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
meson.build | 2 ++
meson.build | 3 ++-
os-posix.c | 7 +++++--
2 files changed, 7 insertions(+), 2 deletions(-)
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/meson.build b/meson.build
index f7f5b3f253..283b0e356e 100644
index 3bc039f60f..067e8956a7 100644
--- a/meson.build
+++ b/meson.build
@@ -1526,6 +1526,7 @@ keyutils = dependency('libkeyutils', required: false,
@@ -1923,6 +1923,7 @@ endif
has_gettid = cc.has_function('gettid')
libuuid = cc.find_library('uuid', required: true)
@ -25,28 +25,29 @@ index f7f5b3f253..283b0e356e 100644
libproxmox_backup_qemu = cc.find_library('proxmox_backup_qemu', required: true)
# libselinux
@@ -3096,6 +3097,7 @@ if have_block
# os-posix.c contains POSIX-specific functions used by qemu-storage-daemon,
# os-win32.c does not
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
+ blockdev_ss.add(when: 'CONFIG_POSIX', if_true: libsystemd)
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
@@ -3530,7 +3531,7 @@ if have_block
if host_os == 'windows'
system_ss.add(files('os-win32.c'))
else
- blockdev_ss.add(files('os-posix.c'))
+ blockdev_ss.add(files('os-posix.c'), libsystemd)
endif
endif
diff --git a/os-posix.c b/os-posix.c
index 4858650c3e..c5cb12226a 100644
index a4284e2c07..197a2120fd 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -28,6 +28,8 @@
@@ -29,6 +29,8 @@
#include <pwd.h>
#include <grp.h>
#include <libgen.h>
+#include <systemd/sd-journal.h>
+#include <syslog.h>
/* Needed early for CONFIG_BSD etc. */
#include "net/slirp.h"
@@ -287,9 +289,10 @@ void os_setup_post(void)
#include "qemu/error-report.h"
#include "qemu/log.h"
@@ -302,9 +304,10 @@ void os_setup_post(void)
dup2(fd, 0);
dup2(fd, 1);

@ -13,21 +13,23 @@ safe migration is possible and makes sense.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: split up state_pending for 8.0]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
include/migration/misc.h | 3 ++
migration/meson.build | 2 +
migration/migration.c | 1 +
migration/pbs-state.c | 106 +++++++++++++++++++++++++++++++++++++++
migration/pbs-state.c | 104 +++++++++++++++++++++++++++++++++++++++
pve-backup.c | 1 +
qapi/block-core.json | 6 +++
6 files changed, 119 insertions(+)
6 files changed, 117 insertions(+)
create mode 100644 migration/pbs-state.c
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 465906710d..4f0aeceb6f 100644
index c9e200f4eb..12c99ebc69 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -75,4 +75,7 @@ bool migration_in_bg_snapshot(void);
@@ -117,4 +117,7 @@ bool migration_in_bg_snapshot(void);
/* migration/block-dirty-bitmap.c */
void dirty_bitmap_mig_init(void);
@ -36,38 +38,37 @@ index 465906710d..4f0aeceb6f 100644
+
#endif
diff --git a/migration/meson.build b/migration/meson.build
index 0842d00cd2..d012f4d8d3 100644
index 800f12a60d..35a4306183 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -6,8 +6,10 @@ migration_files = files(
@@ -7,7 +7,9 @@ migration_files = files(
'vmstate.c',
'qemu-file.c',
'yank_functions.c',
+ 'pbs-state.c',
)
softmmu_ss.add(migration_files)
+softmmu_ss.add(libproxmox_backup_qemu)
+system_ss.add(libproxmox_backup_qemu)
softmmu_ss.add(files(
system_ss.add(files(
'block-dirty-bitmap.c',
diff --git a/migration/migration.c b/migration/migration.c
index f485eea5fb..89b287180f 100644
index 86bf76e925..b8d7e471a4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -229,6 +229,7 @@ void migration_object_init(void)
@@ -239,6 +239,7 @@ void migration_object_init(void)
blk_mig_init();
ram_mig_init();
dirty_bitmap_mig_init();
+ pbs_state_mig_init();
}
void migration_cancel(const Error *error)
typedef struct {
diff --git a/migration/pbs-state.c b/migration/pbs-state.c
new file mode 100644
index 0000000000..29f2b3860d
index 0000000000..887e998b9e
--- /dev/null
+++ b/migration/pbs-state.c
@@ -0,0 +1,106 @@
@@ -0,0 +1,104 @@
+/*
+ * PBS (dirty-bitmap) state migration
+ */
@ -86,11 +87,8 @@ index 0000000000..29f2b3860d
+/* state is accessed via this static variable directly, 'opaque' is NULL */
+static PBSState pbs_state;
+
+static void pbs_state_save_pending(QEMUFile *f, void *opaque,
+ uint64_t max_size,
+ uint64_t *res_precopy_only,
+ uint64_t *res_compatible,
+ uint64_t *res_postcopy_only)
+static void pbs_state_pending(void *opaque, uint64_t *must_precopy,
+ uint64_t *can_postcopy)
+{
+ /* we send everything in save_setup, so nothing is ever pending */
+}
@ -160,7 +158,8 @@ index 0000000000..29f2b3860d
+static SaveVMHandlers savevm_pbs_state_handlers = {
+ .save_setup = pbs_state_save_setup,
+ .has_postcopy = pbs_state_has_postcopy,
+ .save_live_pending = pbs_state_save_pending,
+ .state_pending_exact = pbs_state_pending,
+ .state_pending_estimate = pbs_state_pending,
+ .is_active_iterate = pbs_state_is_active_iterate,
+ .load_state = pbs_state_load,
+ .is_active = pbs_state_is_active,
@ -175,22 +174,22 @@ index 0000000000..29f2b3860d
+ NULL);
+}
diff --git a/pve-backup.c b/pve-backup.c
index 88268bb586..fa9c6c4493 100644
index 9c13a92623..9d480a8eec 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1128,6 +1128,7 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
@@ -1091,6 +1091,7 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
ret->pbs_dirty_bitmap = true;
ret->pbs_dirty_bitmap_savevm = true;
+ ret->pbs_dirty_bitmap_migration = true;
ret->query_bitmap_info = true;
return ret;
}
ret->pbs_masterkey = true;
ret->backup_max_workers = true;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index bf559c6d52..24f30260c8 100644
index fc32ff9957..f516d8e95a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -879,6 +879,11 @@
@@ -1004,6 +1004,11 @@
# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
# safely be set for savevm-async.
#
@ -199,14 +198,14 @@ index bf559c6d52..24f30260c8 100644
+# migration cap if this is false/unset may lead
+# to crashes on migration!
+#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
# @pbs-masterkey: True if the QMP backup call supports the 'master_keyfile'
# parameter.
#
##
@@ -886,6 +891,7 @@
@@ -1017,6 +1022,7 @@
'data': { 'pbs-dirty-bitmap': 'bool',
'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
+ 'pbs-dirty-bitmap-migration': 'bool',
'pbs-library-version': 'str' } }
##
'pbs-masterkey': 'bool',
'pbs-library-version': 'str',
'backup-max-workers': 'bool' } }

@ -1,74 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 8 Jul 2020 11:57:53 +0200
Subject: [PATCH] PVE: add query_proxmox_support QMP command
Generic interface for future use, currently used for PBS dirty-bitmap
backup support.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[PVE: query-proxmox-support: include library version]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
pve-backup.c | 9 +++++++++
qapi/block-core.json | 29 +++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/pve-backup.c b/pve-backup.c
index e113ab61b9..9318ca4f0c 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1072,3 +1072,12 @@ BackupStatus *qmp_query_backup(Error **errp)
return info;
}
+
+ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
+{
+ ProxmoxSupportStatus *ret = g_malloc0(sizeof(*ret));
+ ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
+ ret->pbs_dirty_bitmap = true;
+ ret->pbs_dirty_bitmap_savevm = true;
+ return ret;
+}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 45b63dfe26..8b0e0d92de 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -863,6 +863,35 @@
##
{ 'command': 'backup-cancel' }
+##
+# @ProxmoxSupportStatus:
+#
+# Contains info about supported features added by Proxmox.
+#
+# @pbs-dirty-bitmap: True if dirty-bitmap-incremental backups to PBS are
+# supported.
+#
+# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
+# safely be set for savevm-async.
+#
+# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
+#
+##
+{ 'struct': 'ProxmoxSupportStatus',
+ 'data': { 'pbs-dirty-bitmap': 'bool',
+ 'pbs-dirty-bitmap-savevm': 'bool',
+ 'pbs-library-version': 'str' } }
+
+##
+# @query-proxmox-support:
+#
+# Returns information about supported features added by Proxmox.
+#
+# Returns: @ProxmoxSupportStatus
+#
+##
+{ 'command': 'query-proxmox-support', 'returns': 'ProxmoxSupportStatus' }
+
##
# @BlockDeviceTimedStats:
#

@ -1,441 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 19 Aug 2020 17:02:00 +0200
Subject: [PATCH] PVE: add query-pbs-bitmap-info QMP call
Returns advanced information about dirty bitmaps used (or not used) for
the latest PBS backup.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
monitor/hmp-cmds.c | 28 ++++++-----
pve-backup.c | 117 ++++++++++++++++++++++++++++++++-----------
qapi/block-core.json | 56 +++++++++++++++++++++
3 files changed, 159 insertions(+), 42 deletions(-)
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 670f783515..d819e5fc36 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -202,6 +202,7 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict)
void hmp_info_backup(Monitor *mon, const QDict *qdict)
{
BackupStatus *info;
+ PBSBitmapInfoList *bitmap_info;
info = qmp_query_backup(NULL);
@@ -232,26 +233,29 @@ void hmp_info_backup(Monitor *mon, const QDict *qdict)
// this should not happen normally
monitor_printf(mon, "Total size: %d\n", 0);
} else {
- bool incremental = false;
size_t total_or_dirty = info->total;
- if (info->has_transferred) {
- if (info->has_dirty && info->dirty) {
- if (info->dirty < info->total) {
- total_or_dirty = info->dirty;
- incremental = true;
- }
- }
+ bitmap_info = qmp_query_pbs_bitmap_info(NULL);
+
+ while (bitmap_info) {
+ monitor_printf(mon, "Drive %s:\n",
+ bitmap_info->value->drive);
+ monitor_printf(mon, " bitmap action: %s\n",
+ PBSBitmapAction_str(bitmap_info->value->action));
+ monitor_printf(mon, " size: %zd\n",
+ bitmap_info->value->size);
+ monitor_printf(mon, " dirty: %zd\n",
+ bitmap_info->value->dirty);
+ bitmap_info = bitmap_info->next;
}
- int per = (info->transferred * 100)/total_or_dirty;
-
- monitor_printf(mon, "Backup mode: %s\n", incremental ? "incremental" : "full");
+ qapi_free_PBSBitmapInfoList(bitmap_info);
int zero_per = (info->has_zero_bytes && info->zero_bytes) ?
(info->zero_bytes * 100)/info->total : 0;
monitor_printf(mon, "Total size: %zd\n", info->total);
+ int trans_per = (info->transferred * 100)/total_or_dirty;
monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
- info->transferred, per);
+ info->transferred, trans_per);
monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
info->zero_bytes, zero_per);
diff --git a/pve-backup.c b/pve-backup.c
index 9318ca4f0c..c85b2ecd83 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -46,6 +46,7 @@ static struct PVEBackupState {
size_t transferred;
size_t reused;
size_t zero_bytes;
+ GList *bitmap_list;
} stat;
int64_t speed;
VmaWriter *vmaw;
@@ -669,7 +670,6 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
size_t total = 0;
- size_t dirty = 0;
l = di_list;
while (l) {
@@ -690,18 +690,33 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_generate(uuid);
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.reused = 0;
+
+ /* clear previous backup's bitmap_list */
+ if (backup_state.stat.bitmap_list) {
+ GList *bl = backup_state.stat.bitmap_list;
+ while (bl) {
+ g_free(((PBSBitmapInfo *)bl->data)->drive);
+ g_free(bl->data);
+ bl = g_list_next(bl);
+ }
+ g_list_free(backup_state.stat.bitmap_list);
+ backup_state.stat.bitmap_list = NULL;
+ }
+
if (format == BACKUP_FORMAT_PBS) {
if (!task->has_password) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
- goto err;
+ goto err_mutex;
}
if (!task->has_backup_id) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
- goto err;
+ goto err_mutex;
}
if (!task->has_backup_time) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
- goto err;
+ goto err_mutex;
}
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
@@ -728,12 +743,12 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"proxmox_backup_new failed: %s", pbs_err);
proxmox_backup_free_error(pbs_err);
- goto err;
+ goto err_mutex;
}
int connect_result = proxmox_backup_co_connect(pbs, task->errp);
if (connect_result < 0)
- goto err;
+ goto err_mutex;
/* register all devices */
l = di_list;
@@ -744,6 +759,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->block_size = dump_cb_block_size;
const char *devname = bdrv_get_device_name(di->bs);
+ PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
+ size_t dirty = di->size;
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
bool expect_only_dirty = false;
@@ -752,49 +769,59 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (bitmap == NULL) {
bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
if (!bitmap) {
- goto err;
+ goto err_mutex;
}
+ action = PBS_BITMAP_ACTION_NEW;
} else {
expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
}
if (expect_only_dirty) {
- dirty += bdrv_get_dirty_count(bitmap);
+ /* track clean chunks as reused */
+ dirty = MIN(bdrv_get_dirty_count(bitmap), di->size);
+ backup_state.stat.reused += di->size - dirty;
+ action = PBS_BITMAP_ACTION_USED;
} else {
/* mark entire bitmap as dirty to make full backup */
bdrv_set_dirty_bitmap(bitmap, 0, di->size);
- dirty += di->size;
+ if (action != PBS_BITMAP_ACTION_NEW) {
+ action = PBS_BITMAP_ACTION_INVALID;
+ }
}
di->bitmap = bitmap;
} else {
- dirty += di->size;
-
/* after a full backup the old dirty bitmap is invalid anyway */
if (bitmap != NULL) {
bdrv_release_dirty_bitmap(bitmap);
+ action = PBS_BITMAP_ACTION_NOT_USED_REMOVED;
}
}
int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
if (dev_id < 0) {
- goto err;
+ goto err_mutex;
}
if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
- goto err;
+ goto err_mutex;
}
di->dev_id = dev_id;
+
+ PBSBitmapInfo *info = g_malloc(sizeof(*info));
+ info->drive = g_strdup(devname);
+ info->action = action;
+ info->size = di->size;
+ info->dirty = dirty;
+ backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
}
} else if (format == BACKUP_FORMAT_VMA) {
- dirty = total;
-
vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
error_propagate(task->errp, local_err);
}
- goto err;
+ goto err_mutex;
}
/* register all devices for vma writer */
@@ -804,7 +831,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
l = g_list_next(l);
if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
- goto err;
+ goto err_mutex;
}
const char *devname = bdrv_get_device_name(di->bs);
@@ -812,16 +839,14 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (di->dev_id <= 0) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"register_stream failed");
- goto err;
+ goto err_mutex;
}
}
} else if (format == BACKUP_FORMAT_DIR) {
- dirty = total;
-
if (mkdir(task->backup_file, 0640) != 0) {
error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
task->backup_file);
- goto err;
+ goto err_mutex;
}
backup_dir = task->backup_file;
@@ -838,18 +863,18 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->size, flags, false, &local_err);
if (local_err) {
error_propagate(task->errp, local_err);
- goto err;
+ goto err_mutex;
}
di->target = bdrv_open(di->targetfile, NULL, NULL, flags, &local_err);
if (!di->target) {
error_propagate(task->errp, local_err);
- goto err;
+ goto err_mutex;
}
}
} else {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
- goto err;
+ goto err_mutex;
}
@@ -857,7 +882,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (task->has_config_file) {
if (pvebackup_co_add_config(task->config_file, config_name, format, backup_dir,
vmaw, pbs, task->errp) != 0) {
- goto err;
+ goto err_mutex;
}
}
@@ -865,12 +890,11 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (task->has_firewall_file) {
if (pvebackup_co_add_config(task->firewall_file, firewall_name, format, backup_dir,
vmaw, pbs, task->errp) != 0) {
- goto err;
+ goto err_mutex;
}
}
/* initialize global backup_state now */
-
- qemu_mutex_lock(&backup_state.stat.lock);
+ /* note: 'reused' and 'bitmap_list' are initialized earlier */
if (backup_state.stat.error) {
error_free(backup_state.stat.error);
@@ -890,10 +914,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
backup_state.stat.total = total;
- backup_state.stat.dirty = dirty;
+ backup_state.stat.dirty = total - backup_state.stat.reused;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
- backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -910,6 +933,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
task->result = uuid_info;
return;
+err_mutex:
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
err:
l = di_list;
@@ -1073,11 +1099,42 @@ BackupStatus *qmp_query_backup(Error **errp)
return info;
}
+PBSBitmapInfoList *qmp_query_pbs_bitmap_info(Error **errp)
+{
+ PBSBitmapInfoList *head = NULL, **p_next = &head;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+
+ GList *l = backup_state.stat.bitmap_list;
+ while (l) {
+ PBSBitmapInfo *info = (PBSBitmapInfo *)l->data;
+ l = g_list_next(l);
+
+ /* clone bitmap info to avoid auto free after QMP marshalling */
+ PBSBitmapInfo *info_ret = g_malloc0(sizeof(*info_ret));
+ info_ret->drive = g_strdup(info->drive);
+ info_ret->action = info->action;
+ info_ret->size = info->size;
+ info_ret->dirty = info->dirty;
+
+ PBSBitmapInfoList *info_list = g_malloc0(sizeof(*info_list));
+ info_list->value = info_ret;
+
+ *p_next = info_list;
+ p_next = &info_list->next;
+ }
+
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ return head;
+}
+
ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
{
ProxmoxSupportStatus *ret = g_malloc0(sizeof(*ret));
ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
ret->pbs_dirty_bitmap = true;
ret->pbs_dirty_bitmap_savevm = true;
+ ret->query_bitmap_info = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8b0e0d92de..7fde927621 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -871,6 +871,8 @@
# @pbs-dirty-bitmap: True if dirty-bitmap-incremental backups to PBS are
# supported.
#
+# @query-bitmap-info: True if the 'query-pbs-bitmap-info' QMP call is supported.
+#
# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
# safely be set for savevm-async.
#
@@ -879,6 +881,7 @@
##
{ 'struct': 'ProxmoxSupportStatus',
'data': { 'pbs-dirty-bitmap': 'bool',
+ 'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
'pbs-library-version': 'str' } }
@@ -892,6 +895,59 @@
##
{ 'command': 'query-proxmox-support', 'returns': 'ProxmoxSupportStatus' }
+##
+# @PBSBitmapAction:
+#
+# An action taken on a dirty-bitmap when a backup job was started.
+#
+# @not-used: Bitmap mode was not enabled.
+#
+# @not-used-removed: Bitmap mode was not enabled, but a bitmap from a
+# previous backup still existed and was removed.
+#
+# @new: A new bitmap was attached to the drive for this backup.
+#
+# @used: An existing bitmap will be used to only back up changed data.
+#
+# @invalid: A bitmap existed, but had to be cleared since its associated
+# base snapshot did not match the base given for the current job or
+# the crypt mode has changed.
+#
+##
+{ 'enum': 'PBSBitmapAction',
+ 'data': ['not-used', 'not-used-removed', 'new', 'used', 'invalid'] }
+
+##
+# @PBSBitmapInfo:
+#
+# Contains information about dirty bitmaps used for each drive in a PBS backup.
+#
+# @drive: The underlying drive.
+#
+# @action: The action that was taken when the backup started.
+#
+# @size: The total size of the drive.
+#
+# @dirty: How much of the drive is considered dirty and will be backed up,
+# or 'size' if everything will be.
+#
+##
+{ 'struct': 'PBSBitmapInfo',
+ 'data': { 'drive': 'str', 'action': 'PBSBitmapAction', 'size': 'int',
+ 'dirty': 'int' } }
+
+##
+# @query-pbs-bitmap-info:
+#
+# Returns information about dirty bitmaps used on the most recently started
+# backup. Returns nothing when the last backup was not using PBS or if no
+# backup occurred in this session.
+#
+# Returns: @PBSBitmapInfo
+#
+##
+{ 'command': 'query-pbs-bitmap-info', 'returns': ['PBSBitmapInfo'] }
+
##
# @BlockDeviceTimedStats:
#
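
The per-drive accounting in the pvebackup_co_prepare hunk above (clamp the dirty count to the drive size, count the remainder as reused, then derive the global dirty value as total minus reused) can be sketched in isolation. This is a minimal model under stated assumptions, not the code from pve-backup.c: DriveSketch, StatSketch, SKETCH_MIN and account_drives are hypothetical names, and the `incremental` flag stands in for the `expect_only_dirty` decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one drive; not a QEMU type. */
typedef struct {
    uint64_t size;        /* drive size in bytes */
    uint64_t dirty_count; /* bytes the bitmap reports as dirty */
    int incremental;      /* non-zero if the existing bitmap can be used */
} DriveSketch;

/* Hypothetical stand-in for the relevant backup_state.stat fields. */
typedef struct {
    uint64_t total;
    uint64_t reused;
    uint64_t dirty;
} StatSketch;

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

static StatSketch account_drives(const DriveSketch *drives, size_t n)
{
    StatSketch st = {0, 0, 0};

    assert(drives != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* default: the whole drive is backed up */
        uint64_t dirty = drives[i].size;

        if (drives[i].incremental) {
            /* clamp so a bitmap can never report more than the drive size */
            dirty = SKETCH_MIN(drives[i].dirty_count, drives[i].size);
            /* clean chunks are tracked as reused */
            st.reused += drives[i].size - dirty;
        }
        st.total += drives[i].size;
    }

    /* global dirty value is derived, not accumulated separately */
    st.dirty = st.total - st.reused;
    return st;
}
```

Deriving `dirty` from `total - reused` at the end keeps the two counters consistent by construction, which is presumably why the patch drops the separate `dirty` accumulator.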


@ -19,10 +19,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9aba7d9c22..f4ecf9c9f9 100644
index 2708abf3d7..fb17c01308 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -538,7 +538,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
@@ -540,7 +540,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, &local_err)) {
error_report_err(local_err);


@ -21,10 +21,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 30 insertions(+)
diff --git a/block/iscsi.c b/block/iscsi.c
index a316d46d96..3ed4a50c0d 100644
index 2ff14b7472..46f275fbf7 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1387,12 +1387,42 @@ static char *get_initiator_name(QemuOpts *opts)
@@ -1392,12 +1392,42 @@ static char *get_initiator_name(QemuOpts *opts)
const char *name;
char *iscsi_name;
UuidInfo *uuid_info;


@ -11,10 +11,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/stream.c b/block/stream.c
index 694709bd25..e09bd5c4ef 100644
index 7031eef12b..d2da83ae7c 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -28,7 +28,7 @@ enum {
@@ -27,7 +27,7 @@ enum {
* large enough to process multiple clusters in a single call, so
* that populating contiguous regions of the image is efficient.
*/


@ -1,293 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Thu, 20 Aug 2020 14:25:00 +0200
Subject: [PATCH] PVE-Backup: Use a transaction to synchronize job states
By using a JobTxn, we can sync dirty bitmaps only when *all* jobs were
successful - meaning we don't need to remove them when the backup fails,
since QEMU's BITMAP_SYNC_MODE_ON_SUCCESS will now handle that for us.
To keep the rate-limiting and IO impact from before, we use a sequential
transaction, so drives will still be backed up one after the other.
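
The intended semantics can be illustrated with a toy model. This is only a sketch and does not use QEMU's JobTxn API: JobSketch, run_txn_seq, job_ok and job_fail are hypothetical names, and "committed" stands in for the bitmap sync that happens only on overall success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job callback: returns 0 on success, negative errno-style on failure. */
typedef int (*JobFnSketch)(void *opaque);

typedef struct {
    JobFnSketch run;
    void *opaque;
    bool ran;        /* job was started */
    bool committed;  /* stand-in for "dirty bitmap was synced" */
} JobSketch;

/*
 * Run the jobs one after the other. The first failure stops the sequence,
 * so later jobs never start. Results are committed only if every job
 * succeeded; otherwise nothing is committed.
 */
static int run_txn_seq(JobSketch *jobs, size_t n)
{
    int ret = 0;

    assert(jobs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        jobs[i].ran = false;
        jobs[i].committed = false;
    }

    for (size_t i = 0; i < n; i++) {
        jobs[i].ran = true;
        ret = jobs[i].run(jobs[i].opaque);
        if (ret < 0) {
            break; /* remaining jobs are implicitly cancelled */
        }
    }

    for (size_t i = 0; i < n; i++) {
        jobs[i].committed = (ret == 0);
    }
    return ret;
}

/* Example callbacks for illustration only. */
static int job_ok(void *opaque) { (void)opaque; return 0; }
static int job_fail(void *opaque) { (void)opaque; return -5; }
```

With jobs {job_ok, job_fail, job_ok}, the third job never runs and no job is committed, which mirrors why the bitmaps no longer need to be removed manually on failure.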
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: add new force parameter to job_cancel_sync calls
adapt for new job lock mechanism replacing AioContext locks]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 163 ++++++++++++++++-----------------------------------
1 file changed, 50 insertions(+), 113 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index c85b2ecd83..b5fb844434 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -52,6 +52,7 @@ static struct PVEBackupState {
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
+ JobTxn *txn;
QemuMutex backup_mutex;
CoMutex dump_callback_mutex;
} backup_state;
@@ -71,34 +72,12 @@ typedef struct PVEBackupDevInfo {
size_t size;
uint64_t block_size;
uint8_t dev_id;
- bool completed;
char targetfile[PATH_MAX];
BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
+ BlockJob *job;
} PVEBackupDevInfo;
-static void pvebackup_run_next_job(void);
-
-static BlockJob *
-lookup_active_block_job(PVEBackupDevInfo *di)
-{
- if (!di->completed && di->bs) {
- WITH_JOB_LOCK_GUARD() {
- for (BlockJob *job = block_job_next_locked(NULL); job; job = block_job_next_locked(job)) {
- if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
- continue;
- }
-
- BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
- if (bjob && bjob->source_bs == di->bs) {
- return job;
- }
- }
- }
- }
- return NULL;
-}
-
static void pvebackup_propagate_error(Error *err)
{
qemu_mutex_lock(&backup_state.stat.lock);
@@ -274,18 +253,6 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
if (local_err != NULL) {
pvebackup_propagate_error(local_err);
}
- } else {
- // on error or cancel we cannot ensure synchronization of dirty
- // bitmaps with backup server, so remove all and do full backup next
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- if (di->bitmap) {
- bdrv_release_dirty_bitmap(di->bitmap);
- }
- }
}
proxmox_backup_disconnect(backup_state.pbs);
@@ -324,8 +291,6 @@ static void pvebackup_complete_cb(void *opaque, int ret)
qemu_mutex_lock(&backup_state.backup_mutex);
- di->completed = true;
-
if (ret < 0) {
Error *local_err = NULL;
error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
@@ -338,20 +303,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
block_on_coroutine_fn(pvebackup_complete_stream, di);
- // remove self from job queue
+ // remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
- if (di->bitmap && ret < 0) {
- // on error or cancel we cannot ensure synchronization of dirty
- // bitmaps with backup server, so remove all and do full backup next
- bdrv_release_dirty_bitmap(di->bitmap);
- }
-
g_free(di);
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ /* call cleanup if we're the last job */
+ if (!g_list_first(backup_state.di_list)) {
+ block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+ }
- pvebackup_run_next_job();
+ qemu_mutex_unlock(&backup_state.backup_mutex);
}
static void pvebackup_cancel(void)
@@ -373,32 +335,28 @@ static void pvebackup_cancel(void)
proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- for(;;) {
-
- BlockJob *next_job = NULL;
+ /* it's enough to cancel one job in the transaction, the rest will follow
+ * automatically */
+ GList *bdi = g_list_first(backup_state.di_list);
+ BlockJob *cancel_job = bdi && bdi->data ?
+ ((PVEBackupDevInfo *)bdi->data)->job :
+ NULL;
- qemu_mutex_lock(&backup_state.backup_mutex);
-
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- BlockJob *job = lookup_active_block_job(di);
- if (job != NULL) {
- next_job = job;
- break;
- }
+ /* ref the job before releasing the mutex, just to be safe */
+ if (cancel_job) {
+ WITH_JOB_LOCK_GUARD() {
+ job_ref_locked(&cancel_job->job);
}
+ }
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ /* job_cancel_sync may enter the job, so we need to release the
+ * backup_mutex to avoid deadlock */
+ qemu_mutex_unlock(&backup_state.backup_mutex);
- if (next_job) {
- job_cancel_sync(&next_job->job, true);
- } else {
- break;
+ if (cancel_job) {
+ WITH_JOB_LOCK_GUARD() {
+ job_cancel_sync_locked(&cancel_job->job, true);
+ job_unref_locked(&cancel_job->job);
}
}
}
@@ -458,49 +416,19 @@ static int coroutine_fn pvebackup_co_add_config(
goto out;
}
-bool job_should_pause_locked(Job *job);
-
-static void pvebackup_run_next_job(void)
-{
- assert(!qemu_in_coroutine());
-
- qemu_mutex_lock(&backup_state.backup_mutex);
-
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- BlockJob *job = lookup_active_block_job(di);
-
- if (job) {
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- WITH_JOB_LOCK_GUARD() {
- if (job_should_pause_locked(&job->job)) {
- bool error_or_canceled = pvebackup_error_or_canceled();
- if (error_or_canceled) {
- job_cancel_sync_locked(&job->job, true);
- } else {
- job_resume_locked(&job->job);
- }
- }
- }
- return;
- }
- }
-
- block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
-
- qemu_mutex_unlock(&backup_state.backup_mutex);
-}
-
static bool create_backup_jobs(void) {
assert(!qemu_in_coroutine());
Error *local_err = NULL;
+ /* create job transaction to synchronize bitmap commit and cancel all
+ * jobs in case one errors */
+ if (backup_state.txn) {
+ job_txn_unref(backup_state.txn);
+ }
+ backup_state.txn = job_txn_new_seq();
+
BackupPerf perf = { .max_workers = 16 };
/* create and start all jobs (paused state) */
@@ -523,7 +451,7 @@ static bool create_backup_jobs(void) {
BlockJob *job = backup_job_create(
NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
bitmap_mode, false, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
- JOB_DEFAULT, pvebackup_complete_cb, di, NULL, &local_err);
+ JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn, &local_err);
aio_context_release(aio_context);
@@ -535,7 +463,8 @@ static bool create_backup_jobs(void) {
pvebackup_propagate_error(create_job_err);
break;
}
- job_start(&job->job);
+
+ di->job = job;
bdrv_unref(di->target);
di->target = NULL;
@@ -553,6 +482,12 @@ static bool create_backup_jobs(void) {
bdrv_unref(di->target);
di->target = NULL;
}
+
+ if (di->job) {
+ WITH_JOB_LOCK_GUARD() {
+ job_unref_locked(&di->job->job);
+ }
+ }
}
}
@@ -943,10 +878,6 @@ err:
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (di->bitmap) {
- bdrv_release_dirty_bitmap(di->bitmap);
- }
-
if (di->target) {
bdrv_unref(di->target);
}
@@ -1035,9 +966,15 @@ UuidInfo *qmp_backup(
block_on_coroutine_fn(pvebackup_co_prepare, &task);
if (*errp == NULL) {
- create_backup_jobs();
+ bool errors = create_backup_jobs();
qemu_mutex_unlock(&backup_state.backup_mutex);
- pvebackup_run_next_job();
+
+ if (!errors) {
+ /* start the first job in the transaction
+ * note: this might directly enter the job, so we need to do this
+ * after unlocking the backup_mutex */
+ job_txn_start_seq(backup_state.txn);
+ }
} else {
qemu_mutex_unlock(&backup_state.backup_mutex);
}


@ -19,26 +19,33 @@ well.
This only worked if the target supports backing images, so up until now
only for qcow2, with alloc-track any driver for the target can be used.
If 'auto-remove' is set, alloc-track will automatically detach itself
once the backing image is removed. It will be replaced by 'file'.
Replacing the node cannot be done in the
track_co_change_backing_file() callback, because replacing a node
cannot happen in a coroutine and requires the block graph lock
exclusively. Could either become a special option for the stream job,
or maybe the upcoming blockdev-replace QMP command can be used in the
future.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to changed function signatures
make error return value consistent with QEMU]
make error return value consistent with QEMU
avoid premature break during read
adhere to block graph lock requirements]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 350 ++++++++++++++++++++++++++++++++++++++++++++
block/alloc-track.c | 366 ++++++++++++++++++++++++++++++++++++++++++++
block/meson.build | 1 +
2 files changed, 351 insertions(+)
block/stream.c | 34 ++++
3 files changed, 401 insertions(+)
create mode 100644 block/alloc-track.c
diff --git a/block/alloc-track.c b/block/alloc-track.c
new file mode 100644
index 0000000000..43d40d11af
index 0000000000..b9f8ea9137
--- /dev/null
+++ b/block/alloc-track.c
@@ -0,0 +1,350 @@
@@ -0,0 +1,366 @@
+/*
+ * Node to allow backing images to be applied to any node. Assumes a blank
+ * image to begin with, only new writes are tracked as allocated, thus this
@ -54,9 +61,12 @@ index 0000000000..43d40d11af
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/dirty-bitmap.h"
+#include "block/graph-lock.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/module.h"
+#include "sysemu/block-backend.h"
@ -65,12 +75,12 @@ index 0000000000..43d40d11af
+
+typedef enum DropState {
+ DropNone,
+ DropRequested,
+ DropInProgress,
+} DropState;
+
+typedef struct {
+ BdrvDirtyBitmap *bitmap;
+ uint64_t granularity;
+ DropState drop_state;
+ bool auto_remove;
+} BDRVAllocTrackState;
@ -89,26 +99,29 @@ index 0000000000..43d40d11af
+ },
+};
+
+static void track_refresh_limits(BlockDriverState *bs, Error **errp)
+static void GRAPH_RDLOCK
+track_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+ BlockDriverInfo bdi;
+ BDRVAllocTrackState *s = bs->opaque;
+
+ if (!bs->file) {
+ return;
+ }
+
+ /* always use alignment from underlying write device so RMW cycle for
+ * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
+ * cluster allocation in 'file') */
+ bdrv_get_info(bs->file->bs, &bdi);
+ /*
+ * Always use alignment from underlying write device so RMW cycle for
+ * bdrv_pwritev reads data from our backing via track_co_preadv. Also use at
+ * least the bitmap granularity.
+ */
+ bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
+ MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
+ s->granularity);
+}
+
+static int track_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ BdrvChild *file = NULL;
+ QemuOpts *opts;
+ Error *local_err = NULL;
+ int ret = 0;
@ -124,18 +137,45 @@ index 0000000000..43d40d11af
+ s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
+
+ /* open the target (write) node, backing will be attached by block layer */
+ bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+ BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
+ &local_err);
+ file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+ BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
+ &local_err);
+ bdrv_graph_wrlock();
+ bs->file = file;
+ bdrv_graph_wrunlock();
+ if (local_err) {
+ ret = -EINVAL;
+ error_propagate(errp, local_err);
+ goto fail;
+ }
+
+ bdrv_graph_rdlock_main_loop();
+ BlockDriverInfo bdi = {0};
+ ret = bdrv_get_info(bs->file->bs, &bdi);
+ if (ret < 0) {
+ /*
+ * Not a hard failure. Worst that can happen is partial cluster
+ * allocation in the write target. However, the driver here returns its
+ * allocation status based on the dirty bitmap, so any other data that
+ * maps to such a cluster will still be copied later by a stream job (or
+ * during writes to that cluster).
+ */
+ warn_report("alloc-track: unable to query cluster size for write target: %s",
+ strerror(-ret));
+ }
+ ret = 0;
+ /*
+ * Always consider alignment from underlying write device so RMW cycle for
+ * bdrv_pwritev reads data from our backing via track_co_preadv. Also try to
+ * avoid partial cluster allocation in the write target by considering the
+ * cluster size.
+ */
+ s->granularity = MAX(bs->file->bs->bl.request_alignment,
+ MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
+ track_refresh_limits(bs, errp);
+ uint64_t gran = bs->bl.request_alignment;
+ s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
+ s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, s->granularity, NULL,
+ &local_err);
+ bdrv_graph_rdunlock_main_loop();
+ if (local_err) {
+ ret = -EIO;
+ error_propagate(errp, local_err);
@ -146,7 +186,9 @@ index 0000000000..43d40d11af
+
+fail:
+ if (ret < 0) {
+ bdrv_graph_wrlock();
+ bdrv_unref_child(bs, bs->file);
+ bdrv_graph_wrunlock();
+ if (s->bitmap) {
+ bdrv_release_dirty_bitmap(s->bitmap);
+ }
@ -163,13 +205,15 @@ index 0000000000..43d40d11af
+ }
+}
+
+static int64_t track_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t GRAPH_RDLOCK
+track_co_getlength(BlockDriverState *bs)
+{
+ return bdrv_getlength(bs->file->bs);
+ return bdrv_co_getlength(bs->file->bs);
+}
+
+static int coroutine_fn track_co_preadv(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ QEMUIOVector local_qiov;
@ -215,7 +259,8 @@ index 0000000000..43d40d11af
+ ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
+ &local_qiov, flags);
+ } else {
+ ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+ qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+ ret = 0;
+ }
+
+ if (ret != 0) {
@ -226,36 +271,39 @@ index 0000000000..43d40d11af
+ return ret;
+}
+
+static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ BdrvRequestFlags flags)
+{
+ return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int64_t bytes)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+ return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static coroutine_fn int track_co_flush(BlockDriverState *bs)
+static coroutine_fn int GRAPH_RDLOCK
+track_co_flush(BlockDriverState *bs)
+{
+ return bdrv_co_flush(bs->file->bs);
+}
+
+static int coroutine_fn track_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_block_status(BlockDriverState *bs, bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+
@ -281,10 +329,10 @@ index 0000000000..43d40d11af
+ return 0;
+}
+
+static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
+ BdrvChildRole role, BlockReopenQueue *reopen_queue,
+ uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
+static void GRAPH_RDLOCK
+track_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
+ BlockReopenQueue *reopen_queue, uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+
@ -307,53 +355,28 @@ index 0000000000..43d40d11af
+ }
+}
+
+static void track_drop(void *opaque)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_change_backing_file(BlockDriverState *bs, const char *backing_file,
+ const char *backing_fmt)
+{
+ BlockDriverState *bs = (BlockDriverState*)opaque;
+ BlockDriverState *file = bs->file->bs;
+ BDRVAllocTrackState *s = bs->opaque;
+
+ assert(file);
+
+ /* we rely on the fact that we're not used anywhere else, so let's wait
+ * until we're only used once - in the drive connected to the guest (and one
+ * ref is held by bdrv_ref in track_change_backing_file) */
+ if (bs->refcnt > 2) {
+ aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, opaque);
+ return;
+ }
+ AioContext *aio_context = bdrv_get_aio_context(bs);
+ aio_context_acquire(aio_context);
+
+ bdrv_drained_begin(bs);
+
+ /* now that we're drained, we can safely set 'DropInProgress' */
+ s->drop_state = DropInProgress;
+ bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+
+ bdrv_replace_node(bs, file, &error_abort);
+ bdrv_set_backing_hd(bs, NULL, &error_abort);
+ bdrv_drained_end(bs);
+ bdrv_unref(bs);
+ aio_context_release(aio_context);
+}
+
+static int track_change_backing_file(BlockDriverState *bs,
+ const char *backing_file,
+ const char *backing_fmt)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ if (s->auto_remove && s->drop_state == DropNone &&
+ backing_file == NULL && backing_fmt == NULL)
+ {
+ /* backing file has been disconnected, there's no longer any use for
+ * this node, so let's remove ourselves from the block graph - we need
+ * to schedule this for later however, since when this function is
+ * called, the blockjob modifying us is probably not done yet and has a
+ * blocker on 'bs' */
+ s->drop_state = DropRequested;
+ bdrv_ref(bs);
+ aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
+ /*
+ * Note that the actual backing file graph change is already done in the
+ * stream job itself with bdrv_set_backing_hd_drained(), so no need to
+ * actually do anything here. But still needs to be implemented, to make
+ * our caller (i.e. bdrv_co_change_backing_file()) do the right thing.
+ *
+ * FIXME
+ * We'd like to auto-remove ourselves from the block graph, but it cannot
+ * be done from a coroutine. Currently done in the stream job, where it
+ * kinda fits better, but in the long-term, a special parameter would be
+ * nice (or done via qemu-server via upcoming blockdev-replace QMP command).
+ */
+ if (backing_file == NULL) {
+ BDRVAllocTrackState *s = bs->opaque;
+ bdrv_drained_begin(bs);
+ s->drop_state = DropInProgress;
+ bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+ bdrv_drained_end(bs);
+ }
+
+ return 0;
@ -365,7 +388,7 @@ index 0000000000..43d40d11af
+
+ .bdrv_file_open = track_open,
+ .bdrv_close = track_close,
+ .bdrv_getlength = track_getlength,
+ .bdrv_co_getlength = track_co_getlength,
+ .bdrv_child_perm = track_child_perm,
+ .bdrv_refresh_limits = track_refresh_limits,
+
@ -380,7 +403,7 @@ index 0000000000..43d40d11af
+ .supports_backing = true,
+
+ .bdrv_co_block_status = track_co_block_status,
+ .bdrv_change_backing_file = track_change_backing_file,
+ .bdrv_co_change_backing_file = track_co_change_backing_file,
+};
+
+static void bdrv_alloc_track_init(void)
@ -390,7 +413,7 @@ index 0000000000..43d40d11af
+
+block_init(bdrv_alloc_track_init);
diff --git a/block/meson.build b/block/meson.build
index 7ef2fa72d5..15352f579f 100644
index 1945e04eeb..2873f3a25a 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -2,6 +2,7 @@ block_ss.add(genh)
@ -401,3 +424,48 @@ index 7ef2fa72d5..15352f579f 100644
'amend.c',
'backup.c',
'backup-dump.c',
diff --git a/block/stream.c b/block/stream.c
index d2da83ae7c..f941cba14e 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -120,6 +120,40 @@ static int stream_prepare(Job *job)
ret = -EPERM;
goto out;
}
+
+ /*
+ * This cannot be done in the co_change_backing_file callback, because
+ * bdrv_replace_node() cannot be done in a coroutine. The latter also
+ * requires the graph lock exclusively. Only required for the
+ * alloc-track driver.
+ *
+ * The long-term plan is to either have an explicit parameter for the
+ * stream job or use the upcoming blockdev-replace QMP command.
+ */
+ if (base_id == NULL && strcmp(unfiltered_bs->drv->format_name, "alloc-track") == 0) {
+ BlockDriverState *file_bs;
+
+ bdrv_graph_rdlock_main_loop();
+ file_bs = unfiltered_bs->file->bs;
+ bdrv_graph_rdunlock_main_loop();
+
+ bdrv_ref(unfiltered_bs); // unrefed by bdrv_replace_node()
+ bdrv_drained_begin(file_bs);
+ bdrv_graph_wrlock();
+
+ bdrv_replace_node(unfiltered_bs, file_bs, &local_err);
+
+ bdrv_graph_wrunlock();
+ bdrv_drained_end(file_bs);
+ bdrv_unref(unfiltered_bs);
+
+ if (local_err) {
+ error_prepend(&local_err, "failed to replace alloc-track node: ");
+ error_report_err(local_err);
+ ret = -EPERM;
+ goto out;
+ }
+ }
}
out:


@ -1,499 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 28 Sep 2020 13:40:51 +0200
Subject: [PATCH] PVE-Backup: Don't block on finishing and cleanup
create_backup_jobs
proxmox_backup_co_finish is already async, but previously we would wait
for the coroutine using block_on_coroutine_fn(). Avoid this by
scheduling pvebackup_co_complete_stream (and thus pvebackup_co_cleanup)
as a real coroutine when calling from pvebackup_complete_cb. This is ok,
since complete_stream uses the backup_mutex internally to synchronize,
and other streams can happily continue writing in the meantime anyway.
To accomodate, backup_mutex is converted to a CoMutex. This means
converting every user to a coroutine. This is not just useful here, but
will come in handy once this series[0] is merged, and QMP calls can be
yield-able coroutines too. Then we can also finally get rid of
block_on_coroutine_fn.
Cases of aio_context_acquire/release from within what is now a coroutine
are changed to aio_co_reschedule_self, which works since a running
coroutine always holds the aio lock for the context it is running in.
job_cancel_sync is called from a BH since it can't be run from a
coroutine (uses AIO_WAIT_WHILE internally).
Same thing for create_backup_jobs, which is converted to a BH too.
To communicate the finishing state, a new property is introduced to
query-backup: 'finishing'. A new state is explicitly not used, since
that would break compatibility with older qemu-server versions.
Also fix create_backup_jobs:
No more weird bool returns, just the standard "errp" format used
everywhere else too. With this, if backup_job_create fails, the error
message is actually returned over QMP and can be shown to the user.
To facilitate correct cleanup on such an error, we call
create_backup_jobs as a bottom half directly from pvebackup_co_prepare.
This additionally allows us to actually hold the backup_mutex during
operation.
Also add a job_cancel_sync before job_unref, since a job must be in
STATUS_NULL to be deleted by unref, which could trigger an assert
before.
[0] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg03515.html
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: add new force parameter to job_cancel_sync calls]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 212 +++++++++++++++++++++++++++----------------
qapi/block-core.json | 5 +-
2 files changed, 138 insertions(+), 79 deletions(-)
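The hand-off described in the commit message (the completion callback only records the result, and the slow cleanup runs later) can be sketched outside QEMU roughly as follows. This is a minimal model in plain C under stated assumptions: DevInfo, complete_cb, run_deferred and the fixed-size queue are hypothetical stand-ins, not QEMU APIs. In the patch itself the deferral is done with qemu_coroutine_create() and aio_co_enter(), and INT_MAX in completed_ret marks a stream that has not completed yet.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for PVEBackupDevInfo: only the fields the sketch needs. */
typedef struct DevInfo {
    int dev_id;
    int completed_ret; /* INT_MAX until the completion callback has run */
    int cleaned_up;    /* set by the deferred cleanup step */
} DevInfo;

/* Tiny FIFO of pending cleanups, standing in for "schedule a coroutine". */
#define MAX_PENDING 8
static DevInfo *pending[MAX_PENDING];
static size_t pending_len;

/* Fast path: called when a job completes. Records the result, defers the rest. */
static void complete_cb(DevInfo *di, int ret)
{
    assert(pending_len < MAX_PENDING);
    di->completed_ret = ret;
    pending[pending_len++] = di;
}

/* Slow path: runs later, outside the callback. Returns the number of failed streams. */
static int run_deferred(void)
{
    int failed = 0;
    for (size_t i = 0; i < pending_len; i++) {
        DevInfo *di = pending[i];
        assert(di->completed_ret != INT_MAX); /* only completed streams are queued */
        if (di->completed_ret < 0) {
            failed++;
        }
        di->cleaned_up = 1;
    }
    pending_len = 0;
    return failed;
}
```

The point of the split is that complete_cb never waits: anything that may take a while is left to run_deferred, which is the role pvebackup_co_complete_stream plays in the patch.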
diff --git a/pve-backup.c b/pve-backup.c
index b5fb844434..88268bb586 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -33,7 +33,9 @@ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
static struct PVEBackupState {
struct {
- // Everithing accessed from qmp_backup_query command is protected using lock
+ // Everything accessed from qmp_backup_query command is protected using
+ // this lock. Do NOT hold this lock for long times, as it is sometimes
+ // acquired from coroutines, and thus any wait time may block the guest.
QemuMutex lock;
Error *error;
time_t start_time;
@@ -47,20 +49,22 @@ static struct PVEBackupState {
size_t reused;
size_t zero_bytes;
GList *bitmap_list;
+ bool finishing;
+ bool starting;
} stat;
int64_t speed;
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
JobTxn *txn;
- QemuMutex backup_mutex;
+ CoMutex backup_mutex;
CoMutex dump_callback_mutex;
} backup_state;
static void pvebackup_init(void)
{
qemu_mutex_init(&backup_state.stat.lock);
- qemu_mutex_init(&backup_state.backup_mutex);
+ qemu_co_mutex_init(&backup_state.backup_mutex);
qemu_co_mutex_init(&backup_state.dump_callback_mutex);
}
@@ -72,6 +76,7 @@ typedef struct PVEBackupDevInfo {
size_t size;
uint64_t block_size;
uint8_t dev_id;
+ int completed_ret; // INT_MAX if not completed
char targetfile[PATH_MAX];
BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
@@ -227,12 +232,12 @@ pvebackup_co_dump_vma_cb(
}
// assumes the caller holds backup_mutex
-static void coroutine_fn pvebackup_co_cleanup(void *unused)
+static void coroutine_fn pvebackup_co_cleanup(void)
{
assert(qemu_in_coroutine());
qemu_mutex_lock(&backup_state.stat.lock);
- backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = true;
qemu_mutex_unlock(&backup_state.stat.lock);
if (backup_state.vmaw) {
@@ -261,35 +266,29 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
g_list_free(backup_state.di_list);
backup_state.di_list = NULL;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
}
-// assumes the caller holds backup_mutex
-static void coroutine_fn pvebackup_complete_stream(void *opaque)
+static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
{
PVEBackupDevInfo *di = opaque;
+ int ret = di->completed_ret;
- bool error_or_canceled = pvebackup_error_or_canceled();
-
- if (backup_state.vmaw) {
- vma_writer_close_stream(backup_state.vmaw, di->dev_id);
+ qemu_mutex_lock(&backup_state.stat.lock);
+ bool starting = backup_state.stat.starting;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+ if (starting) {
+ /* in 'starting' state, no tasks have been run yet, meaning we can (and
+ * must) skip all cleanup, as we don't know what has and hasn't been
+ * initialized yet. */
+ return;
}
- if (backup_state.pbs && !error_or_canceled) {
- Error *local_err = NULL;
- proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
- if (local_err != NULL) {
- pvebackup_propagate_error(local_err);
- }
- }
-}
-
-static void pvebackup_complete_cb(void *opaque, int ret)
-{
- assert(!qemu_in_coroutine());
-
- PVEBackupDevInfo *di = opaque;
-
- qemu_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
if (ret < 0) {
Error *local_err = NULL;
@@ -301,7 +300,19 @@ static void pvebackup_complete_cb(void *opaque, int ret)
assert(di->target == NULL);
- block_on_coroutine_fn(pvebackup_complete_stream, di);
+ bool error_or_canceled = pvebackup_error_or_canceled();
+
+ if (backup_state.vmaw) {
+ vma_writer_close_stream(backup_state.vmaw, di->dev_id);
+ }
+
+ if (backup_state.pbs && !error_or_canceled) {
+ Error *local_err = NULL;
+ proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
+ if (local_err != NULL) {
+ pvebackup_propagate_error(local_err);
+ }
+ }
// remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
@@ -310,21 +321,46 @@ static void pvebackup_complete_cb(void *opaque, int ret)
/* call cleanup if we're the last job */
if (!g_list_first(backup_state.di_list)) {
- block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+ pvebackup_co_cleanup();
}
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
-static void pvebackup_cancel(void)
+static void pvebackup_complete_cb(void *opaque, int ret)
{
- assert(!qemu_in_coroutine());
+ PVEBackupDevInfo *di = opaque;
+ di->completed_ret = ret;
+
+ /*
+ * Schedule stream cleanup in async coroutine. close_image and finish might
+ * take a while, so we can't block on them here. This way it also doesn't
+ * matter if we're already running in a coroutine or not.
+ * Note: di is a pointer to an entry in the global backup_state struct, so
+ * it stays valid.
+ */
+ Coroutine *co = qemu_coroutine_create(pvebackup_co_complete_stream, di);
+ aio_co_enter(qemu_get_aio_context(), co);
+}
+
+/*
+ * job_cancel(_sync) does not like to be called from coroutines, so defer to
+ * main loop processing via a bottom half.
+ */
+static void job_cancel_bh(void *opaque) {
+ CoCtxData *data = (CoCtxData*)opaque;
+ Job *job = (Job*)data->data;
+ job_cancel_sync(job, true);
+ aio_co_enter(data->ctx, data->co);
+}
+static void coroutine_fn pvebackup_co_cancel(void *opaque)
+{
Error *cancel_err = NULL;
error_setg(&cancel_err, "backup canceled");
pvebackup_propagate_error(cancel_err);
- qemu_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
if (backup_state.vmaw) {
/* make sure vma writer does not block anymore */
@@ -342,28 +378,22 @@ static void pvebackup_cancel(void)
((PVEBackupDevInfo *)bdi->data)->job :
NULL;
- /* ref the job before releasing the mutex, just to be safe */
if (cancel_job) {
- WITH_JOB_LOCK_GUARD() {
- job_ref_locked(&cancel_job->job);
- }
+ CoCtxData data = {
+ .ctx = qemu_get_current_aio_context(),
+ .co = qemu_coroutine_self(),
+ .data = &cancel_job->job,
+ };
+ aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+ qemu_coroutine_yield();
}
- /* job_cancel_sync may enter the job, so we need to release the
- * backup_mutex to avoid deadlock */
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- if (cancel_job) {
- WITH_JOB_LOCK_GUARD() {
- job_cancel_sync_locked(&cancel_job->job, true);
- job_unref_locked(&cancel_job->job);
- }
- }
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
void qmp_backup_cancel(Error **errp)
{
- pvebackup_cancel();
+ block_on_coroutine_fn(pvebackup_co_cancel, NULL);
}
// assumes the caller holds backup_mutex
@@ -416,10 +446,18 @@ static int coroutine_fn pvebackup_co_add_config(
goto out;
}
-static bool create_backup_jobs(void) {
+/*
+ * backup_job_create can *not* be run from a coroutine (and requires an
+ * acquired AioContext), so this can't either.
+ * The caller is responsible that backup_mutex is held nonetheless.
+ */
+static void create_backup_jobs_bh(void *opaque) {
assert(!qemu_in_coroutine());
+ CoCtxData *data = (CoCtxData*)opaque;
+ Error **errp = (Error**)data->data;
+
Error *local_err = NULL;
/* create job transaction to synchronize bitmap commit and cancel all
@@ -455,24 +493,19 @@ static bool create_backup_jobs(void) {
aio_context_release(aio_context);
- if (!job || local_err != NULL) {
- Error *create_job_err = NULL;
- error_setg(&create_job_err, "backup_job_create failed: %s",
- local_err ? error_get_pretty(local_err) : "null");
+ di->job = job;
- pvebackup_propagate_error(create_job_err);
+ if (!job || local_err) {
+ error_setg(errp, "backup_job_create failed: %s",
+ local_err ? error_get_pretty(local_err) : "null");
break;
}
- di->job = job;
-
bdrv_unref(di->target);
di->target = NULL;
}
- bool errors = pvebackup_error_or_canceled();
-
- if (errors) {
+ if (*errp) {
l = backup_state.di_list;
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
@@ -485,13 +518,15 @@ static bool create_backup_jobs(void) {
if (di->job) {
WITH_JOB_LOCK_GUARD() {
+ job_cancel_sync_locked(&di->job->job, true);
job_unref_locked(&di->job->job);
}
}
}
}
- return errors;
+ /* return */
+ aio_co_enter(data->ctx, data->co);
}
typedef struct QmpBackupTask {
@@ -528,11 +563,12 @@ typedef struct QmpBackupTask {
UuidInfo *result;
} QmpBackupTask;
-// assumes the caller holds backup_mutex
static void coroutine_fn pvebackup_co_prepare(void *opaque)
{
assert(qemu_in_coroutine());
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
QmpBackupTask *task = opaque;
task->result = NULL; // just to be sure
@@ -553,8 +589,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *firewall_name = "qemu-server.fw";
if (backup_state.di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
return;
}
@@ -621,6 +658,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
di->size = size;
total += size;
+
+ di->completed_ret = INT_MAX;
}
uuid_generate(uuid);
@@ -852,6 +891,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.stat.dirty = total - backup_state.stat.reused;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
+ backup_state.stat.finishing = false;
+ backup_state.stat.starting = true;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -866,6 +907,33 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_info->UUID = uuid_str;
task->result = uuid_info;
+
+ /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
+ * backup_mutex locked. This is fine, a CoMutex can be held across yield
+ * points, and we'll release it as soon as the BH reschedules us.
+ */
+ CoCtxData waker = {
+ .co = qemu_coroutine_self(),
+ .ctx = qemu_get_current_aio_context(),
+ .data = &local_err,
+ };
+ aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
+ qemu_coroutine_yield();
+
+ if (local_err) {
+ error_propagate(task->errp, local_err);
+ goto err;
+ }
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.starting = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ /* start the first job in the transaction */
+ job_txn_start_seq(backup_state.txn);
+
return;
err_mutex:
@@ -888,6 +956,7 @@ err:
g_free(di);
}
g_list_free(di_list);
+ backup_state.di_list = NULL;
if (devs) {
g_strfreev(devs);
@@ -908,6 +977,8 @@ err:
}
task->result = NULL;
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
return;
}
@@ -961,24 +1032,8 @@ UuidInfo *qmp_backup(
.errp = errp,
};
- qemu_mutex_lock(&backup_state.backup_mutex);
-
block_on_coroutine_fn(pvebackup_co_prepare, &task);
- if (*errp == NULL) {
- bool errors = create_backup_jobs();
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- if (!errors) {
- /* start the first job in the transaction
- * note: this might directly enter the job, so we need to do this
- * after unlocking the backup_mutex */
- job_txn_start_seq(backup_state.txn);
- }
- } else {
- qemu_mutex_unlock(&backup_state.backup_mutex);
- }
-
return task.result;
}
@@ -1030,6 +1085,7 @@ BackupStatus *qmp_query_backup(Error **errp)
info->transferred = backup_state.stat.transferred;
info->has_reused = true;
info->reused = backup_state.stat.reused;
+ info->finishing = backup_state.stat.finishing;
qemu_mutex_unlock(&backup_state.stat.lock);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7fde927621..bf559c6d52 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -770,12 +770,15 @@
#
# @uuid: uuid for this backup job
#
+# @finishing: if status='active' and finishing=true, then the backup process is
+# waiting for the target to finish.
+#
##
{ 'struct': 'BackupStatus',
'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int', '*dirty': 'int',
'*transferred': 'int', '*zero-bytes': 'int', '*reused': 'int',
'*start-time': 'int', '*end-time': 'int',
- '*backup-file': 'str', '*uuid': 'str' } }
+ '*backup-file': 'str', '*uuid': 'str', 'finishing': 'bool' } }
##
# @BackupFormat:
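A consumer of query-backup might interpret the new 'finishing' property along these lines. This is only a sketch: BackupPhase and classify_backup are hypothetical names, and treating every status other than 'active' as "ended" is an assumption made for the example. The only rule taken from the schema comment is that status='active' together with finishing=true means the backup is waiting for the target to finish.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical client-side view of a query-backup reply. */
typedef enum BackupPhase {
    BACKUP_PHASE_UNKNOWN,   /* no status reported */
    BACKUP_PHASE_RUNNING,   /* status == "active", finishing == false */
    BACKUP_PHASE_FINISHING, /* status == "active", finishing == true */
    BACKUP_PHASE_ENDED,     /* any other status (assumption of this sketch) */
} BackupPhase;

/* Map the optional 'status' string and the 'finishing' flag to a phase. */
static BackupPhase classify_backup(const char *status, bool finishing)
{
    if (status == NULL) {
        return BACKUP_PHASE_UNKNOWN;
    }
    if (strcmp(status, "active") == 0) {
        return finishing ? BACKUP_PHASE_FINISHING : BACKUP_PHASE_RUNNING;
    }
    return BACKUP_PHASE_ENDED;
}
```

Because 'finishing' is a separate property rather than a new status value, a client that ignores it still sees 'active' and keeps working, which matches the compatibility argument in the commit message.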


@ -7,15 +7,16 @@ This reverts commit fc176116cdea816ceb8dd969080b2b95f58edbc0 in
preparation to revert 0347a8fd4c3faaedf119be04c197804be40a384b.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/rbd.c | 42 ++----------------------------------------
1 file changed, 2 insertions(+), 40 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index 64a8d7d48b..9fc6dcb957 100644
index 63f60d41be..367db42dce 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1348,7 +1348,6 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
@@ -1515,7 +1515,6 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
int status, r;
RBDDiffIterateReq req = { .offs = offset };
uint64_t features, flags;
@ -23,7 +24,7 @@ index 64a8d7d48b..9fc6dcb957 100644
assert(offset + bytes <= s->image_size);
@@ -1376,43 +1375,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
@@ -1543,43 +1542,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
return status;
}
@ -68,7 +69,7 @@ index 64a8d7d48b..9fc6dcb957 100644
qemu_rbd_diff_iterate_cb, &req);
if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
return status;
@@ -1431,8 +1394,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
@@ -1598,8 +1561,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
}


@ -8,15 +8,16 @@ This reverts commit 9e302f64bb407a9bb097b626da97228c2654cfee in
preparation to revert 0347a8fd4c3faaedf119be04c197804be40a384b.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/rbd.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index 9fc6dcb957..98f4ba2620 100644
index 367db42dce..347b121626 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1307,11 +1307,11 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
@@ -1474,11 +1474,11 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
RBDDiffIterateReq *req = opaque;
assert(req->offs + req->bytes <= offs);


@ -18,15 +18,16 @@ Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1026
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/rbd.c | 112 ----------------------------------------------------
1 file changed, 112 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index 98f4ba2620..efcbbe5949 100644
index 347b121626..e61b359b97 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -97,12 +97,6 @@ typedef struct RBDTask {
@@ -108,12 +108,6 @@ typedef struct RBDTask {
int64_t ret;
} RBDTask;
@ -39,7 +40,7 @@ index 98f4ba2620..efcbbe5949 100644
static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
BlockdevOptionsRbd *opts, bool cache,
const char *keypairs, const char *secretid,
@@ -1293,111 +1287,6 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs,
@@ -1460,111 +1454,6 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs,
return spec_info;
}
@ -148,10 +149,10 @@ index 98f4ba2620..efcbbe5949 100644
- return status;
-}
-
static int64_t qemu_rbd_getlength(BlockDriverState *bs)
static int64_t coroutine_fn qemu_rbd_co_getlength(BlockDriverState *bs)
{
BDRVRBDState *s = bs->opaque;
@@ -1633,7 +1522,6 @@ static BlockDriver bdrv_rbd = {
@@ -1800,7 +1689,6 @@ static BlockDriver bdrv_rbd = {
#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
.bdrv_co_pwrite_zeroes = qemu_rbd_co_pwrite_zeroes,
#endif


@ -0,0 +1,43 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Tue, 26 Mar 2024 14:57:51 +0100
Subject: [PATCH] alloc-track: error out when auto-remove is not set
Since replacing the node now happens in the stream job, where the
option cannot be read from (it's internal to the driver), it will
always be treated as on.
qemu-server will always set it; error out so that other users (should
they even exist) notice the change. The option can be fully dropped
in the future while adding a version guard in qemu-server.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/block/alloc-track.c b/block/alloc-track.c
index b9f8ea9137..f3ed2935c4 100644
--- a/block/alloc-track.c
+++ b/block/alloc-track.c
@@ -34,7 +34,6 @@ typedef struct {
BdrvDirtyBitmap *bitmap;
uint64_t granularity;
DropState drop_state;
- bool auto_remove;
} BDRVAllocTrackState;
static QemuOptsList runtime_opts = {
@@ -86,7 +85,11 @@ static int track_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
- s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
+ if (!qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false)) {
+ error_setg(errp, "alloc-track: requires auto-remove option to be set to on");
+ ret = -EINVAL;
+ goto fail;
+ }
/* open the target (write) node, backing will be attached by block layer */
file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,


@ -1,598 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 26 Jan 2021 15:45:30 +0100
Subject: [PATCH] PVE: Use coroutine QMP for backup/cancel_backup
Finally turn backup QMP calls into coroutines, now that it's possible.
This has the benefit that calls are asynchronous to the main loop, i.e.
long running operations like connecting to a PBS server will no longer
hang the VM.
Additionally, it allows us to get rid of block_on_coroutine_fn, which
was always a hacky workaround.
While we're already spring cleaning, also remove the QmpBackupTask
struct, since we can now put the 'prepare' function directly into
qmp_backup and thus no longer need those giant walls of text.
(Note that for our patches to work with 5.2.0 this change is actually
required, otherwise monitor_get_fd() fails as we're not in a QMP
coroutine, but one we start ourselves - we could of course set the
monitor for that coroutine ourselves, but let's just fix it the right
way instead)
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 4 +-
hmp-commands.hx | 2 +
proxmox-backup-client.c | 31 -----
pve-backup.c | 232 ++++++++++-----------------------
qapi/block-core.json | 4 +-
5 files changed, 77 insertions(+), 196 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index a09f722fea..71ed202491 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1016,7 +1016,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
g_free(global_snapshots);
}
-void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_backup_cancel(Monitor *mon, const QDict *qdict)
{
Error *error = NULL;
@@ -1025,7 +1025,7 @@ void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, error);
}
-void hmp_backup(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
{
Error *error = NULL;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index fcf9461295..5fdb198ca4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -111,6 +111,7 @@ ERST
"\n\t\t\t Use -d to dump data into a directory instead"
"\n\t\t\t of using VMA format.",
.cmd = hmp_backup,
+ .coroutine = true,
},
SRST
@@ -124,6 +125,7 @@ ERST
.params = "",
.help = "cancel the current VM backup",
.cmd = hmp_backup_cancel,
+ .coroutine = true,
},
SRST
diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
index 4ce7bc0b5e..0923037dec 100644
--- a/proxmox-backup-client.c
+++ b/proxmox-backup-client.c
@@ -5,37 +5,6 @@
/* Proxmox Backup Server client bindings using coroutines */
-typedef struct BlockOnCoroutineWrapper {
- AioContext *ctx;
- CoroutineEntry *entry;
- void *entry_arg;
- bool finished;
-} BlockOnCoroutineWrapper;
-
-static void coroutine_fn block_on_coroutine_wrapper(void *opaque)
-{
- BlockOnCoroutineWrapper *wrapper = opaque;
- wrapper->entry(wrapper->entry_arg);
- wrapper->finished = true;
- aio_wait_kick();
-}
-
-void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg)
-{
- assert(!qemu_in_coroutine());
-
- AioContext *ctx = qemu_get_current_aio_context();
- BlockOnCoroutineWrapper wrapper = {
- .finished = false,
- .entry = entry,
- .entry_arg = entry_arg,
- .ctx = ctx,
- };
- Coroutine *wrapper_co = qemu_coroutine_create(block_on_coroutine_wrapper, &wrapper);
- aio_co_enter(ctx, wrapper_co);
- AIO_WAIT_WHILE(ctx, !wrapper.finished);
-}
-
// This is called from another thread, so we use aio_co_schedule()
static void proxmox_backup_schedule_wake(void *data) {
CoCtxData *waker = (CoCtxData *)data;
diff --git a/pve-backup.c b/pve-backup.c
index fa9c6c4493..109498eaf9 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -354,7 +354,7 @@ static void job_cancel_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
-static void coroutine_fn pvebackup_co_cancel(void *opaque)
+void coroutine_fn qmp_backup_cancel(Error **errp)
{
Error *cancel_err = NULL;
error_setg(&cancel_err, "backup canceled");
@@ -391,11 +391,6 @@ static void coroutine_fn pvebackup_co_cancel(void *opaque)
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
-void qmp_backup_cancel(Error **errp)
-{
- block_on_coroutine_fn(pvebackup_co_cancel, NULL);
-}
-
// assumes the caller holds backup_mutex
static int coroutine_fn pvebackup_co_add_config(
const char *file,
@@ -529,50 +524,27 @@ static void create_backup_jobs_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
-typedef struct QmpBackupTask {
- const char *backup_file;
- bool has_password;
- const char *password;
- bool has_keyfile;
- const char *keyfile;
- bool has_key_password;
- const char *key_password;
- bool has_backup_id;
- const char *backup_id;
- bool has_backup_time;
- const char *fingerprint;
- bool has_fingerprint;
- int64_t backup_time;
- bool has_use_dirty_bitmap;
- bool use_dirty_bitmap;
- bool has_format;
- BackupFormat format;
- bool has_config_file;
- const char *config_file;
- bool has_firewall_file;
- const char *firewall_file;
- bool has_devlist;
- const char *devlist;
- bool has_compress;
- bool compress;
- bool has_encrypt;
- bool encrypt;
- bool has_speed;
- int64_t speed;
- Error **errp;
- UuidInfo *result;
-} QmpBackupTask;
-
-static void coroutine_fn pvebackup_co_prepare(void *opaque)
+UuidInfo coroutine_fn *qmp_backup(
+ const char *backup_file,
+ bool has_password, const char *password,
+ bool has_keyfile, const char *keyfile,
+ bool has_key_password, const char *key_password,
+ bool has_fingerprint, const char *fingerprint,
+ bool has_backup_id, const char *backup_id,
+ bool has_backup_time, int64_t backup_time,
+ bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+ bool has_compress, bool compress,
+ bool has_encrypt, bool encrypt,
+ bool has_format, BackupFormat format,
+ bool has_config_file, const char *config_file,
+ bool has_firewall_file, const char *firewall_file,
+ bool has_devlist, const char *devlist,
+ bool has_speed, int64_t speed, Error **errp)
{
assert(qemu_in_coroutine());
qemu_co_mutex_lock(&backup_state.backup_mutex);
- QmpBackupTask *task = opaque;
-
- task->result = NULL; // just to be sure
-
BlockBackend *blk;
BlockDriverState *bs = NULL;
const char *backup_dir = NULL;
@@ -589,17 +561,17 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *firewall_name = "qemu-server.fw";
if (backup_state.di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return;
+ return NULL;
}
/* Todo: try to auto-detect format based on file name */
- BackupFormat format = task->has_format ? task->format : BACKUP_FORMAT_VMA;
+ format = has_format ? format : BACKUP_FORMAT_VMA;
- if (task->has_devlist) {
- devs = g_strsplit_set(task->devlist, ",;:", -1);
+ if (has_devlist) {
+ devs = g_strsplit_set(devlist, ",;:", -1);
gchar **d = devs;
while (d && *d) {
@@ -607,14 +579,14 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (blk) {
bs = blk_bs(blk);
if (!bdrv_is_inserted(bs)) {
- error_setg(task->errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
+ error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
goto err;
}
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
di_list = g_list_append(di_list, di);
} else {
- error_set(task->errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
"Device '%s' not found", *d);
goto err;
}
@@ -637,7 +609,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
if (!di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
goto err;
}
@@ -647,13 +619,13 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (bdrv_op_is_blocked(di->bs, BLOCK_OP_TYPE_BACKUP_SOURCE, task->errp)) {
+ if (bdrv_op_is_blocked(di->bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
goto err;
}
ssize_t size = bdrv_getlength(di->bs);
if (size < 0) {
- error_setg_errno(task->errp, -di->size, "bdrv_getlength failed");
+ error_setg_errno(errp, -di->size, "bdrv_getlength failed");
goto err;
}
di->size = size;
@@ -680,47 +652,44 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
if (format == BACKUP_FORMAT_PBS) {
- if (!task->has_password) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
+ if (!has_password) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
goto err_mutex;
}
- if (!task->has_backup_id) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
+ if (!has_backup_id) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
goto err_mutex;
}
- if (!task->has_backup_time) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
+ if (!has_backup_time) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
goto err_mutex;
}
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
firewall_name = "fw.conf";
- bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
-
-
char *pbs_err = NULL;
pbs = proxmox_backup_new(
- task->backup_file,
- task->backup_id,
- task->backup_time,
+ backup_file,
+ backup_id,
+ backup_time,
dump_cb_block_size,
- task->has_password ? task->password : NULL,
- task->has_keyfile ? task->keyfile : NULL,
- task->has_key_password ? task->key_password : NULL,
- task->has_compress ? task->compress : true,
- task->has_encrypt ? task->encrypt : task->has_keyfile,
- task->has_fingerprint ? task->fingerprint : NULL,
+ has_password ? password : NULL,
+ has_keyfile ? keyfile : NULL,
+ has_key_password ? key_password : NULL,
+ has_compress ? compress : true,
+ has_encrypt ? encrypt : has_keyfile,
+ has_fingerprint ? fingerprint : NULL,
&pbs_err);
if (!pbs) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"proxmox_backup_new failed: %s", pbs_err);
proxmox_backup_free_error(pbs_err);
goto err_mutex;
}
- int connect_result = proxmox_backup_co_connect(pbs, task->errp);
+ int connect_result = proxmox_backup_co_connect(pbs, errp);
if (connect_result < 0)
goto err_mutex;
@@ -739,9 +708,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
bool expect_only_dirty = false;
- if (use_dirty_bitmap) {
+ if (has_use_dirty_bitmap && use_dirty_bitmap) {
if (bitmap == NULL) {
- bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
+ bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, errp);
if (!bitmap) {
goto err_mutex;
}
@@ -771,12 +740,12 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
}
- int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
+ int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, errp);
if (dev_id < 0) {
goto err_mutex;
}
- if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, errp))) {
goto err_mutex;
}
@@ -790,10 +759,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
}
} else if (format == BACKUP_FORMAT_VMA) {
- vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
+ vmaw = vma_writer_create(backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
}
goto err_mutex;
}
@@ -804,25 +773,25 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, errp))) {
goto err_mutex;
}
const char *devname = bdrv_get_device_name(di->bs);
di->dev_id = vma_writer_register_stream(vmaw, devname, di->size);
if (di->dev_id <= 0) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"register_stream failed");
goto err_mutex;
}
}
} else if (format == BACKUP_FORMAT_DIR) {
- if (mkdir(task->backup_file, 0640) != 0) {
- error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
- task->backup_file);
+ if (mkdir(backup_file, 0640) != 0) {
+ error_setg_errno(errp, errno, "can't create directory '%s'\n",
+ backup_file);
goto err_mutex;
}
- backup_dir = task->backup_file;
+ backup_dir = backup_file;
l = di_list;
while (l) {
@@ -836,34 +805,34 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
bdrv_img_create(di->targetfile, "raw", NULL, NULL, NULL,
di->size, flags, false, &local_err);
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err_mutex;
}
di->target = bdrv_open(di->targetfile, NULL, NULL, flags, &local_err);
if (!di->target) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err_mutex;
}
}
} else {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
goto err_mutex;
}
/* add configuration file to archive */
- if (task->has_config_file) {
- if (pvebackup_co_add_config(task->config_file, config_name, format, backup_dir,
- vmaw, pbs, task->errp) != 0) {
+ if (has_config_file) {
+ if (pvebackup_co_add_config(config_file, config_name, format, backup_dir,
+ vmaw, pbs, errp) != 0) {
goto err_mutex;
}
}
/* add firewall file to archive */
- if (task->has_firewall_file) {
- if (pvebackup_co_add_config(task->firewall_file, firewall_name, format, backup_dir,
- vmaw, pbs, task->errp) != 0) {
+ if (has_firewall_file) {
+ if (pvebackup_co_add_config(firewall_file, firewall_name, format, backup_dir,
+ vmaw, pbs, errp) != 0) {
goto err_mutex;
}
}
@@ -881,7 +850,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (backup_state.stat.backup_file) {
g_free(backup_state.stat.backup_file);
}
- backup_state.stat.backup_file = g_strdup(task->backup_file);
+ backup_state.stat.backup_file = g_strdup(backup_file);
uuid_copy(backup_state.stat.uuid, uuid);
uuid_unparse_lower(uuid, backup_state.stat.uuid_str);
@@ -896,7 +865,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
qemu_mutex_unlock(&backup_state.stat.lock);
- backup_state.speed = (task->has_speed && task->speed > 0) ? task->speed : 0;
+ backup_state.speed = (has_speed && speed > 0) ? speed : 0;
backup_state.vmaw = vmaw;
backup_state.pbs = pbs;
@@ -906,8 +875,6 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_info = g_malloc0(sizeof(*uuid_info));
uuid_info->UUID = uuid_str;
- task->result = uuid_info;
-
/* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
* backup_mutex locked. This is fine, a CoMutex can be held across yield
* points, and we'll release it as soon as the BH reschedules us.
@@ -921,7 +888,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
qemu_coroutine_yield();
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err;
}
@@ -934,7 +901,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
/* start the first job in the transaction */
job_txn_start_seq(backup_state.txn);
- return;
+ return uuid_info;
err_mutex:
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -965,7 +932,7 @@ err:
if (vmaw) {
Error *err = NULL;
vma_writer_close(vmaw, &err);
- unlink(task->backup_file);
+ unlink(backup_file);
}
if (pbs) {
@@ -976,65 +943,8 @@ err:
rmdir(backup_dir);
}
- task->result = NULL;
-
qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return;
-}
-
-UuidInfo *qmp_backup(
- const char *backup_file,
- bool has_password, const char *password,
- bool has_keyfile, const char *keyfile,
- bool has_key_password, const char *key_password,
- bool has_fingerprint, const char *fingerprint,
- bool has_backup_id, const char *backup_id,
- bool has_backup_time, int64_t backup_time,
- bool has_use_dirty_bitmap, bool use_dirty_bitmap,
- bool has_compress, bool compress,
- bool has_encrypt, bool encrypt,
- bool has_format, BackupFormat format,
- bool has_config_file, const char *config_file,
- bool has_firewall_file, const char *firewall_file,
- bool has_devlist, const char *devlist,
- bool has_speed, int64_t speed, Error **errp)
-{
- QmpBackupTask task = {
- .backup_file = backup_file,
- .has_password = has_password,
- .password = password,
- .has_keyfile = has_keyfile,
- .keyfile = keyfile,
- .has_key_password = has_key_password,
- .key_password = key_password,
- .has_fingerprint = has_fingerprint,
- .fingerprint = fingerprint,
- .has_backup_id = has_backup_id,
- .backup_id = backup_id,
- .has_backup_time = has_backup_time,
- .backup_time = backup_time,
- .has_use_dirty_bitmap = has_use_dirty_bitmap,
- .use_dirty_bitmap = use_dirty_bitmap,
- .has_compress = has_compress,
- .compress = compress,
- .has_encrypt = has_encrypt,
- .encrypt = encrypt,
- .has_format = has_format,
- .format = format,
- .has_config_file = has_config_file,
- .config_file = config_file,
- .has_firewall_file = has_firewall_file,
- .firewall_file = firewall_file,
- .has_devlist = has_devlist,
- .devlist = devlist,
- .has_speed = has_speed,
- .speed = speed,
- .errp = errp,
- };
-
- block_on_coroutine_fn(pvebackup_co_prepare, &task);
-
- return task.result;
+ return NULL;
}
BackupStatus *qmp_query_backup(Error **errp)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 24f30260c8..4e8c35a3a2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -842,7 +842,7 @@
'*config-file': 'str',
'*firewall-file': 'str',
'*devlist': 'str', '*speed': 'int' },
- 'returns': 'UuidInfo' }
+ 'returns': 'UuidInfo', 'coroutine': true }
##
# @query-backup:
@@ -864,7 +864,7 @@
# Notes: This command succeeds even if there is no backup process running.
#
##
-{ 'command': 'backup-cancel' }
+{ 'command': 'backup-cancel', 'coroutine': true }
##
# @ProxmoxSupportStatus:

@ -0,0 +1,84 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 27 Mar 2024 11:15:39 +0100
Subject: [PATCH] alloc-track: avoid seemingly superfluous child permission
update
Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix
deadlock during drop" where the dropping is not rescheduled and delayed
anymore or some upstream change). Should there really be some issue,
instead of having a drop state, this could also be just based off the
fact whether there is still a backing child.
Dumping the cumulative (shared) permissions for the BDS with a debug
print yields the same values after this patch and with QEMU 8.1,
namely 3 and 5.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 26 --------------------------
1 file changed, 26 deletions(-)
diff --git a/block/alloc-track.c b/block/alloc-track.c
index f3ed2935c4..29138dcc49 100644
--- a/block/alloc-track.c
+++ b/block/alloc-track.c
@@ -25,15 +25,9 @@
#define TRACK_OPT_AUTO_REMOVE "auto-remove"
-typedef enum DropState {
- DropNone,
- DropInProgress,
-} DropState;
-
typedef struct {
BdrvDirtyBitmap *bitmap;
uint64_t granularity;
- DropState drop_state;
} BDRVAllocTrackState;
static QemuOptsList runtime_opts = {
@@ -137,8 +131,6 @@ static int track_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
- s->drop_state = DropNone;
-
fail:
if (ret < 0) {
bdrv_graph_wrlock();
@@ -289,18 +281,8 @@ track_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
BlockReopenQueue *reopen_queue, uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
- BDRVAllocTrackState *s = bs->opaque;
-
*nshared = BLK_PERM_ALL;
- /* in case we're currently dropping ourselves, claim to not use any
- * permissions at all - which is fine, since from this point on we will
- * never issue a read or write anymore */
- if (s->drop_state == DropInProgress) {
- *nperm = 0;
- return;
- }
-
if (role & BDRV_CHILD_DATA) {
*nperm = perm & DEFAULT_PERM_PASSTHROUGH;
} else {
@@ -326,14 +308,6 @@ track_co_change_backing_file(BlockDriverState *bs, const char *backing_file,
* kinda fits better, but in the long-term, a special parameter would be
* nice (or done via qemu-server via upcoming blockdev-replace QMP command).
*/
- if (backing_file == NULL) {
- BDRVAllocTrackState *s = bs->opaque;
- bdrv_drained_begin(bs);
- s->drop_state = DropInProgress;
- bdrv_child_refresh_perms(bs, bs->file, &error_abort);
- bdrv_drained_end(bs);
- }
-
return 0;
}

@ -1,98 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 10 Feb 2021 11:07:06 +0100
Subject: [PATCH] PBS: add master key support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
this requires a new enough libproxmox-backup-qemu0, and allows querying
from the PVE side to avoid QMP calls with unsupported parameters.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
pve-backup.c | 3 +++
qapi/block-core.json | 7 +++++++
3 files changed, 11 insertions(+)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 71ed202491..c7468e5d3b 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1039,6 +1039,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS password
false, NULL, // PBS keyfile
false, NULL, // PBS key_password
+ false, NULL, // PBS master_keyfile
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
diff --git a/pve-backup.c b/pve-backup.c
index 109498eaf9..4b5134ed27 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -529,6 +529,7 @@ UuidInfo coroutine_fn *qmp_backup(
bool has_password, const char *password,
bool has_keyfile, const char *keyfile,
bool has_key_password, const char *key_password,
+ bool has_master_keyfile, const char *master_keyfile,
bool has_fingerprint, const char *fingerprint,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
@@ -677,6 +678,7 @@ UuidInfo coroutine_fn *qmp_backup(
has_password ? password : NULL,
has_keyfile ? keyfile : NULL,
has_key_password ? key_password : NULL,
+ has_master_keyfile ? master_keyfile : NULL,
has_compress ? compress : true,
has_encrypt ? encrypt : has_keyfile,
has_fingerprint ? fingerprint : NULL,
@@ -1040,5 +1042,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->pbs_dirty_bitmap_savevm = true;
ret->pbs_dirty_bitmap_migration = true;
ret->query_bitmap_info = true;
+ ret->pbs_masterkey = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4e8c35a3a2..d8c7331090 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -813,6 +813,8 @@
#
# @key-password: password for keyfile (optional for format 'pbs')
#
+# @master-keyfile: PEM-formatted master public keyfile (optional for format 'pbs')
+#
# @fingerprint: server cert fingerprint (optional for format 'pbs')
#
# @backup-id: backup ID (required for format 'pbs')
@@ -832,6 +834,7 @@
'*password': 'str',
'*keyfile': 'str',
'*key-password': 'str',
+ '*master-keyfile': 'str',
'*fingerprint': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
@@ -884,6 +887,9 @@
# migration cap if this is false/unset may lead
# to crashes on migration!
#
+# @pbs-masterkey: True if the QMP backup call supports the 'master_keyfile'
+# parameter.
+#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
#
##
@@ -892,6 +898,7 @@
'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
'pbs-dirty-bitmap-migration': 'bool',
+ 'pbs-masterkey': 'bool',
'pbs-library-version': 'str' } }
##

@ -0,0 +1,55 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Date: Thu, 11 Apr 2024 11:29:22 +0200
Subject: [PATCH] block/copy-before-write: fix permission
When the source node does not have any parents, the condition still
works as required: the backup job creates the parent via
block_job_create -> block_job_add_bdrv -> bdrv_root_attach_child.
Still, in this case checking the @perm variable doesn't work, as the
backup job creates the root blk with empty permissions (it relies on the
CBW filter to require the correct permissions and doesn't want to create
extra conflicts).
So, we should not check @perm.
The hack may be dropped entirely once transactional insertion of the
filter is merged (where we don't try to recalculate permissions in an
intermediate state in which the filter conflicts with the original parent
of the source node). See the old big series
"[PATCH v5 00/45] Transactional block-graph modifying API"[1]; its
current in-flight part is "[PATCH v8 0/7] blockdev-replace"[2].
[1] https://patchew.org/QEMU/20220330212902.590099-1-vsementsov@openvz.org/
[2] https://patchew.org/QEMU/20231017184444.932733-1-vsementsov@yandex-team.ru/
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/copy-before-write.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 026fa9840f..5a9456d426 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -364,9 +364,13 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
perm, shared, nperm, nshared);
if (!QLIST_EMPTY(&bs->parents)) {
- if (perm & BLK_PERM_WRITE) {
- *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
- }
+ /*
+ * Note, that source child may be shared with backup job. Backup job
+ * does create own blk parent on copy-before-write node, so this
+ * works even if source node does not have any parents before backup
+ * start
+ */
+ *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
}
}

@ -1,53 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 9 Dec 2020 11:46:57 +0100
Subject: [PATCH] PVE: block/pbs: fast-path reads without allocation if
possible
...and switch over to g_malloc/g_free while at it to align with other
QEMU code.
Tracing shows the fast-path is taken almost all the time, though not
100% so the slow one is still necessary.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/pbs.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/block/pbs.c b/block/pbs.c
index 9d1f1f39d4..ce9a870885 100644
--- a/block/pbs.c
+++ b/block/pbs.c
@@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
BDRVPBSState *s = bs->opaque;
int ret;
char *pbs_error = NULL;
- uint8_t *buf = malloc(bytes);
+ uint8_t *buf;
+ bool inline_buf = true;
+
+ /* for single-buffer IO vectors we can fast-path the write directly to it */
+ if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
+ buf = qiov->iov->iov_base;
+ } else {
+ inline_buf = false;
+ buf = g_malloc(bytes);
+ }
if (offset < 0 || bytes < 0) {
fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
@@ -223,8 +232,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
return -EIO;
}
- qemu_iovec_from_buf(qiov, 0, buf, bytes);
- free(buf);
+ if (!inline_buf) {
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ g_free(buf);
+ }
return 0;
}
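
The fast-path condition in the hunk above can be looked at in isolation. The following is a minimal sketch, not the driver code: `can_read_inline` is a hypothetical helper name, and it uses the POSIX `struct iovec` directly instead of QEMU's `QEMUIOVector`. It only shows the decision of whether a read may land straight in the caller's single buffer or needs a temporary bounce buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: returns true when a read of `bytes` can be written
 * directly into the caller's only buffer, so no bounce buffer is needed.
 * With more than one buffer, or a buffer that is too small, the caller has
 * to read into a temporary buffer and scatter the data afterwards.
 */
static bool can_read_inline(const struct iovec *iov, int niov, size_t bytes)
{
    assert(niov >= 0);
    return niov == 1 && iov[0].iov_len >= bytes;
}
```

In the patch, the slow path corresponds to the `g_malloc()` plus `qemu_iovec_from_buf()` branch, which the commit message says is still taken occasionally.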

@ -0,0 +1,48 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Date: Thu, 11 Apr 2024 11:29:23 +0200
Subject: [PATCH] block/copy-before-write: support unligned snapshot-discard
The first thing that crashes on unaligned access here is
bdrv_reset_dirty_bitmap(). The correct way is to align down the
snapshot-discard request.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/copy-before-write.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 5a9456d426..c0e70669a2 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -325,14 +325,24 @@ static int coroutine_fn GRAPH_RDLOCK
cbw_co_pdiscard_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes)
{
BDRVCopyBeforeWriteState *s = bs->opaque;
+ uint32_t cluster_size = block_copy_cluster_size(s->bcs);
+ int64_t aligned_offset = QEMU_ALIGN_UP(offset, cluster_size);
+ int64_t aligned_end = QEMU_ALIGN_DOWN(offset + bytes, cluster_size);
+ int64_t aligned_bytes;
+
+ if (aligned_end <= aligned_offset) {
+ return 0;
+ }
+ aligned_bytes = aligned_end - aligned_offset;
WITH_QEMU_LOCK_GUARD(&s->lock) {
- bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+ bdrv_reset_dirty_bitmap(s->access_bitmap, aligned_offset,
+ aligned_bytes);
}
- block_copy_reset(s->bcs, offset, bytes);
+ block_copy_reset(s->bcs, aligned_offset, aligned_bytes);
- return bdrv_co_pdiscard(s->target, offset, bytes);
+ return bdrv_co_pdiscard(s->target, aligned_offset, aligned_bytes);
}
static void GRAPH_RDLOCK cbw_refresh_filename(BlockDriverState *bs)
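
The alignment arithmetic in this hunk can be tried outside of QEMU. The sketch below is an illustration under assumptions: `align_down`, `align_up` and `shrink_to_clusters` are hypothetical names standing in for `QEMU_ALIGN_DOWN`, `QEMU_ALIGN_UP` and the body of `cbw_co_pdiscard_snapshot()`. It shrinks a byte range to the largest cluster-aligned range it fully contains and reports an empty result when no whole cluster is covered.

```c
#include <assert.h>
#include <stdint.h>

/* Round n down or up to a multiple of m (n >= 0, m > 0). */
static int64_t align_down(int64_t n, int64_t m) { return n / m * m; }
static int64_t align_up(int64_t n, int64_t m) { return align_down(n + m - 1, m); }

/*
 * Hypothetical helper: shrink [offset, offset + bytes) to whole clusters.
 * Returns the aligned length and stores the aligned start in
 * *aligned_offset. Returns 0 (leaving *aligned_offset untouched) when the
 * request does not cover a single complete cluster.
 */
static int64_t shrink_to_clusters(int64_t offset, int64_t bytes,
                                  int64_t cluster_size,
                                  int64_t *aligned_offset)
{
    assert(offset >= 0 && bytes >= 0 && cluster_size > 0);
    int64_t start = align_up(offset, cluster_size);
    int64_t end = align_down(offset + bytes, cluster_size);

    if (end <= start) {
        return 0;
    }
    *aligned_offset = start;
    return end - start;
}
```

Rounding the start up and the end down means partially covered clusters are never discarded, which matches the early `return 0` in the patch for requests smaller than one cluster.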

@ -0,0 +1,373 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Date: Thu, 11 Apr 2024 11:29:24 +0200
Subject: [PATCH] block/copy-before-write: create block_copy bitmap in filter
node
Currently block_copy creates copy_bitmap in the source node. But that
interacts badly with .independent_close=true of the copy-before-write
filter: the source node may be detached and removed before the
.bdrv_close() handler is called, which should call
block_copy_state_free(), which in turn should remove copy_bitmap.
That's all not ideal: it would be better if the internal bitmap of the
block-copy object were not attached to any node. But that is not possible
now.
The simplest solution is to just create copy_bitmap in the filter node,
where the two other bitmaps are created anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/block-copy.c | 3 +-
block/copy-before-write.c | 2 +-
include/block/block-copy.h | 1 +
tests/qemu-iotests/257.out | 112 ++++++++++++++++++-------------------
4 files changed, 60 insertions(+), 58 deletions(-)
diff --git a/block/block-copy.c b/block/block-copy.c
index 9ee3dd7ef5..8fca2c3698 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -351,6 +351,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
}
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
Error **errp)
{
@@ -367,7 +368,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
return NULL;
}
- copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
+ copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL,
errp);
if (!copy_bitmap) {
return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index c0e70669a2..94db31512d 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -468,7 +468,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
bs->file->bs->supported_zero_flags);
- s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+ s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
if (!s->bcs) {
error_prepend(errp, "Cannot create block-copy-state: ");
return -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 0700953ab8..8b41643bfa 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
typedef struct BlockCopyCallState BlockCopyCallState;
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
Error **errp);
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index aa76131ca9..c33dd7f3a9 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -120,16 +120,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -596,16 +596,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -865,16 +865,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -1341,16 +1341,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -1610,16 +1610,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -2086,16 +2086,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -2355,16 +2355,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -2831,16 +2831,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -3100,16 +3100,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -3576,16 +3576,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -3845,16 +3845,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -4321,16 +4321,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -4590,16 +4590,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,
@@ -5066,16 +5066,16 @@ write -P0x67 0x3fe0000 0x20000
"granularity": 65536,
"persistent": false,
"recording": false
- }
- ],
- "drive0": [
+ },
{
"busy": false,
"count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
- },
+ }
+ ],
+ "drive0": [
{
"busy": false,
"count": 458752,

@ -1,33 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 2 Mar 2021 16:11:54 +0100
Subject: [PATCH] block/io: accept NULL qiov in bdrv_pad_request
Some operations, e.g. block-stream, perform reads while discarding the
results (only copy-on-read matters). In this case they will pass NULL as
the target QEMUIOVector, which will however trip bdrv_pad_request, since
it wants to extend its passed vector.
Simply check for NULL and do nothing, there's no reason to pad the
target if it will be discarded anyway.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/io.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/io.c b/block/io.c
index b9424024f9..01f50d28c8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1730,6 +1730,10 @@ static int bdrv_pad_request(BlockDriverState *bs,
{
int ret;
+ if (!qiov) {
+ return 0;
+ }
+
bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, &error_abort);
if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {

@ -0,0 +1,277 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Date: Thu, 11 Apr 2024 11:29:25 +0200
Subject: [PATCH] qapi: blockdev-backup: add discard-source parameter
Add a parameter that enables discard-after-copy. That is mostly useful
in the "push backup with fleecing" scheme, where the source is a
snapshot-access format driver node, based on the copy-before-write
filter's snapshot-access API:
[guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   |            |
   | root       | file
   v            v
[copy-before-write]
   |             |
   | file        | target
   v             v
[active disk]   [temp.img]
In this case discard-after-copy does two things:
- discard data in temp.img to save disk space
- avoid further copy-before-write operation in discarded area
Note that we have to declare WRITE permission on the source in the
copy-before-write filter for discard to work. Still, we can't take it
unconditionally, as that would break normal backup from an RO source. So,
we have to add a parameter and pass it through bdrv_open flags.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/backup.c | 5 +++--
block/block-copy.c | 9 +++++++++
block/copy-before-write.c | 15 +++++++++++++--
block/copy-before-write.h | 1 +
block/replication.c | 4 ++--
blockdev.c | 2 +-
include/block/block-common.h | 2 ++
include/block/block-copy.h | 1 +
include/block/block_int-global-state.h | 2 +-
qapi/block-core.json | 4 ++++
10 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index 16d611c4ca..1963e47ab9 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -332,7 +332,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, int64_t speed,
MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
BitmapSyncMode bitmap_mode,
- bool compress,
+ bool compress, bool discard_source,
const char *filter_node_name,
BackupPerf *perf,
BlockdevOnError on_source_error,
@@ -433,7 +433,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
goto error;
}
- cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
+ cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
+ &bcs, errp);
if (!cbw) {
goto error;
}
diff --git a/block/block-copy.c b/block/block-copy.c
index 8fca2c3698..7e3b378528 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -137,6 +137,7 @@ typedef struct BlockCopyState {
CoMutex lock;
int64_t in_flight_bytes;
BlockCopyMethod method;
+ bool discard_source;
BlockReqList reqs;
QLIST_HEAD(, BlockCopyCallState) calls;
/*
@@ -353,6 +354,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
+ bool discard_source,
Error **errp)
{
ERRP_GUARD();
@@ -418,6 +420,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
cluster_size),
};
+ s->discard_source = discard_source;
block_copy_set_copy_opts(s, false, false);
ratelimit_init(&s->rate_limit);
@@ -589,6 +592,12 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
co_put_to_shres(s->mem, t->req.bytes);
block_copy_task_end(t, ret);
+ if (s->discard_source && ret == 0) {
+ int64_t nbytes =
+ MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset;
+ bdrv_co_pdiscard(s->source, t->req.offset, nbytes);
+ }
+
return ret;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 94db31512d..853e01a1eb 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState {
BdrvChild *target;
OnCbwError on_cbw_error;
uint64_t cbw_timeout_ns;
+ bool discard_source;
/*
* @lock: protects access to @access_bitmap, @done_bitmap and
@@ -357,6 +358,8 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
+ BDRVCopyBeforeWriteState *s = bs->opaque;
+
if (!(role & BDRV_CHILD_FILTERED)) {
/*
* Target child
@@ -381,6 +384,10 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
* start
*/
*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+ if (s->discard_source) {
+ *nperm = *nperm | BLK_PERM_WRITE;
+ }
+
*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
}
}
@@ -468,7 +475,9 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
bs->file->bs->supported_zero_flags);
- s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
+ s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;
+ s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
+ flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
if (!s->bcs) {
error_prepend(errp, "Cannot create block-copy-state: ");
return -EINVAL;
@@ -535,12 +544,14 @@ static BlockDriver bdrv_cbw_filter = {
BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
+ bool discard_source,
BlockCopyState **bcs,
Error **errp)
{
BDRVCopyBeforeWriteState *state;
BlockDriverState *top;
QDict *opts;
+ int flags = BDRV_O_RDWR | (discard_source ? BDRV_O_CBW_DISCARD_SOURCE : 0);
assert(source->total_sectors == target->total_sectors);
GLOBAL_STATE_CODE();
@@ -553,7 +564,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
qdict_put_str(opts, "file", bdrv_get_node_name(source));
qdict_put_str(opts, "target", bdrv_get_node_name(target));
- top = bdrv_insert_node(source, opts, BDRV_O_RDWR, errp);
+ top = bdrv_insert_node(source, opts, flags, errp);
if (!top) {
return NULL;
}
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 6e72bb25e9..01af0cd3c4 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -39,6 +39,7 @@
BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
+ bool discard_source,
BlockCopyState **bcs,
Error **errp);
void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/block/replication.c b/block/replication.c
index ca6bd0a720..0415a5e8b7 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -582,8 +582,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
s->backup_job = backup_job_create(
NULL, s->secondary_disk->bs, s->hidden_disk->bs,
- 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
- &perf,
+ 0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, false,
+ NULL, &perf,
BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index 5e5dbc1da9..1054a69279 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2727,7 +2727,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
backup->sync, bmap, backup->bitmap_mode,
- backup->compress,
+ backup->compress, backup->discard_source,
backup->filter_node_name,
&perf,
backup->on_source_error,
diff --git a/include/block/block-common.h b/include/block/block-common.h
index a846023a09..338fe5ff7a 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -243,6 +243,8 @@ typedef enum {
read-write fails */
#define BDRV_O_IO_URING 0x40000 /* use io_uring instead of the thread pool */
+#define BDRV_O_CBW_DISCARD_SOURCE 0x80000 /* for copy-before-write filter */
+
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 8b41643bfa..bdc703bacd 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -27,6 +27,7 @@ typedef struct BlockCopyCallState BlockCopyCallState;
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
+ bool discard_source,
Error **errp);
/* Function should be called prior any actual copy request */
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index cc1387ae02..f0c642b194 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -195,7 +195,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
MirrorSyncMode sync_mode,
BdrvDirtyBitmap *sync_bitmap,
BitmapSyncMode bitmap_mode,
- bool compress,
+ bool compress, bool discard_source,
const char *filter_node_name,
BackupPerf *perf,
BlockdevOnError on_source_error,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index f516d8e95a..d796d49abb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1849,6 +1849,9 @@
# node specified by @drive. If this option is not given, a node
# name is autogenerated. (Since: 4.2)
#
+# @discard-source: Discard blocks on source which are already copied
+# to the target. (Since 9.0)
+#
# @x-perf: Performance options. (Since 6.0)
#
# Features:
@@ -1870,6 +1873,7 @@
'*on-target-error': 'BlockdevOnError',
'*auto-finalize': 'bool', '*auto-dismiss': 'bool',
'*filter-node-name': 'str',
+ '*discard-source': 'bool',
'*x-perf': { 'type': 'BackupPerf',
'features': [ 'unstable' ] } } }

@ -0,0 +1,133 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 11 Apr 2024 11:29:26 +0200
Subject: [PATCH] copy-before-write: allow specifying minimum cluster size
Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.
Copy-before-write operations will use at least this granularity and in
particular, discard requests to the source node will too. If the
granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.
The QAPI uses uint32 so the value will be non-negative, but still fit
into a uint64_t.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/block-copy.c | 17 +++++++++++++----
block/copy-before-write.c | 3 ++-
include/block/block-copy.h | 1 +
qapi/block-core.json | 8 +++++++-
4 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/block-copy.c b/block/block-copy.c
index 7e3b378528..adb1cbb440 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
}
static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
+ int64_t min_cluster_size,
Error **errp)
{
int ret;
@@ -335,7 +336,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
"used. If the actual block size of the target exceeds "
"this default, the backup may be unusable",
BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
- return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+ return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
} else if (ret < 0 && !target_does_cow) {
error_setg_errno(errp, -ret,
"Couldn't determine the cluster size of the target image, "
@@ -345,16 +346,18 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
return ret;
} else if (ret < 0 && target_does_cow) {
/* Not fatal; just trudge on ahead. */
- return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+ return MAX(min_cluster_size, BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
}
- return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
+ return MAX(min_cluster_size,
+ MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size));
}
BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
bool discard_source,
+ int64_t min_cluster_size,
Error **errp)
{
ERRP_GUARD();
@@ -365,7 +368,13 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
GLOBAL_STATE_CODE();
- cluster_size = block_copy_calculate_cluster_size(target->bs, errp);
+ if (min_cluster_size && !is_power_of_2(min_cluster_size)) {
+ error_setg(errp, "min-cluster-size needs to be a power of 2");
+ return NULL;
+ }
+
+ cluster_size = block_copy_calculate_cluster_size(target->bs,
+ min_cluster_size, errp);
if (cluster_size < 0) {
return NULL;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 853e01a1eb..47b3cdd09f 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -477,7 +477,8 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
s->discard_source = flags & BDRV_O_CBW_DISCARD_SOURCE;
s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap,
- flags & BDRV_O_CBW_DISCARD_SOURCE, errp);
+ flags & BDRV_O_CBW_DISCARD_SOURCE,
+ opts->min_cluster_size, errp);
if (!s->bcs) {
error_prepend(errp, "Cannot create block-copy-state: ");
return -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index bdc703bacd..77857c6c68 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
BlockDriverState *copy_bitmap_bs,
const BdrvDirtyBitmap *bitmap,
bool discard_source,
+ int64_t min_cluster_size,
Error **errp);
/* Function should be called prior any actual copy request */
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d796d49abb..edbf6e78b9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4930,12 +4930,18 @@
# @on-cbw-error parameter will decide how this failure is handled.
# Default 0. (Since 7.1)
#
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+# operations. Has to be a power of 2. No effect if smaller than
+# the maximum of the target's cluster size and 64 KiB. Default 0.
+# (Since 8.1)
+#
# Since: 6.2
##
{ 'struct': 'BlockdevOptionsCbw',
'base': 'BlockdevOptionsGenericFormat',
'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap',
- '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32' } }
+ '*on-cbw-error': 'OnCbwError', '*cbw-timeout': 'uint32',
+ '*min-cluster-size': 'uint32' } }
##
# @BlockdevOptions:
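
The cluster size selection described in the commit message above can be sketched as follows. This is a simplified, standalone model, not the QEMU code: the helper names `effective_cluster_size` and `aligned_discard_len` are hypothetical, an unknown target cluster size is modelled as a value <= 0, and the inward alignment of discard requests is an assumption based on the commit message's statement that too-small requests are "aligned down" and effectively ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stands in for BLOCK_COPY_CLUSTER_SIZE_DEFAULT (64 KiB per the QAPI docs). */
#define CLUSTER_SIZE_DEFAULT (64 * 1024)

static int64_t max_i64(int64_t a, int64_t b)
{
    return a > b ? a : b;
}

static bool is_pow2(int64_t v)
{
    return v > 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper: returns the cluster size block-copy would use, or -1
 * if min_cluster_size is non-zero and not a power of 2.
 * target_cluster_size <= 0 models "could not be determined".
 */
static int64_t effective_cluster_size(int64_t target_cluster_size,
                                      int64_t min_cluster_size)
{
    if (min_cluster_size && !is_pow2(min_cluster_size)) {
        return -1;
    }
    if (target_cluster_size <= 0) {
        return max_i64(min_cluster_size, CLUSTER_SIZE_DEFAULT);
    }
    return max_i64(min_cluster_size,
                   max_i64(CLUSTER_SIZE_DEFAULT, target_cluster_size));
}

/*
 * Hypothetical helper: number of bytes left of a discard request after
 * shrinking [offset, offset + bytes) to multiples of granularity.
 * A result of 0 means the request is effectively ignored.
 */
static int64_t aligned_discard_len(int64_t offset, int64_t bytes,
                                   int64_t granularity)
{
    int64_t start = (offset + granularity - 1) / granularity * granularity;
    int64_t end = (offset + bytes) / granularity * granularity;
    return end > start ? end - start : 0;
}
```

Under this model, a 64 KiB discard against a 4 MiB granularity shrinks to nothing, which is why raising the copy granularity with `min-cluster-size` matters for discard-source.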

@ -1,35 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 26 May 2021 17:36:55 +0200
Subject: [PATCH] PVE: savevm-async: register yank before
migration_incoming_state_destroy
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
migration/savevm-async.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index bafe6ae5eb..da3634048f 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -20,6 +20,7 @@
#include "qemu/timer.h"
#include "qemu/main-loop.h"
#include "qemu/rcu.h"
+#include "qemu/yank.h"
/* #define DEBUG_SAVEVM_STATE */
@@ -514,6 +515,10 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
dirty_bitmap_mig_before_vm_start();
qemu_fclose(f);
+
+ /* state_destroy assumes a real migration which would have added a yank */
+ yank_register_instance(MIGRATION_YANK_INSTANCE, &error_abort);
+
migration_incoming_state_destroy();
if (ret < 0) {
error_setg_errno(errp, -ret, "Error while loading VM state");

@ -0,0 +1,106 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 11 Apr 2024 11:29:27 +0200
Subject: [PATCH] backup: add minimum cluster size to performance options
Useful to make discard-source work in the context of backup fleecing
when the fleecing image has a larger granularity than the backup
target.
Backup/block-copy will use at least this granularity for copy operations
and in particular, discard requests to the backup source will too. If
the granularity is too small, they will just be aligned down in
cbw_co_pdiscard_snapshot() and thus effectively ignored.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/backup.c | 2 +-
block/copy-before-write.c | 2 ++
block/copy-before-write.h | 1 +
blockdev.c | 3 +++
qapi/block-core.json | 9 +++++++--
5 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index 1963e47ab9..fe69723ada 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -434,7 +434,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
}
cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
- &bcs, errp);
+ perf->min_cluster_size, &bcs, errp);
if (!cbw) {
goto error;
}
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 47b3cdd09f..bba58326d7 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -546,6 +546,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+ int64_t min_cluster_size,
BlockCopyState **bcs,
Error **errp)
{
@@ -564,6 +565,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
}
qdict_put_str(opts, "file", bdrv_get_node_name(source));
qdict_put_str(opts, "target", bdrv_get_node_name(target));
+ qdict_put_int(opts, "min-cluster-size", min_cluster_size);
top = bdrv_insert_node(source, opts, flags, errp);
if (!top) {
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 01af0cd3c4..dc6cafe7fa 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockDriverState *target,
const char *filter_node_name,
bool discard_source,
+ int64_t min_cluster_size,
BlockCopyState **bcs,
Error **errp);
void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 1054a69279..cbe224387b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2654,6 +2654,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
if (backup->x_perf->has_max_chunk) {
perf.max_chunk = backup->x_perf->max_chunk;
}
+ if (backup->x_perf->has_min_cluster_size) {
+ perf.min_cluster_size = backup->x_perf->min_cluster_size;
+ }
}
if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) ||
diff --git a/qapi/block-core.json b/qapi/block-core.json
index edbf6e78b9..6e7ee87633 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1790,11 +1790,16 @@
# it should not be less than job cluster size which is calculated
# as maximum of target image cluster size and 64k. Default 0.
#
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+# and background copy operations. Has to be a power of 2. No
+# effect if smaller than the maximum of the target's cluster size
+# and 64 KiB. Default 0. (Since 8.1)
+#
# Since: 6.0
##
{ 'struct': 'BackupPerf',
- 'data': { '*use-copy-range': 'bool',
- '*max-workers': 'int', '*max-chunk': 'int64' } }
+ 'data': { '*use-copy-range': 'bool', '*max-workers': 'int',
+ '*max-chunk': 'int64', '*min-cluster-size': 'uint32' } }
##
# @BackupCommon:

@ -0,0 +1,345 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 11 Apr 2024 11:29:28 +0200
Subject: [PATCH] PVE backup: add fleecing option
When a fleecing option is given, it is expected that each device has
a corresponding "-fleecing" block device already attached, except for
EFI disk and TPM state, where fleecing is never used.
The following graph was adapted from [0] which also contains more
details about fleecing.
[guest]
|
| root
v file
[copy-before-write]<------[snapshot-access]
| |
| file | target
v v
[source] [fleecing]
For fleecing, a copy-before-write filter is inserted on top of the
source node, as well as a snapshot-access node pointing to the filter
node, which allows reading the consistent state of the image at the
time it was inserted. New guest writes are passed through the
copy-before-write filter which will first copy over old data to the
fleecing image in case that old data is still needed by the
snapshot-access node.
The backup process will sequentially read from the snapshot access,
which has a bitmap and knows whether to read from the original image
or the fleecing image to get the "snapshot" state, i.e. data from the
source image at the time when the copy-before-write filter was
inserted. After reading, the copied sections are discarded from the
fleecing image to reduce space usage.
All of this can be restricted by an initial dirty bitmap to parts of
the source image that are required for an incremental backup.
For discard to work, the fleecing image must not have a larger cluster
size than the backup job granularity. Querying that size does not
always work (e.g. for RBD with krbd, the cluster size is not
reported), so a minimum of 4 MiB is used. A job with a PBS target
already has at least this granularity, so the minimum is only relevant
for other targets, and edge cases where it is not enough should be
very rare in practice. If it ever becomes necessary, a passed-in value
for the backup QMP command can still be added as an override.
Additionally, the cbw-timeout and on-cbw-error=break-snapshot options
are set when installing the copy-before-write filter and
snapshot-access. When an error or timeout occurs, the problematic (and
each further) snapshot operation will fail and thus cancel the backup
instead of breaking the guest write.
Note that job_id cannot be inferred from the snapshot-access bs because
it has no parent, so just pass the one from the original bs.
[0]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg876056.html
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
pve-backup.c | 143 ++++++++++++++++++++++++++++++++-
qapi/block-core.json | 10 ++-
3 files changed, 150 insertions(+), 4 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 5000c084c5..70b3de4c7e 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1043,6 +1043,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
NULL, NULL,
devlist, qdict_haskey(qdict, "speed"), speed,
false, 0, // BackupPerf max-workers
+ false, false, // fleecing
&error);
hmp_handle_error(mon, error);
diff --git a/pve-backup.c b/pve-backup.c
index 9d480a8eec..7cc1dd3724 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -7,9 +7,11 @@
#include "sysemu/blockdev.h"
#include "block/block_int-global-state.h"
#include "block/blockjob.h"
+#include "block/copy-before-write.h"
#include "block/dirty-bitmap.h"
#include "block/graph-lock.h"
#include "qapi/qapi-commands-block.h"
+#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qerror.h"
#include "qemu/cutils.h"
@@ -81,8 +83,15 @@ static void pvebackup_init(void)
// initialize PVEBackupState at startup
opts_init(pvebackup_init);
+typedef struct PVEBackupFleecingInfo {
+ BlockDriverState *bs;
+ BlockDriverState *cbw;
+ BlockDriverState *snapshot_access;
+} PVEBackupFleecingInfo;
+
typedef struct PVEBackupDevInfo {
BlockDriverState *bs;
+ PVEBackupFleecingInfo fleecing;
size_t size;
uint64_t block_size;
uint8_t dev_id;
@@ -355,6 +364,25 @@ static void pvebackup_complete_cb(void *opaque, int ret)
PVEBackupDevInfo *di = opaque;
di->completed_ret = ret;
+ /*
+ * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
+ * won't be done as a coroutine anyways:
+ * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
+ * just spawn a BH calling bdrv_unref().
+ * - For cbw, draining would need to spawn a BH.
+ *
+ * Note that the AioContext lock is already acquired by our caller, i.e.
+ * job_finalize_single_locked()
+ */
+ if (di->fleecing.snapshot_access) {
+ bdrv_unref(di->fleecing.snapshot_access);
+ di->fleecing.snapshot_access = NULL;
+ }
+ if (di->fleecing.cbw) {
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+ }
+
/*
* Needs to happen outside of coroutine, because it takes the graph write lock.
*/
@@ -522,9 +550,82 @@ static void create_backup_jobs_bh(void *opaque) {
}
bdrv_drained_begin(di->bs);
+ BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+
+ BlockDriverState *source_bs = di->bs;
+ bool discard_source = false;
+ bdrv_graph_co_rdlock();
+ const char *job_id = bdrv_get_device_name(di->bs);
+ bdrv_graph_co_rdunlock();
+ if (di->fleecing.bs) {
+ QDict *cbw_opts = qdict_new();
+ qdict_put_str(cbw_opts, "driver", "copy-before-write");
+ qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+ if (di->bitmap) {
+ /*
+ * Only guest writes to parts relevant for the backup need to be intercepted with
+ * old data being copied to the fleecing image.
+ */
+ qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
+ }
+ /*
+ * Fleecing storage is supposed to be fast and it's better to break backup than guest
+ * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so
+ * abort a bit before that.
+ */
+ qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+ qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+ di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
+
+ if (!di->fleecing.cbw) {
+ error_setg(errp, "appending cbw node for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ break;
+ }
+
+ QDict *snapshot_access_opts = qdict_new();
+ qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+ qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
+
+ /*
+ * Holding the AioContext lock here would cause a deadlock, because bdrv_open_driver()
+ * will acquire it a second time. But it's allowed to be held exactly once when polling
+ * and that happens when the bdrv_refresh_total_sectors() call is made there.
+ */
+ di->fleecing.snapshot_access =
+ bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
+ if (!di->fleecing.snapshot_access) {
+ error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ break;
+ }
+ source_bs = di->fleecing.snapshot_access;
+ discard_source = true;
+
+ /*
+ * bdrv_get_info() just returns 0 (= doesn't matter) for RBD when using krbd. But discard
+ * on the fleecing image won't work if the backup job's granularity is less than the RBD
+ * object size (default 4 MiB), so it does matter. Always use at least 4 MiB. With a PBS
+ * target, the backup job granularity would already be at least this much.
+ */
+ perf.min_cluster_size = 4 * 1024 * 1024;
+ /*
+ * For discard to work, cluster size for the backup job must be at least the same as for
+ * the fleecing image.
+ */
+ BlockDriverInfo bdi;
+ if (bdrv_get_info(di->fleecing.bs, &bdi) >= 0) {
+ perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
+ }
+ }
+
BlockJob *job = backup_job_create(
- NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
- bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
+ job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
&local_err);
@@ -580,6 +681,14 @@ static void create_backup_jobs_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
+/*
+ * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
+ */
+static bool device_uses_fleecing(const char *device_id)
+{
+ return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
+}
+
/*
* Returns a list of device infos, which needs to be freed by the caller. In
* case of an error, errp will be set, but the returned value might still be a
@@ -587,6 +696,7 @@ static void create_backup_jobs_bh(void *opaque) {
*/
static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
const char *devlist,
+ bool fleecing,
Error **errp)
{
gchar **devs = NULL;
@@ -610,6 +720,31 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
}
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
+
+ if (fleecing && device_uses_fleecing(*d)) {
+ g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
+ BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
+ if (!fleecing_blk) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", fleecing_devid);
+ goto err;
+ }
+ BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
+ if (!bdrv_co_is_inserted(fleecing_bs)) {
+ error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, fleecing_devid);
+ goto err;
+ }
+ /*
+ * Fleecing image needs to be the same size to act as a cbw target.
+ */
+ if (bs->total_sectors != fleecing_bs->total_sectors) {
+ error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
+ fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
+ goto err;
+ }
+ di->fleecing.bs = fleecing_bs;
+ }
+
di_list = g_list_append(di_list, di);
d++;
}
@@ -659,6 +794,7 @@ UuidInfo coroutine_fn *qmp_backup(
const char *devlist,
bool has_speed, int64_t speed,
bool has_max_workers, int64_t max_workers,
+ bool has_fleecing, bool fleecing,
Error **errp)
{
assert(qemu_in_coroutine());
@@ -687,7 +823,7 @@ UuidInfo coroutine_fn *qmp_backup(
format = has_format ? format : BACKUP_FORMAT_VMA;
bdrv_graph_co_rdlock();
- di_list = get_device_info(devlist, &local_err);
+ di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
bdrv_graph_co_rdunlock();
if (local_err) {
error_propagate(errp, local_err);
@@ -1095,5 +1231,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->query_bitmap_info = true;
ret->pbs_masterkey = true;
ret->backup_max_workers = true;
+ ret->backup_fleecing = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6e7ee87633..dc5f75cd39 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -948,6 +948,10 @@
#
# @max-workers: see @BackupPerf for details. Default 16.
#
+# @fleecing: perform a backup with fleecing. For each device in @devlist, a
+# corresponding '-fleecing' device with the same size already needs to
+# be present.
+#
# Returns: the uuid of the backup job
#
##
@@ -968,7 +972,8 @@
'*firewall-file': 'str',
'*devlist': 'str',
'*speed': 'int',
- '*max-workers': 'int' },
+ '*max-workers': 'int',
+ '*fleecing': 'bool' },
'returns': 'UuidInfo', 'coroutine': true }
##
@@ -1014,6 +1019,8 @@
#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
#
+# @backup-fleecing: Whether backup fleecing is supported or not.
+#
# @backup-max-workers: Whether the 'max-workers' @BackupPerf setting is
# supported or not.
#
@@ -1025,6 +1032,7 @@
'pbs-dirty-bitmap-migration': 'bool',
'pbs-masterkey': 'bool',
'pbs-library-version': 'str',
+ 'backup-fleecing': 'bool',
'backup-max-workers': 'bool' } }
##
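
The fleecing data flow described in the commit message above (copy old data before a guest write, read the snapshot from either image, discard after reading) can be illustrated with a toy in-memory model. Everything here is hypothetical and greatly simplified: the `ToyFleecing` type and the `toy_*` functions are invented for illustration, each "cluster" is a single byte, and there is no locking, error handling, or block-graph logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_CLUSTERS 4

/* Toy model: one byte stands in for one cluster of data. */
typedef struct ToyFleecing {
    char source[NUM_CLUSTERS];   /* live image the guest writes to */
    char fleecing[NUM_CLUSTERS]; /* holds old data saved before overwrites */
    bool copied[NUM_CLUSTERS];   /* old data was saved to the fleecing image */
    bool needed[NUM_CLUSTERS];   /* the backup still has to read this cluster */
} ToyFleecing;

/* Start of the backup: every cluster is still needed, nothing copied yet. */
static void toy_init(ToyFleecing *f, const char *initial)
{
    memcpy(f->source, initial, NUM_CLUSTERS);
    memset(f->fleecing, 0, NUM_CLUSTERS);
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        f->copied[i] = false;
        f->needed[i] = true;
    }
}

/* Guest write: save the old data first if the snapshot still needs it. */
static void toy_guest_write(ToyFleecing *f, int cluster, char data)
{
    if (f->needed[cluster] && !f->copied[cluster]) {
        f->fleecing[cluster] = f->source[cluster];
        f->copied[cluster] = true;
    }
    f->source[cluster] = data;
}

/* Snapshot read: use the saved copy if there is one, else the source. */
static char toy_snapshot_read(const ToyFleecing *f, int cluster)
{
    return f->copied[cluster] ? f->fleecing[cluster] : f->source[cluster];
}

/*
 * After the backup has read a cluster, it is dropped from the fleecing
 * image. Snapshot reads of that cluster are no longer meaningful.
 */
static void toy_snapshot_discard(ToyFleecing *f, int cluster)
{
    f->needed[cluster] = false;
    f->copied[cluster] = false;
    f->fleecing[cluster] = 0;
}
```

In this model, a guest write to a cluster that was already backed up and discarded no longer triggers a copy, which corresponds to the space saving the commit message attributes to discarding copied sections.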

@ -0,0 +1,117 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 29 Apr 2024 14:43:58 +0200
Subject: [PATCH] PVE backup: improve error when copy-before-write fails for
fleecing
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error that is associated to the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
---
block/copy-before-write.c | 18 ++++++++++++------
block/copy-before-write.h | 1 +
pve-backup.c | 9 +++++++++
3 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index bba58326d7..50cc4c7aae 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -27,6 +27,7 @@
#include "qapi/qmp/qjson.h"
#include "sysemu/block-backend.h"
+#include "qemu/atomic.h"
#include "qemu/cutils.h"
#include "qapi/error.h"
#include "block/block_int.h"
@@ -74,7 +75,8 @@ typedef struct BDRVCopyBeforeWriteState {
* @snapshot_error is normally zero. But on first copy-before-write failure
* when @on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT, @snapshot_error takes
* value of this error (<0). After that all in-flight and further
- * snapshot-API requests will fail with that error.
+ * snapshot-API requests will fail with that error. To be accessed with
+ * atomics.
*/
int snapshot_error;
} BDRVCopyBeforeWriteState;
@@ -114,7 +116,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
return 0;
}
- if (s->snapshot_error) {
+ if (qatomic_read(&s->snapshot_error)) {
return 0;
}
@@ -138,9 +140,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
WITH_QEMU_LOCK_GUARD(&s->lock) {
if (ret < 0) {
assert(s->on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT);
- if (!s->snapshot_error) {
- s->snapshot_error = ret;
- }
+ qatomic_cmpxchg(&s->snapshot_error, 0, ret);
} else {
bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
}
@@ -214,7 +214,7 @@ cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset, int64_t bytes,
QEMU_LOCK_GUARD(&s->lock);
- if (s->snapshot_error) {
+ if (qatomic_read(&s->snapshot_error)) {
g_free(req);
return NULL;
}
@@ -585,6 +585,12 @@ void bdrv_cbw_drop(BlockDriverState *bs)
bdrv_unref(bs);
}
+int bdrv_cbw_snapshot_error(BlockDriverState *bs)
+{
+ BDRVCopyBeforeWriteState *s = bs->opaque;
+ return qatomic_read(&s->snapshot_error);
+}
+
static void cbw_init(void)
{
bdrv_register(&bdrv_cbw_filter);
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index dc6cafe7fa..a27d2d7d9f 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -44,5 +44,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockCopyState **bcs,
Error **errp);
void bdrv_cbw_drop(BlockDriverState *bs);
+int bdrv_cbw_snapshot_error(BlockDriverState *bs);
#endif /* COPY_BEFORE_WRITE_H */
diff --git a/pve-backup.c b/pve-backup.c
index 7cc1dd3724..07709aa350 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -379,6 +379,15 @@ static void pvebackup_complete_cb(void *opaque, int ret)
di->fleecing.snapshot_access = NULL;
}
if (di->fleecing.cbw) {
+ /*
+ * With fleecing, failure for cbw does not fail the guest write, but only sets the snapshot
+ * error, making further requests to the snapshot fail with EACCES, which then also fails the
+ * job. But that code is not the root cause and just confusing, so update it.
+ */
+ int snapshot_error = bdrv_cbw_snapshot_error(di->fleecing.cbw);
+ if (di->completed_ret == -EACCES && snapshot_error) {
+ di->completed_ret = snapshot_error;
+ }
bdrv_cbw_drop(di->fleecing.cbw);
di->fleecing.cbw = NULL;
}
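
The error handling in this patch can be sketched in isolation as follows. This is a minimal sketch using C11 `<stdatomic.h>` in place of QEMU's `qatomic_*` helpers. The function names `record_snapshot_error` and `job_result` are hypothetical; only the first-error-wins compare-and-swap and the `-EACCES` substitution are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Record a copy-before-write failure. Only the first error is kept: the
 * compare-and-swap succeeds only while the stored value is still 0.
 */
static void record_snapshot_error(atomic_int *snapshot_error, int ret)
{
    int expected = 0;
    atomic_compare_exchange_strong(snapshot_error, &expected, ret);
}

/*
 * Pick the error to report for the job. -EACCES is only the symptom seen by
 * snapshot readers, so it is replaced with the recorded root cause if any.
 */
static int job_result(int completed_ret, atomic_int *snapshot_error)
{
    int err = atomic_load(snapshot_error);
    if (completed_ret == -EACCES && err) {
        return err;
    }
    return completed_ret;
}
```

Any other job result, including success, passes through unchanged, so the substitution only affects the case where the snapshot was broken by a failed copy-before-write operation.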

@ -1,407 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Thu, 21 Apr 2022 13:26:48 +0200
Subject: [PATCH] vma: allow partial restore
Introduce a new map line for skipping a certain drive, of the form
skip=drive-scsi0
Since in PVE, most archives are compressed and piped to vma for
restore, it's not easily possible to skip reads.
For the reader, a new skip flag for VmaRestoreState is added and the
target is allowed to be NULL if skip is specified when registering. If
the skip flag is set, no writes will be made as well as no check for
duplicate clusters. Therefore, the flag is not set for verify.
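
The `skip=<devname>` map line format described above can be recognized with a small check like the one below. This is a simplified sketch, not the code from `vma.c`: the helper name `parse_skip_line` is hypothetical, and where the patch aborts with `g_error()` when `skip` has no value, this sketch just returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer to the device name if line has the form
 * "skip=<devname>" with a non-empty name, otherwise NULL.
 */
static const char *parse_skip_line(const char *line)
{
    if (strncmp(line, "skip", 4) != 0) {
        return NULL;
    }
    if (line[4] != '=' || line[5] == '\0') {
        return NULL;
    }
    return line + 5;
}
```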
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
vma-reader.c | 64 ++++++++++++---------
vma.c | 157 +++++++++++++++++++++++++++++----------------------
vma.h | 2 +-
3 files changed, 126 insertions(+), 97 deletions(-)
diff --git a/vma-reader.c b/vma-reader.c
index e65f1e8415..81a891c6b1 100644
--- a/vma-reader.c
+++ b/vma-reader.c
@@ -28,6 +28,7 @@ typedef struct VmaRestoreState {
bool write_zeroes;
unsigned long *bitmap;
int bitmap_size;
+ bool skip;
} VmaRestoreState;
struct VmaReader {
@@ -425,13 +426,14 @@ VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id)
}
static void allocate_rstate(VmaReader *vmar, guint8 dev_id,
- BlockBackend *target, bool write_zeroes)
+ BlockBackend *target, bool write_zeroes, bool skip)
{
assert(vmar);
assert(dev_id);
vmar->rstate[dev_id].target = target;
vmar->rstate[dev_id].write_zeroes = write_zeroes;
+ vmar->rstate[dev_id].skip = skip;
int64_t size = vmar->devinfo[dev_id].size;
@@ -446,28 +448,30 @@ static void allocate_rstate(VmaReader *vmar, guint8 dev_id,
}
int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id, BlockBackend *target,
- bool write_zeroes, Error **errp)
+ bool write_zeroes, bool skip, Error **errp)
{
assert(vmar);
- assert(target != NULL);
+ assert(target != NULL || skip);
assert(dev_id);
- assert(vmar->rstate[dev_id].target == NULL);
-
- int64_t size = blk_getlength(target);
- int64_t size_diff = size - vmar->devinfo[dev_id].size;
-
- /* storage types can have different size restrictions, so it
- * is not always possible to create an image with exact size.
- * So we tolerate a size difference up to 4MB.
- */
- if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
- error_setg(errp, "vma_reader_register_bs for stream %s failed - "
- "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
- size, vmar->devinfo[dev_id].size);
- return -1;
+ assert(vmar->rstate[dev_id].target == NULL && !vmar->rstate[dev_id].skip);
+
+ if (target != NULL) {
+ int64_t size = blk_getlength(target);
+ int64_t size_diff = size - vmar->devinfo[dev_id].size;
+
+ /* storage types can have different size restrictions, so it
+ * is not always possible to create an image with exact size.
+ * So we tolerate a size difference up to 4MB.
+ */
+ if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ }
}
- allocate_rstate(vmar, dev_id, target, write_zeroes);
+ allocate_rstate(vmar, dev_id, target, write_zeroes, skip);
return 0;
}
@@ -560,19 +564,23 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
VmaRestoreState *rstate = &vmar->rstate[dev_id];
BlockBackend *target = NULL;
+ bool skip = rstate->skip;
+
if (dev_id != vmar->vmstate_stream) {
target = rstate->target;
- if (!verify && !target) {
+ if (!verify && !target && !skip) {
error_setg(errp, "got wrong dev id %d", dev_id);
return -1;
}
- if (vma_reader_get_bitmap(rstate, cluster_num)) {
- error_setg(errp, "found duplicated cluster %zd for stream %s",
- cluster_num, vmar->devinfo[dev_id].devname);
- return -1;
+ if (!skip) {
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %zd for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
}
- vma_reader_set_bitmap(rstate, cluster_num, 1);
max_sector = vmar->devinfo[dev_id].size/BDRV_SECTOR_SIZE;
} else {
@@ -618,7 +626,7 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
return -1;
}
- if (!verify) {
+ if (!verify && !skip) {
int nb_sectors = end_sector - sector_num;
if (restore_write_data(vmar, dev_id, target, vmstate_fd,
buf + start, sector_num, nb_sectors,
@@ -654,7 +662,7 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
return -1;
}
- if (!verify) {
+ if (!verify && !skip) {
int nb_sectors = end_sector - sector_num;
if (restore_write_data(vmar, dev_id, target, vmstate_fd,
buf + start, sector_num,
@@ -679,7 +687,7 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
vmar->partial_zero_cluster_data += zero_size;
}
- if (rstate->write_zeroes && !verify) {
+ if (rstate->write_zeroes && !verify && !skip) {
if (restore_write_data(vmar, dev_id, target, vmstate_fd,
zero_vma_block, sector_num,
nb_sectors, errp) < 0) {
@@ -850,7 +858,7 @@ int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp)
for (dev_id = 1; dev_id < 255; dev_id++) {
if (vma_reader_get_device_info(vmar, dev_id)) {
- allocate_rstate(vmar, dev_id, NULL, false);
+ allocate_rstate(vmar, dev_id, NULL, false, false);
}
}
diff --git a/vma.c b/vma.c
index e8dffb43e0..e6e9ffc7fe 100644
--- a/vma.c
+++ b/vma.c
@@ -138,6 +138,7 @@ typedef struct RestoreMap {
char *throttling_group;
char *cache;
bool write_zero;
+ bool skip;
} RestoreMap;
static bool try_parse_option(char **line, const char *optname, char **out, const char *inbuf) {
@@ -245,47 +246,61 @@ static int extract_content(int argc, char **argv)
char *bps = NULL;
char *group = NULL;
char *cache = NULL;
+ char *devname = NULL;
+ bool skip = false;
+ uint64_t bps_value = 0;
+ const char *path = NULL;
+ bool write_zero = true;
+
if (!line || line[0] == '\0' || !strcmp(line, "done\n")) {
break;
}
int len = strlen(line);
if (line[len - 1] == '\n') {
line[len - 1] = '\0';
- if (len == 1) {
+ len = len - 1;
+ if (len == 0) {
break;
}
}
- while (1) {
- if (!try_parse_option(&line, "format", &format, inbuf) &&
- !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
- !try_parse_option(&line, "throttling.group", &group, inbuf) &&
- !try_parse_option(&line, "cache", &cache, inbuf))
- {
- break;
+ if (strncmp(line, "skip", 4) == 0) {
+ if (len < 6 || line[4] != '=') {
+ g_error("read map failed - option 'skip' has no value ('%s')",
+ inbuf);
+ } else {
+ devname = line + 5;
+ skip = true;
+ }
+ } else {
+ while (1) {
+ if (!try_parse_option(&line, "format", &format, inbuf) &&
+ !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
+ !try_parse_option(&line, "throttling.group", &group, inbuf) &&
+ !try_parse_option(&line, "cache", &cache, inbuf))
+ {
+ break;
+ }
}
- }
- uint64_t bps_value = 0;
- if (bps) {
- bps_value = verify_u64(bps);
- g_free(bps);
- }
+ if (bps) {
+ bps_value = verify_u64(bps);
+ g_free(bps);
+ }
- const char *path;
- bool write_zero;
- if (line[0] == '0' && line[1] == ':') {
- path = line + 2;
- write_zero = false;
- } else if (line[0] == '1' && line[1] == ':') {
- path = line + 2;
- write_zero = true;
- } else {
- g_error("read map failed - parse error ('%s')", inbuf);
+ if (line[0] == '0' && line[1] == ':') {
+ path = line + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = line + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ }
+
+ path = extract_devname(path, &devname, -1);
}
- char *devname = NULL;
- path = extract_devname(path, &devname, -1);
if (!devname) {
g_error("read map failed - no dev name specified ('%s')",
inbuf);
@@ -299,6 +314,7 @@ static int extract_content(int argc, char **argv)
map->throttling_group = group;
map->cache = cache;
map->write_zero = write_zero;
+ map->skip = skip;
g_hash_table_insert(devmap, map->devname, map);
@@ -328,6 +344,7 @@ static int extract_content(int argc, char **argv)
const char *cache = NULL;
int flags = BDRV_O_RDWR;
bool write_zero = true;
+ bool skip = false;
BlockBackend *blk = NULL;
@@ -343,6 +360,7 @@ static int extract_content(int argc, char **argv)
throttling_group = map->throttling_group;
cache = map->cache;
write_zero = map->write_zero;
+ skip = map->skip;
} else {
devfn = g_strdup_printf("%s/tmp-disk-%s.raw",
dirname, di->devname);
@@ -361,57 +379,60 @@ static int extract_content(int argc, char **argv)
write_zero = false;
}
- size_t devlen = strlen(devfn);
- QDict *options = NULL;
- bool writethrough;
- if (format) {
- /* explicit format from commandline */
- options = qdict_new();
- qdict_put_str(options, "driver", format);
- } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
- strncmp(devfn, "/dev/", 5) == 0)
- {
- /* This part is now deprecated for PVE as well (just as qemu
- * deprecated not specifying an explicit raw format, too.
- */
- /* explicit raw format */
- options = qdict_new();
- qdict_put_str(options, "driver", "raw");
- }
- if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
- g_error("invalid cache option: %s\n", cache);
- }
+ if (!skip) {
+ size_t devlen = strlen(devfn);
+ QDict *options = NULL;
+ bool writethrough;
+ if (format) {
+ /* explicit format from commandline */
+ options = qdict_new();
+ qdict_put_str(options, "driver", format);
+ } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
+ strncmp(devfn, "/dev/", 5) == 0)
+ {
+ /* This part is now deprecated for PVE as well (just as qemu
+ * deprecated not specifying an explicit raw format, too.
+ */
+ /* explicit raw format */
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+ }
- if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
- g_error("can't open file %s - %s", devfn,
- error_get_pretty(errp));
- }
+ if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
+ g_error("invalid cache option: %s\n", cache);
+ }
- if (cache) {
- blk_set_enable_write_cache(blk, !writethrough);
- }
+ if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
+ g_error("can't open file %s - %s", devfn,
+ error_get_pretty(errp));
+ }
- if (throttling_group) {
- blk_io_limits_enable(blk, throttling_group);
- }
+ if (cache) {
+ blk_set_enable_write_cache(blk, !writethrough);
+ }
- if (throttling_bps) {
- if (!throttling_group) {
- blk_io_limits_enable(blk, devfn);
+ if (throttling_group) {
+ blk_io_limits_enable(blk, throttling_group);
}
- ThrottleConfig cfg;
- throttle_config_init(&cfg);
- cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
- Error *err = NULL;
- if (!throttle_is_valid(&cfg, &err)) {
- error_report_err(err);
- g_error("failed to apply throttling");
+ if (throttling_bps) {
+ if (!throttling_group) {
+ blk_io_limits_enable(blk, devfn);
+ }
+
+ ThrottleConfig cfg;
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
+ Error *err = NULL;
+ if (!throttle_is_valid(&cfg, &err)) {
+ error_report_err(err);
+ g_error("failed to apply throttling");
+ }
+ blk_set_io_limits(blk, &cfg);
}
- blk_set_io_limits(blk, &cfg);
}
- if (vma_reader_register_bs(vmar, i, blk, write_zero, &errp) < 0) {
+ if (vma_reader_register_bs(vmar, i, blk, write_zero, skip, &errp) < 0) {
g_error("%s", error_get_pretty(errp));
}
diff --git a/vma.h b/vma.h
index c895c97f6d..1b62859165 100644
--- a/vma.h
+++ b/vma.h
@@ -142,7 +142,7 @@ GList *vma_reader_get_config_data(VmaReader *vmar);
VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id);
int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id,
BlockBackend *target, bool write_zeroes,
- Error **errp);
+ bool skip, Error **errp);
int vma_reader_restore(VmaReader *vmar, int vmstate_fd, bool verbose,
Error **errp);
int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp);


@ -1,233 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Tue, 26 Apr 2022 16:06:28 +0200
Subject: [PATCH] pbs: namespace support
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
block/pbs.c | 25 +++++++++++++++++++++----
pbs-restore.c | 19 ++++++++++++++++---
pve-backup.c | 6 +++++-
qapi/block-core.json | 5 ++++-
5 files changed, 47 insertions(+), 9 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c7468e5d3b..57b2457f1e 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1041,6 +1041,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS key_password
false, NULL, // PBS master_keyfile
false, NULL, // PBS fingerprint
+ false, NULL, // PBS backup-ns
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
false, false, // PBS use-dirty-bitmap
diff --git a/block/pbs.c b/block/pbs.c
index ce9a870885..9192f3e41b 100644
--- a/block/pbs.c
+++ b/block/pbs.c
@@ -14,6 +14,7 @@
#include <proxmox-backup-qemu.h>
#define PBS_OPT_REPOSITORY "repository"
+#define PBS_OPT_NAMESPACE "namespace"
#define PBS_OPT_SNAPSHOT "snapshot"
#define PBS_OPT_ARCHIVE "archive"
#define PBS_OPT_KEYFILE "keyfile"
@@ -27,6 +28,7 @@ typedef struct {
int64_t length;
char *repository;
+ char *namespace;
char *snapshot;
char *archive;
} BDRVPBSState;
@@ -40,6 +42,11 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "The server address and repository to connect to.",
},
+ {
+ .name = PBS_OPT_NAMESPACE,
+ .type = QEMU_OPT_STRING,
+ .help = "Optional: The snapshot's namespace.",
+ },
{
.name = PBS_OPT_SNAPSHOT,
.type = QEMU_OPT_STRING,
@@ -76,7 +83,7 @@ static QemuOptsList runtime_opts = {
// filename format:
-// pbs:repository=<repo>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
+// pbs:repository=<repo>,namespace=<ns>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
static void pbs_parse_filename(const char *filename, QDict *options,
Error **errp)
{
@@ -112,6 +119,7 @@ static int pbs_open(BlockDriverState *bs, QDict *options, int flags,
s->archive = g_strdup(qemu_opt_get(opts, PBS_OPT_ARCHIVE));
const char *keyfile = qemu_opt_get(opts, PBS_OPT_KEYFILE);
const char *password = qemu_opt_get(opts, PBS_OPT_PASSWORD);
+ const char *namespace = qemu_opt_get(opts, PBS_OPT_NAMESPACE);
const char *fingerprint = qemu_opt_get(opts, PBS_OPT_FINGERPRINT);
const char *key_password = qemu_opt_get(opts, PBS_OPT_ENCRYPTION_PASSWORD);
@@ -124,9 +132,12 @@ static int pbs_open(BlockDriverState *bs, QDict *options, int flags,
if (!key_password) {
key_password = getenv("PBS_ENCRYPTION_PASSWORD");
}
+ if (namespace) {
+ s->namespace = g_strdup(namespace);
+ }
/* connect to PBS server in read mode */
- s->conn = proxmox_restore_new(s->repository, s->snapshot, password,
+ s->conn = proxmox_restore_new_ns(s->repository, s->snapshot, s->namespace, password,
keyfile, key_password, fingerprint, &pbs_error);
/* invalidates qemu_opt_get char pointers from above */
@@ -171,6 +182,7 @@ static int pbs_file_open(BlockDriverState *bs, QDict *options, int flags,
static void pbs_close(BlockDriverState *bs) {
BDRVPBSState *s = bs->opaque;
g_free(s->repository);
+ g_free(s->namespace);
g_free(s->snapshot);
g_free(s->archive);
proxmox_restore_disconnect(s->conn);
@@ -252,8 +264,13 @@ static coroutine_fn int pbs_co_pwritev(BlockDriverState *bs,
static void pbs_refresh_filename(BlockDriverState *bs)
{
BDRVPBSState *s = bs->opaque;
- snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
- s->repository, s->snapshot, s->archive);
+ if (s->namespace) {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s:%s(%s)",
+ s->repository, s->namespace, s->snapshot, s->archive);
+ } else {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
+ s->repository, s->snapshot, s->archive);
+ }
}
static const char *const pbs_strong_runtime_opts[] = {
diff --git a/pbs-restore.c b/pbs-restore.c
index 2f834cf42e..f03d9bab8d 100644
--- a/pbs-restore.c
+++ b/pbs-restore.c
@@ -29,7 +29,7 @@
static void help(void)
{
const char *help_msg =
- "usage: pbs-restore [--repository <repo>] snapshot archive-name target [command options]\n"
+ "usage: pbs-restore [--repository <repo>] [--ns namespace] snapshot archive-name target [command options]\n"
;
printf("%s", help_msg);
@@ -77,6 +77,7 @@ int main(int argc, char **argv)
Error *main_loop_err = NULL;
const char *format = "raw";
const char *repository = NULL;
+ const char *backup_ns = NULL;
const char *keyfile = NULL;
int verbose = false;
bool skip_zero = false;
@@ -90,6 +91,7 @@ int main(int argc, char **argv)
{"verbose", no_argument, 0, 'v'},
{"format", required_argument, 0, 'f'},
{"repository", required_argument, 0, 'r'},
+ {"ns", required_argument, 0, 'n'},
{"keyfile", required_argument, 0, 'k'},
{0, 0, 0, 0}
};
@@ -110,6 +112,9 @@ int main(int argc, char **argv)
case 'r':
repository = g_strdup(argv[optind - 1]);
break;
+ case 'n':
+ backup_ns = g_strdup(argv[optind - 1]);
+ break;
case 'k':
keyfile = g_strdup(argv[optind - 1]);
break;
@@ -160,8 +165,16 @@ int main(int argc, char **argv)
fprintf(stderr, "connecting to repository '%s'\n", repository);
}
char *pbs_error = NULL;
- ProxmoxRestoreHandle *conn = proxmox_restore_new(
- repository, snapshot, password, keyfile, key_password, fingerprint, &pbs_error);
+ ProxmoxRestoreHandle *conn = proxmox_restore_new_ns(
+ repository,
+ snapshot,
+ backup_ns,
+ password,
+ keyfile,
+ key_password,
+ fingerprint,
+ &pbs_error
+ );
if (conn == NULL) {
fprintf(stderr, "restore failed: %s\n", pbs_error);
return -1;
diff --git a/pve-backup.c b/pve-backup.c
index 4b5134ed27..262e7d3894 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -10,6 +10,8 @@
#include "qapi/qmp/qerror.h"
#include "qemu/cutils.h"
+#include <proxmox-backup-qemu.h>
+
/* PVE backup state and related function */
/*
@@ -531,6 +533,7 @@ UuidInfo coroutine_fn *qmp_backup(
bool has_key_password, const char *key_password,
bool has_master_keyfile, const char *master_keyfile,
bool has_fingerprint, const char *fingerprint,
+ bool has_backup_ns, const char *backup_ns,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
bool has_use_dirty_bitmap, bool use_dirty_bitmap,
@@ -670,8 +673,9 @@ UuidInfo coroutine_fn *qmp_backup(
firewall_name = "fw.conf";
char *pbs_err = NULL;
- pbs = proxmox_backup_new(
+ pbs = proxmox_backup_new_ns(
backup_file,
+ has_backup_ns ? backup_ns : NULL,
backup_id,
backup_time,
dump_cb_block_size,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d8c7331090..889726fc26 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -817,6 +817,8 @@
#
# @fingerprint: server cert fingerprint (optional for format 'pbs')
#
+# @backup-ns: backup namespace (required for format 'pbs')
+#
# @backup-id: backup ID (required for format 'pbs')
#
# @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
@@ -836,6 +838,7 @@
'*key-password': 'str',
'*master-keyfile': 'str',
'*fingerprint': 'str',
+ '*backup-ns': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
'*use-dirty-bitmap': 'bool',
@@ -3290,7 +3293,7 @@
{ 'struct': 'BlockdevOptionsPbs',
'data': { 'repository': 'str', 'snapshot': 'str', 'archive': 'str',
'*keyfile': 'str', '*password': 'str', '*fingerprint': 'str',
- '*key_password': 'str' } }
+ '*key_password': 'str', '*namespace': 'str' } }
##
# @BlockdevOptionsNVMe:


@ -1,60 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 25 May 2022 13:59:37 +0200
Subject: [PATCH] PVE-Backup: create jobs: correctly cancel in error scenario
The first call to job_cancel_sync() will cancel and free all jobs in
the transaction, so ensure that it's called only once and get rid of
the job_unref() that would operate on freed memory.
It's also necessary to NULL backup_state.pbs in the error scenario,
because a subsequent backup_cancel QMP call (as happens in PVE when
the backup QMP command fails) would try to call proxmox_backup_abort()
and run into a segfault.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[FE: adapt for new job lock mechanism replacing AioContext locks]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 262e7d3894..fde3554133 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -503,6 +503,11 @@ static void create_backup_jobs_bh(void *opaque) {
}
if (*errp) {
+ /*
+ * It's enough to cancel one job in the transaction, the rest will
+ * follow automatically.
+ */
+ bool canceled = false;
l = backup_state.di_list;
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
@@ -513,11 +518,11 @@ static void create_backup_jobs_bh(void *opaque) {
di->target = NULL;
}
- if (di->job) {
+ if (!canceled && di->job) {
WITH_JOB_LOCK_GUARD() {
job_cancel_sync_locked(&di->job->job, true);
- job_unref_locked(&di->job->job);
}
+ canceled = true;
}
}
}
@@ -943,6 +948,7 @@ err:
if (pbs) {
proxmox_backup_disconnect(pbs);
+ backup_state.pbs = NULL;
}
if (backup_dir) {


@ -1,73 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 25 May 2022 13:59:38 +0200
Subject: [PATCH] PVE-Backup: ensure jobs in di_list are referenced
Ensures that qmp_backup_cancel doesn't pick a job that's already been
freed. With unlucky timings it seems possible that:
1. job_exit -> job_completed -> job_finalize_single starts
2. pvebackup_co_complete_stream gets spawned in completion callback
3. job_finalize_single finishes -> job's refcount hits zero -> job is
freed
4. qmp_backup_cancel comes in and locks backup_state.backup_mutex
before pvebackup_co_complete_stream can remove the job from the
di_list
5. qmp_backup_cancel will pick a job that's already been freed
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[FE: adapt for new job lock mechanism replacing AioContext locks]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index fde3554133..0cf30e1ced 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -316,6 +316,13 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
}
}
+ if (di->job) {
+ WITH_JOB_LOCK_GUARD() {
+ job_unref_locked(&di->job->job);
+ di->job = NULL;
+ }
+ }
+
// remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
@@ -491,6 +498,11 @@ static void create_backup_jobs_bh(void *opaque) {
aio_context_release(aio_context);
di->job = job;
+ if (job) {
+ WITH_JOB_LOCK_GUARD() {
+ job_ref_locked(&job->job);
+ }
+ }
if (!job || local_err) {
error_setg(errp, "backup_job_create failed: %s",
@@ -518,11 +530,15 @@ static void create_backup_jobs_bh(void *opaque) {
di->target = NULL;
}
- if (!canceled && di->job) {
+ if (di->job) {
WITH_JOB_LOCK_GUARD() {
- job_cancel_sync_locked(&di->job->job, true);
+ if (!canceled) {
+ job_cancel_sync_locked(&di->job->job, true);
+ canceled = true;
+ }
+ job_unref_locked(&di->job->job);
+ di->job = NULL;
}
- canceled = true;
}
}
}


@ -1,118 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 25 May 2022 13:59:39 +0200
Subject: [PATCH] PVE-Backup: avoid segfault issues upon backup-cancel
When canceling a backup in PVE via a signal it's easy to run into a
situation where the job is already failing when the backup_cancel QMP
command comes in. With a bit of unlucky timing on top, it can happen
that job_exit() runs between scheduling of job_cancel_bh() and
execution of job_cancel_bh(). But job_cancel_sync() does not expect
that the job is already finalized (in fact, the job might've been
freed already, but even if it isn't, job_cancel_sync() would try to
deref job->txn which would be NULL at that point).
It is not possible to simply use the job_cancel() (which is advertised
as being async but isn't in all cases) in qmp_backup_cancel() for the
same reason job_cancel_sync() cannot be used. Namely, because it can
invoke job_finish_sync() (which uses AIO_WAIT_WHILE and thus hangs if
called from a coroutine). This happens when there's multiple jobs in
the transaction and job->deferred_to_main_loop is true (is set before
scheduling job_exit()) or if the job was not started yet.
Fix the issue by selecting the job to cancel in job_cancel_bh() itself
using the first job that's not completed yet. This is not necessarily
the first job in the list, because pvebackup_co_complete_stream()
might not yet have removed a completed job when job_cancel_bh() runs.
An alternative would be to continue using only the first job and
checking against JOB_STATUS_CONCLUDED or JOB_STATUS_NULL to decide if
it's still necessary and possible to cancel, but the approach with
using the first non-completed job seemed more robust.
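The selection rule described above can be sketched outside of QEMU as follows. This is only an illustration under stated assumptions: `DemoJob` and `cancel_first_active()` are hypothetical names, the array stands in for `backup_state.di_list`, and the boolean flags stand in for the job status checks and the synchronous cancel that the real code does under the job lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block job; not a QEMU type. */
typedef struct DemoJob {
    bool completed; /* job already finished or finalized */
    bool canceled;  /* set when this sketch "cancels" the job */
} DemoJob;

/*
 * Cancel the first job that is present and not yet completed.
 * Returns the index of the canceled job, or -1 if none qualified.
 * Entries may be NULL, mirroring list elements without a job.
 */
static int cancel_first_active(DemoJob **jobs, size_t n)
{
    assert(jobs != NULL);
    for (size_t i = 0; i < n; i++) {
        DemoJob *job = jobs[i];
        if (job && !job->completed) {
            /* One cancel is enough; stop after the first hit. */
            job->canceled = true;
            return (int)i;
        }
    }
    return -1;
}
```

With a list whose first entry is already completed, the sketch skips it and cancels the next active entry, which is the case the patch guards against.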
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[FE: adapt for new job lock mechanism replacing AioContext locks]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 57 ++++++++++++++++++++++++++++++++++------------------
1 file changed, 38 insertions(+), 19 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 0cf30e1ced..4067018dbe 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -354,12 +354,41 @@ static void pvebackup_complete_cb(void *opaque, int ret)
/*
* job_cancel(_sync) does not like to be called from coroutines, so defer to
- * main loop processing via a bottom half.
+ * main loop processing via a bottom half. Assumes that caller holds
+ * backup_mutex.
*/
static void job_cancel_bh(void *opaque) {
CoCtxData *data = (CoCtxData*)opaque;
- Job *job = (Job*)data->data;
- job_cancel_sync(job, true);
+
+ /*
+ * Be careful to pick a valid job to cancel:
+ * 1. job_cancel_sync() does not expect the job to be finalized already.
+ * 2. job_exit() might run between scheduling and running job_cancel_bh()
+ * and pvebackup_co_complete_stream() might not have removed the job from
+ * the list yet (in fact, cannot, because it waits for the backup_mutex).
+ * Requiring !job_is_completed() ensures that no finalized job is picked.
+ */
+ GList *bdi = g_list_first(backup_state.di_list);
+ while (bdi) {
+ if (bdi->data) {
+ BlockJob *bj = ((PVEBackupDevInfo *)bdi->data)->job;
+ if (bj) {
+ Job *job = &bj->job;
+ WITH_JOB_LOCK_GUARD() {
+ if (!job_is_completed_locked(job)) {
+ job_cancel_sync_locked(job, true);
+ /*
+ * It's enough to cancel one job in the transaction, the
+ * rest will follow automatically.
+ */
+ break;
+ }
+ }
+ }
+ }
+ bdi = g_list_next(bdi);
+ }
+
aio_co_enter(data->ctx, data->co);
}
@@ -380,22 +409,12 @@ void coroutine_fn qmp_backup_cancel(Error **errp)
proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- /* it's enough to cancel one job in the transaction, the rest will follow
- * automatically */
- GList *bdi = g_list_first(backup_state.di_list);
- BlockJob *cancel_job = bdi && bdi->data ?
- ((PVEBackupDevInfo *)bdi->data)->job :
- NULL;
-
- if (cancel_job) {
- CoCtxData data = {
- .ctx = qemu_get_current_aio_context(),
- .co = qemu_coroutine_self(),
- .data = &cancel_job->job,
- };
- aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
- qemu_coroutine_yield();
- }
+ CoCtxData data = {
+ .ctx = qemu_get_current_aio_context(),
+ .co = qemu_coroutine_self(),
+ };
+ aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+ qemu_coroutine_yield();
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}


@ -1,57 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 22 Jun 2022 10:45:11 +0200
Subject: [PATCH] vma: create: support 64KiB-unaligned input images
which fixes backing up templates with such disks in PVE, for example
efitype=4m EFI disks on a file-based storage (size = 540672).
If there is not enough left to read, blk_co_preadv will return -EIO,
so limit the size in the last iteration.
For writing, an unaligned end is already handled correctly.
The call to memset is not strictly necessary, because writing also
checks that it doesn't write data beyond the end of the image. But
there are two reasons to do it:
1. It's cleaner that way.
2. It allows detecting when the final piece is all zeroes, which might
not happen if the buffer still contains data from the previous
iteration.
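The read-length computation for the final cluster can be sketched as a standalone function. This is a minimal illustration, not the patch itself: `CLUSTER_SIZE`, `cluster_count()` and `read_len_for_cluster()` are hypothetical names, and the 64 KiB value is assumed from the patch subject.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cluster size (64 KiB, as named in the patch subject). */
#define CLUSTER_SIZE (64 * 1024)

/* Number of clusters needed to cover len bytes, rounding up. */
static int64_t cluster_count(int64_t len)
{
    return (len + CLUSTER_SIZE - 1) / CLUSTER_SIZE;
}

/*
 * Bytes to read for cluster index `start` of an image of `len` bytes.
 * Only the last cluster may be shorter than CLUSTER_SIZE.
 */
static int64_t read_len_for_cluster(int64_t len, int64_t start)
{
    int64_t end = cluster_count(len);

    if (start + 1 == end) {
        int64_t readlen = len - start * CLUSTER_SIZE;
        assert(readlen > 0 && readlen <= CLUSTER_SIZE);
        return readlen;
    }
    return CLUSTER_SIZE;
}
```

For the 540672-byte EFI disk mentioned above, the image spans nine clusters and the last read covers 16384 bytes; the remainder of the zeroed buffer stays zero.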
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
vma.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/vma.c b/vma.c
index e6e9ffc7fe..304f02bc84 100644
--- a/vma.c
+++ b/vma.c
@@ -548,7 +548,7 @@ static void coroutine_fn backup_run(void *opaque)
struct iovec iov;
QEMUIOVector qiov;
- int64_t start, end;
+ int64_t start, end, readlen;
int ret = 0;
unsigned char *buf = blk_blockalign(job->target, VMA_CLUSTER_SIZE);
@@ -562,8 +562,16 @@ static void coroutine_fn backup_run(void *opaque)
iov.iov_len = VMA_CLUSTER_SIZE;
qemu_iovec_init_external(&qiov, &iov, 1);
+ if (start + 1 == end) {
+ memset(buf, 0, VMA_CLUSTER_SIZE);
+ readlen = job->len - start * VMA_CLUSTER_SIZE;
+ assert(readlen > 0 && readlen <= VMA_CLUSTER_SIZE);
+ } else {
+ readlen = VMA_CLUSTER_SIZE;
+ }
+
ret = blk_co_preadv(job->target, start * VMA_CLUSTER_SIZE,
- VMA_CLUSTER_SIZE, &qiov, 0);
+ readlen, &qiov, 0);
if (ret < 0) {
vma_writer_set_error(job->vmaw, "read error", -1);
goto out;


@ -1,25 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 22 Jun 2022 10:45:12 +0200
Subject: [PATCH] vma: create: avoid triggering assertion in error case
error_setg expects its argument to not be initialized yet.
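The precondition can be illustrated with a small stand-in for the error API. Everything here is a sketch under assumptions: `DemoError`, `demo_error_set()` and `demo_error_reset_and_set()` are hypothetical names, not QEMU functions, and the assertion only mimics the "target must still be NULL" rule the message refers to.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object; not QEMU's Error type. */
typedef struct DemoError {
    char msg[64];
} DemoError;

/* Setter that, like the rule described above, requires *errp == NULL. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    assert(*errp == NULL);
    DemoError *err = malloc(sizeof(*err));
    if (!err) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    *errp = err;
}

/* Drop any earlier error first, then set the new one. */
static void demo_error_reset_and_set(DemoError **errp, const char *msg)
{
    free(*errp); /* free(NULL) is a no-op */
    *errp = NULL;
    demo_error_set(errp, msg);
}
```

Calling `demo_error_set()` twice on the same pointer would trip the assertion, whereas the reset variant replaces the earlier error, which is the shape of the two added lines in the hunk below.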
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
vma-writer.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/vma-writer.c b/vma-writer.c
index df4b20793d..ac7da237d0 100644
--- a/vma-writer.c
+++ b/vma-writer.c
@@ -311,6 +311,8 @@ VmaWriter *vma_writer_create(const char *filename, uuid_t uuid, Error **errp)
}
if (vmaw->fd < 0) {
+ error_free(*errp);
+ *errp = NULL;
error_setg(errp, "can't open file %s - %s\n", filename,
g_strerror(errno));
goto err;


@ -1,36 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 22 Jun 2022 10:45:13 +0200
Subject: [PATCH] block: alloc-track: avoid premature break
While the bdrv_co_preadv() calls are expected to return 0 on success,
qemu_iovec_memset() will return the number of bytes set (will be
local_bytes, because the slice with that size was just initialized).
Don't break out of the loop after the branch with qemu_iovec_memset(),
because there might still be work to do. Additionally, ret is an int,
which on 64-bit platforms is too small to hold the size_t returned by
qemu_iovec_memset().
The branch seems to be difficult to reach in practice, because the
whole point of alloc-track is to be used with a backing device.
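The premature break can be reproduced in a reduced form. This is a sketch with hypothetical names (`fill_zero()`, `zero_in_chunks()`); `fill_zero()` merely imitates a helper that returns the number of bytes set, and the `buggy` flag selects the old behaviour of storing that count in the status variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper that returns the number of bytes set. */
static size_t fill_zero(unsigned char *buf, size_t offset, size_t bytes)
{
    memset(buf + offset, 0, bytes);
    return bytes;
}

/*
 * Zero `total` bytes of buf in pieces of at most `chunk` bytes.
 * Returns 0 on success; *done receives the bytes fully processed.
 * With buggy != 0 the byte count is mistaken for an error status.
 */
static int zero_in_chunks(unsigned char *buf, size_t total, size_t chunk,
                          int buggy, size_t *done)
{
    int ret = 0;

    assert(chunk > 0);
    *done = 0;
    while (*done < total) {
        size_t n = total - *done < chunk ? total - *done : chunk;

        if (buggy) {
            ret = (int)fill_zero(buf, *done, n); /* count, not a status */
        } else {
            fill_zero(buf, *done, n);
            ret = 0;
        }
        if (ret != 0) {
            break; /* buggy mode leaves the loop after the first piece */
        }
        *done += n;
    }
    return ret;
}
```

In the buggy mode the loop stops after the first piece and reports a nonzero value even though nothing failed; in the fixed mode all pieces are processed and 0 is returned.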
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/alloc-track.c b/block/alloc-track.c
index 43d40d11af..95c9c67cd8 100644
--- a/block/alloc-track.c
+++ b/block/alloc-track.c
@@ -174,7 +174,8 @@ static int coroutine_fn track_co_preadv(BlockDriverState *bs,
ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
&local_qiov, flags);
} else {
- ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+ qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+ ret = 0;
}
if (ret != 0) {

Some files were not shown because too many files have changed in this diff.