pve-qemu/debian/patches/pve/0027-PVE-Backup-add-vma-bac...

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:57 +0200
Subject: [PATCH] PVE-Backup: add vma backup format code
Notes about partial restoring: skipping a certain drive is done via a
map line of the form skip=drive-scsi0. Since in PVE, most archives are
compressed and piped to vma for restore, it's not easily possible to
skip reads.
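
As an illustration (the drive name below is made up), a partial restore
would include a line such as

    skip=drive-scsi1

in the map input handed to 'vma extract' via its -r option, alongside the
usual mapping lines for the drives that are actually restored.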
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: improvements during create
allow partial restore]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 2 +
meson.build | 5 +
vma-reader.c | 870 ++++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 818 +++++++++++++++++++++++++++++++++++++++++
vma.c | 901 ++++++++++++++++++++++++++++++++++++++++++++++
vma.h | 150 ++++++++
6 files changed, 2746 insertions(+)
create mode 100644 vma-reader.c
create mode 100644 vma-writer.c
create mode 100644 vma.c
create mode 100644 vma.h
diff --git a/block/meson.build b/block/meson.build
index b530e117b5..b245daa98e 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -42,6 +42,8 @@ block_ss.add(files(
'zeroinit.c',
), zstd, zlib, gnutls)
+block_ss.add(files('../vma-writer.c'), libuuid)
+
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(files('block-ram-registrar.c'))
diff --git a/meson.build b/meson.build
index 91a0aa64c6..620cc594b2 100644
--- a/meson.build
+++ b/meson.build
@@ -1922,6 +1922,8 @@ endif
has_gettid = cc.has_function('gettid')
+libuuid = cc.find_library('uuid', required: true)
+
# libselinux
selinux = dependency('libselinux',
required: get_option('selinux'),
@@ -4023,6 +4025,9 @@ if have_tools
dependencies: [blockdev, qemuutil, gnutls, selinux],
install: true)
+ vma = executable('vma', files('vma.c', 'vma-reader.c') + genh,
+ dependencies: [authz, block, crypto, io, qom], install: true)
+
subdir('storage-daemon')
foreach exe: [ 'qemu-img', 'qemu-io', 'qemu-nbd', 'qemu-storage-daemon']
diff --git a/vma-reader.c b/vma-reader.c
new file mode 100644
index 0000000000..d0b6721812
--- /dev/null
+++ b/vma-reader.c
@@ -0,0 +1,870 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) 2012 Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include <glib.h>
+#include <uuid/uuid.h>
+
+#include "qemu/timer.h"
+#include "qemu/ratelimit.h"
+#include "vma.h"
+#include "block/block.h"
+#include "block/graph-lock.h"
+#include "sysemu/block-backend.h"
+
+static unsigned char zero_vma_block[VMA_BLOCK_SIZE];
+
+typedef struct VmaRestoreState {
+ BlockBackend *target;
+ bool write_zeroes;
+ unsigned long *bitmap;
+ int bitmap_size;
+ bool skip;
+} VmaRestoreState;
+
+struct VmaReader {
+ int fd;
+ GChecksum *md5csum;
+ GHashTable *blob_hash;
+ unsigned char *head_data;
+ VmaDeviceInfo devinfo[256];
+ VmaRestoreState rstate[256];
+ GList *cdata_list;
+ guint8 vmstate_stream;
+ uint32_t vmstate_clusters;
+ /* to show restore percentage if run with -v */
+ time_t start_time;
+ int64_t cluster_count;
+ int64_t clusters_read;
+ int64_t zero_cluster_data;
+ int64_t partial_zero_cluster_data;
+ int clusters_read_per;
+};
+
+static guint
+g_int32_hash(gconstpointer v)
+{
+ return *(const uint32_t *)v;
+}
+
+static gboolean
+g_int32_equal(gconstpointer v1, gconstpointer v2)
+{
+ return *((const uint32_t *)v1) == *((const uint32_t *)v2);
+}
+
+static int vma_reader_get_bitmap(VmaRestoreState *rstate, int64_t cluster_num)
+{
+ assert(rstate);
+ assert(rstate->bitmap);
+
+ unsigned long val, idx, bit;
+
+ idx = cluster_num / BITS_PER_LONG;
+
+ assert(rstate->bitmap_size > idx);
+
+ bit = cluster_num % BITS_PER_LONG;
+ val = rstate->bitmap[idx];
+
+ return !!(val & (1UL << bit));
+}
+
+static void vma_reader_set_bitmap(VmaRestoreState *rstate, int64_t cluster_num,
+ int dirty)
+{
+ assert(rstate);
+ assert(rstate->bitmap);
+
+ unsigned long val, idx, bit;
+
+ idx = cluster_num / BITS_PER_LONG;
+
+ assert(rstate->bitmap_size > idx);
+
+ bit = cluster_num % BITS_PER_LONG;
+ val = rstate->bitmap[idx];
+ if (dirty) {
+ if (!(val & (1UL << bit))) {
+ val |= 1UL << bit;
+ }
+ } else {
+ if (val & (1UL << bit)) {
+ val &= ~(1UL << bit);
+ }
+ }
+ rstate->bitmap[idx] = val;
+}
+
+typedef struct VmaBlob {
+ uint32_t start;
+ uint32_t len;
+ void *data;
+} VmaBlob;
+
+static const VmaBlob *get_header_blob(VmaReader *vmar, uint32_t pos)
+{
+ assert(vmar);
+ assert(vmar->blob_hash);
+
+ return g_hash_table_lookup(vmar->blob_hash, &pos);
+}
+
+static const char *get_header_str(VmaReader *vmar, uint32_t pos)
+{
+ const VmaBlob *blob = get_header_blob(vmar, pos);
+ if (!blob) {
+ return NULL;
+ }
+ const char *res = (char *)blob->data;
+ if (res[blob->len-1] != '\0') {
+ return NULL;
+ }
+ return res;
+}
+
+static ssize_t
+safe_read(int fd, unsigned char *buf, size_t count)
+{
+ ssize_t n;
+
+ do {
+ n = read(fd, buf, count);
+ } while (n < 0 && errno == EINTR);
+
+ return n;
+}
+
+static ssize_t
+full_read(int fd, unsigned char *buf, size_t len)
+{
+ ssize_t n;
+ size_t total;
+
+ total = 0;
+
+ while (len > 0) {
+ n = safe_read(fd, buf, len);
+
+ if (n == 0) {
+ return total;
+ }
+
+ if (n <= 0) {
+ break;
+ }
+
+ buf += n;
+ total += n;
+ len -= n;
+ }
+
+ if (len) {
+ return -1;
+ }
+
+ return total;
+}
+
+void vma_reader_destroy(VmaReader *vmar)
+{
+ assert(vmar);
+
+ if (vmar->fd >= 0) {
+ close(vmar->fd);
+ }
+
+ if (vmar->cdata_list) {
+ g_list_free(vmar->cdata_list);
+ }
+
+ int i;
+ for (i = 1; i < 256; i++) {
+ if (vmar->rstate[i].bitmap) {
+ g_free(vmar->rstate[i].bitmap);
+ }
+ if (vmar->rstate[i].target) {
+ blk_unref(vmar->rstate[i].target);
+ }
+ }
+
+ if (vmar->md5csum) {
+ g_checksum_free(vmar->md5csum);
+ }
+
+ if (vmar->blob_hash) {
+ g_hash_table_destroy(vmar->blob_hash);
+ }
+
+ if (vmar->head_data) {
+ g_free(vmar->head_data);
+ }
+
+ g_free(vmar);
+
+};
+
+static int vma_reader_read_head(VmaReader *vmar, Error **errp)
+{
+ assert(vmar);
+ assert(errp);
+ assert(*errp == NULL);
+
+ unsigned char md5sum[16];
+ int i;
+ int ret = 0;
+
+ vmar->head_data = g_malloc(sizeof(VmaHeader));
+
+ if (full_read(vmar->fd, vmar->head_data, sizeof(VmaHeader)) !=
+ sizeof(VmaHeader)) {
+ error_setg(errp, "can't read vma header - %s",
+ errno ? g_strerror(errno) : "got EOF");
+ return -1;
+ }
+
+ VmaHeader *h = (VmaHeader *)vmar->head_data;
+
+ if (h->magic != VMA_MAGIC) {
+ error_setg(errp, "not a vma file - wrong magic number");
+ return -1;
+ }
+
+ uint32_t header_size = GUINT32_FROM_BE(h->header_size);
+ int need = header_size - sizeof(VmaHeader);
+ if (need <= 0) {
+ error_setg(errp, "wrong vma header size %d", header_size);
+ return -1;
+ }
+
+ vmar->head_data = g_realloc(vmar->head_data, header_size);
+ h = (VmaHeader *)vmar->head_data;
+
+ if (full_read(vmar->fd, vmar->head_data + sizeof(VmaHeader), need) !=
+ need) {
+ error_setg(errp, "can't read vma header data - %s",
+ errno ? g_strerror(errno) : "got EOF");
+ return -1;
+ }
+
+ memcpy(md5sum, h->md5sum, 16);
+ memset(h->md5sum, 0, 16);
+
+ g_checksum_reset(vmar->md5csum);
+ g_checksum_update(vmar->md5csum, vmar->head_data, header_size);
+ gsize csize = 16;
+ g_checksum_get_digest(vmar->md5csum, (guint8 *)(h->md5sum), &csize);
+
+ if (memcmp(md5sum, h->md5sum, 16) != 0) {
+ error_setg(errp, "wrong vma header chechsum");
+ return -1;
+ }
+
+ /* we can modify header data after checksum verify */
+ h->header_size = header_size;
+
+ h->version = GUINT32_FROM_BE(h->version);
+ if (h->version != 1) {
+ error_setg(errp, "wrong vma version %d", h->version);
+ return -1;
+ }
+
+ h->ctime = GUINT64_FROM_BE(h->ctime);
+ h->blob_buffer_offset = GUINT32_FROM_BE(h->blob_buffer_offset);
+ h->blob_buffer_size = GUINT32_FROM_BE(h->blob_buffer_size);
+
+ uint32_t bstart = h->blob_buffer_offset + 1;
+ uint32_t bend = h->blob_buffer_offset + h->blob_buffer_size;
+
+ if (bstart <= sizeof(VmaHeader)) {
+ error_setg(errp, "wrong vma blob buffer offset %d",
+ h->blob_buffer_offset);
+ return -1;
+ }
+
+ if (bend > header_size) {
+ error_setg(errp, "wrong vma blob buffer size %d/%d",
+ h->blob_buffer_offset, h->blob_buffer_size);
+ return -1;
+ }
+
+ while ((bstart + 2) <= bend) {
+ uint32_t size = vmar->head_data[bstart] +
+ (vmar->head_data[bstart+1] << 8);
+ if ((bstart + size + 2) <= bend) {
+ VmaBlob *blob = g_new0(VmaBlob, 1);
+ blob->start = bstart - h->blob_buffer_offset;
+ blob->len = size;
+ blob->data = vmar->head_data + bstart + 2;
+ g_hash_table_insert(vmar->blob_hash, &blob->start, blob);
+ }
+ bstart += size + 2;
+ }
+
+
+ int count = 0;
+ for (i = 1; i < 256; i++) {
+ VmaDeviceInfoHeader *dih = &h->dev_info[i];
+ uint32_t devname_ptr = GUINT32_FROM_BE(dih->devname_ptr);
+ uint64_t size = GUINT64_FROM_BE(dih->size);
+ const char *devname = get_header_str(vmar, devname_ptr);
+
+ if (size && devname) {
+ count++;
+ vmar->devinfo[i].size = size;
+ vmar->devinfo[i].devname = devname;
+
+ if (strcmp(devname, "vmstate") == 0) {
+ vmar->vmstate_stream = i;
+ }
+ }
+ }
+
+ for (i = 0; i < VMA_MAX_CONFIGS; i++) {
+ uint32_t name_ptr = GUINT32_FROM_BE(h->config_names[i]);
+ uint32_t data_ptr = GUINT32_FROM_BE(h->config_data[i]);
+
+ if (!(name_ptr && data_ptr)) {
+ continue;
+ }
+ const char *name = get_header_str(vmar, name_ptr);
+ const VmaBlob *blob = get_header_blob(vmar, data_ptr);
+
+ if (!(name && blob)) {
+ error_setg(errp, "vma contains invalid data pointers");
+ return -1;
+ }
+
+ VmaConfigData *cdata = g_new0(VmaConfigData, 1);
+ cdata->name = name;
+ cdata->data = blob->data;
+ cdata->len = blob->len;
+
+ vmar->cdata_list = g_list_append(vmar->cdata_list, cdata);
+ }
+
+ return ret;
+};
+
+VmaReader *vma_reader_create(const char *filename, Error **errp)
+{
+ assert(filename);
+ assert(errp);
+
+ VmaReader *vmar = g_new0(VmaReader, 1);
+
+ if (strcmp(filename, "-") == 0) {
+ vmar->fd = dup(0);
+ } else {
+ vmar->fd = open(filename, O_RDONLY);
+ }
+
+ if (vmar->fd < 0) {
+ error_setg(errp, "can't open file %s - %s\n", filename,
+ g_strerror(errno));
+ goto err;
+ }
+
+ vmar->md5csum = g_checksum_new(G_CHECKSUM_MD5);
+ if (!vmar->md5csum) {
+ error_setg(errp, "can't allocate cmsum\n");
+ goto err;
+ }
+
+ vmar->blob_hash = g_hash_table_new_full(g_int32_hash, g_int32_equal,
+ NULL, g_free);
+
+ if (vma_reader_read_head(vmar, errp) < 0) {
+ goto err;
+ }
+
+ return vmar;
+
+err:
+ if (vmar) {
+ vma_reader_destroy(vmar);
+ }
+
+ return NULL;
+}
+
+VmaHeader *vma_reader_get_header(VmaReader *vmar)
+{
+ assert(vmar);
+ assert(vmar->head_data);
+
+ return (VmaHeader *)(vmar->head_data);
+}
+
+GList *vma_reader_get_config_data(VmaReader *vmar)
+{
+ assert(vmar);
+ assert(vmar->head_data);
+
+ return vmar->cdata_list;
+}
+
+VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id)
+{
+ assert(vmar);
+ assert(dev_id);
+
+ if (vmar->devinfo[dev_id].size && vmar->devinfo[dev_id].devname) {
+ return &vmar->devinfo[dev_id];
+ }
+
+ return NULL;
+}
+
+static void allocate_rstate(VmaReader *vmar, guint8 dev_id,
+ BlockBackend *target, bool write_zeroes, bool skip)
+{
+ assert(vmar);
+ assert(dev_id);
+
+ vmar->rstate[dev_id].target = target;
+ vmar->rstate[dev_id].write_zeroes = write_zeroes;
+ vmar->rstate[dev_id].skip = skip;
+
+ int64_t size = vmar->devinfo[dev_id].size;
+
+ int64_t bitmap_size = (size/BDRV_SECTOR_SIZE) +
+ (VMA_CLUSTER_SIZE/BDRV_SECTOR_SIZE) * BITS_PER_LONG - 1;
+ bitmap_size /= (VMA_CLUSTER_SIZE/BDRV_SECTOR_SIZE) * BITS_PER_LONG;
+
+ vmar->rstate[dev_id].bitmap_size = bitmap_size;
+ vmar->rstate[dev_id].bitmap = g_new0(unsigned long, bitmap_size);
+
+ vmar->cluster_count += size/VMA_CLUSTER_SIZE;
+}
+
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id, BlockBackend *target,
+ bool write_zeroes, bool skip, Error **errp)
+{
+ assert(vmar);
+ assert(target != NULL || skip);
+ assert(dev_id);
+ assert(vmar->rstate[dev_id].target == NULL && !vmar->rstate[dev_id].skip);
+
+ if (target != NULL) {
+ int64_t size = blk_getlength(target);
+ int64_t size_diff = size - vmar->devinfo[dev_id].size;
+
+        /* storage types can have different size restrictions, so it is not
+         * always possible to create an image with the exact size; we
+         * therefore tolerate a size difference of up to 4 MiB.
+         */
+ if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ }
+ }
+
+ allocate_rstate(vmar, dev_id, target, write_zeroes, skip);
+
+ return 0;
+}
+
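+/* safe_write() retries write() when it is interrupted by a signal (EINTR);
+ * full_write() loops until the whole buffer has been written, returning the
+ * byte count on success and a negative value on error. */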
+static ssize_t safe_write(int fd, void *buf, size_t count)
+{
+ ssize_t n;
+
+ do {
+ n = write(fd, buf, count);
+ } while (n < 0 && errno == EINTR);
+
+ return n;
+}
+
+static size_t full_write(int fd, void *buf, size_t len)
+{
+ ssize_t n;
+ size_t total;
+
+ total = 0;
+
+ while (len > 0) {
+ n = safe_write(fd, buf, len);
+ if (n < 0) {
+ return n;
+ }
+ buf += n;
+ total += n;
+ len -= n;
+ }
+
+ if (len) {
+ /* incomplete write ? */
+ return -1;
+ }
+
+ return total;
+}
+
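+/* Write one run of sectors to its destination: data for the special vmstate
+ * stream is appended to vmstate_fd, everything else goes to the stream's
+ * BlockBackend via blk_pwrite(). */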
+static int restore_write_data(VmaReader *vmar, guint8 dev_id,
+ BlockBackend *target, int vmstate_fd,
+ unsigned char *buf, int64_t sector_num,
+ int nb_sectors, Error **errp)
+{
+ assert(vmar);
+
+ if (dev_id == vmar->vmstate_stream) {
+ if (vmstate_fd >= 0) {
+ int len = nb_sectors * BDRV_SECTOR_SIZE;
+ int res = full_write(vmstate_fd, buf, len);
+ if (res < 0) {
+ error_setg(errp, "write vmstate failed %d", res);
+ return -1;
+ }
+ }
+ } else {
+ int res = blk_pwrite(target, sector_num * BDRV_SECTOR_SIZE, nb_sectors * BDRV_SECTOR_SIZE, buf, 0);
+ if (res < 0) {
+ bdrv_graph_rdlock_main_loop();
+ error_setg(errp, "blk_pwrite to %s failed (%d)",
+ bdrv_get_device_name(blk_bs(target)), res);
+ bdrv_graph_rdunlock_main_loop();
+ return -1;
+ }
+ }
+ return 0;
+}
+
+static int restore_extent(VmaReader *vmar, unsigned char *buf,
+ int extent_size, int vmstate_fd,
+ bool verbose, bool verify, Error **errp)
+{
+ assert(vmar);
+ assert(buf);
+
+ VmaExtentHeader *ehead = (VmaExtentHeader *)buf;
+ int start = VMA_EXTENT_HEADER_SIZE;
+ int i;
+
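+    /* Each 64-bit blockinfo entry encodes the cluster number in bits 0-31,
+     * the device id in bits 32-39 and, in bits 48-63, a 16-bit mask telling
+     * which of the 16 4K blocks of the 64K cluster follow in the extent. */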
+ for (i = 0; i < VMA_BLOCKS_PER_EXTENT; i++) {
+ uint64_t block_info = GUINT64_FROM_BE(ehead->blockinfo[i]);
+ uint64_t cluster_num = block_info & 0xffffffff;
+ uint8_t dev_id = (block_info >> 32) & 0xff;
+ uint16_t mask = block_info >> (32+16);
+ int64_t max_sector;
+
+ if (!dev_id) {
+ continue;
+ }
+
+ VmaRestoreState *rstate = &vmar->rstate[dev_id];
+ BlockBackend *target = NULL;
+
+ bool skip = rstate->skip;
+
+ if (dev_id != vmar->vmstate_stream) {
+ target = rstate->target;
+ if (!verify && !target && !skip) {
+ error_setg(errp, "got wrong dev id %d", dev_id);
+ return -1;
+ }
+
+ if (!skip) {
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %zd for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
+ }
+
+ max_sector = vmar->devinfo[dev_id].size/BDRV_SECTOR_SIZE;
+ } else {
+ max_sector = G_MAXINT64;
+ if (cluster_num != vmar->vmstate_clusters) {
+ error_setg(errp, "found out of order vmstate data");
+ return -1;
+ }
+ vmar->vmstate_clusters++;
+ }
+
+ vmar->clusters_read++;
+
+ if (verbose) {
+ time_t duration = time(NULL) - vmar->start_time;
+ int percent = (vmar->clusters_read*100)/vmar->cluster_count;
+ if (percent != vmar->clusters_read_per) {
+ printf("progress %d%% (read %zd bytes, duration %zd sec)\n",
+ percent, vmar->clusters_read*VMA_CLUSTER_SIZE,
+ duration);
+ fflush(stdout);
+ vmar->clusters_read_per = percent;
+ }
+ }
+
+        /* try to write whole clusters to speed up the restore */
+ if (mask == 0xffff) {
+ if ((start + VMA_CLUSTER_SIZE) > extent_size) {
+ error_setg(errp, "short vma extent - too many blocks");
+ return -1;
+ }
+ int64_t sector_num = (cluster_num * VMA_CLUSTER_SIZE) /
+ BDRV_SECTOR_SIZE;
+ int64_t end_sector = sector_num +
+ VMA_CLUSTER_SIZE/BDRV_SECTOR_SIZE;
+
+ if (end_sector > max_sector) {
+ end_sector = max_sector;
+ }
+
+ if (end_sector <= sector_num) {
+ error_setg(errp, "got wrong block address - write beyond end");
+ return -1;
+ }
+
+ if (!verify && !skip) {
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ buf + start, sector_num, nb_sectors,
+ errp) < 0) {
+ return -1;
+ }
+ }
+
+ start += VMA_CLUSTER_SIZE;
+ } else {
+ int j;
+ int bit = 1;
+
+ for (j = 0; j < 16; j++) {
+ int64_t sector_num = (cluster_num*VMA_CLUSTER_SIZE +
+ j*VMA_BLOCK_SIZE)/BDRV_SECTOR_SIZE;
+
+ int64_t end_sector = sector_num +
+ VMA_BLOCK_SIZE/BDRV_SECTOR_SIZE;
+ if (end_sector > max_sector) {
+ end_sector = max_sector;
+ }
+
+ if (mask & bit) {
+ if ((start + VMA_BLOCK_SIZE) > extent_size) {
+ error_setg(errp, "short vma extent - too many blocks");
+ return -1;
+ }
+
+ if (end_sector <= sector_num) {
+ error_setg(errp, "got wrong block address - "
+ "write beyond end");
+ return -1;
+ }
+
+ if (!verify && !skip) {
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ buf + start, sector_num,
+ nb_sectors, errp) < 0) {
+ return -1;
+ }
+ }
+
+ start += VMA_BLOCK_SIZE;
+
+ } else {
+
+
+ if (end_sector > sector_num) {
+                    /* TODO: use bdrv_co_write_zeroes (but that needs to
+                     * be run inside a coroutine?)
+                     */
+ int nb_sectors = end_sector - sector_num;
+ int zero_size = BDRV_SECTOR_SIZE*nb_sectors;
+ vmar->zero_cluster_data += zero_size;
+ if (mask != 0) {
+ vmar->partial_zero_cluster_data += zero_size;
+ }
+
+ if (rstate->write_zeroes && !verify && !skip) {
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ zero_vma_block, sector_num,
+ nb_sectors, errp) < 0) {
+ return -1;
+ }
+ }
+ }
+ }
+
+ bit = bit << 1;
+ }
+ }
+ }
+
+ if (start != extent_size) {
+ error_setg(errp, "vma extent error - missing blocks");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int vma_reader_restore_full(VmaReader *vmar, int vmstate_fd,
+ bool verbose, bool verify,
+ Error **errp)
+{
+ assert(vmar);
+ assert(vmar->head_data);
+
+ int ret = 0;
+ unsigned char buf[VMA_MAX_EXTENT_SIZE];
+ int buf_pos = 0;
+ unsigned char md5sum[16];
+ VmaHeader *h = (VmaHeader *)vmar->head_data;
+
+ vmar->start_time = time(NULL);
+
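+    /* Read the archive extent by extent: verify each extent header (MD5
+     * checksum, archive UUID, magic), hand the complete extent to
+     * restore_extent() and keep any leftover bytes of the next extent at
+     * the front of the buffer. */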
+ while (1) {
+ int bytes = full_read(vmar->fd, buf + buf_pos, sizeof(buf) - buf_pos);
+ if (bytes < 0) {
+ error_setg(errp, "read failed - %s", g_strerror(errno));
+ return -1;
+ }
+
+ buf_pos += bytes;
+
+ if (!buf_pos) {
+ break; /* EOF */
+ }
+
+ if (buf_pos < VMA_EXTENT_HEADER_SIZE) {
+ error_setg(errp, "read short extent (%d bytes)", buf_pos);
+ return -1;
+ }
+
+ VmaExtentHeader *ehead = (VmaExtentHeader *)buf;
+
+ /* extract md5sum */
+ memcpy(md5sum, ehead->md5sum, sizeof(ehead->md5sum));
+ memset(ehead->md5sum, 0, sizeof(ehead->md5sum));
+
+ g_checksum_reset(vmar->md5csum);
+ g_checksum_update(vmar->md5csum, buf, VMA_EXTENT_HEADER_SIZE);
+ gsize csize = 16;
+ g_checksum_get_digest(vmar->md5csum, ehead->md5sum, &csize);
+
+ if (memcmp(md5sum, ehead->md5sum, 16) != 0) {
+            error_setg(errp, "wrong vma extent header checksum");
+ return -1;
+ }
+
+ if (memcmp(h->uuid, ehead->uuid, sizeof(ehead->uuid)) != 0) {
+ error_setg(errp, "wrong vma extent uuid");
+ return -1;
+ }
+
+ if (ehead->magic != VMA_EXTENT_MAGIC || ehead->reserved1 != 0) {
+ error_setg(errp, "wrong vma extent header magic");
+ return -1;
+ }
+
+ int block_count = GUINT16_FROM_BE(ehead->block_count);
+ int extent_size = VMA_EXTENT_HEADER_SIZE + block_count*VMA_BLOCK_SIZE;
+
+ if (buf_pos < extent_size) {
+ error_setg(errp, "short vma extent (%d < %d)", buf_pos,
+ extent_size);
+ return -1;
+ }
+
+ if (restore_extent(vmar, buf, extent_size, vmstate_fd, verbose,
+ verify, errp) < 0) {
+ return -1;
+ }
+
+ if (buf_pos > extent_size) {
+ memmove(buf, buf + extent_size, buf_pos - extent_size);
+ buf_pos = buf_pos - extent_size;
+ } else {
+ buf_pos = 0;
+ }
+ }
+
+ bdrv_drain_all();
+
+ int i;
+ for (i = 1; i < 256; i++) {
+ VmaRestoreState *rstate = &vmar->rstate[i];
+ if (!rstate->target) {
+ continue;
+ }
+
+ if (blk_flush(rstate->target) < 0) {
+ error_setg(errp, "vma blk_flush %s failed",
+ vmar->devinfo[i].devname);
+ return -1;
+ }
+
+ if (vmar->devinfo[i].size &&
+ (strcmp(vmar->devinfo[i].devname, "vmstate") != 0)) {
+ assert(rstate->bitmap);
+
+ int64_t cluster_num, end;
+
+ end = (vmar->devinfo[i].size + VMA_CLUSTER_SIZE - 1) /
+ VMA_CLUSTER_SIZE;
+
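+            /* every cluster of the device must have been present in the
+             * archive - a hole in the bitmap means the archive is truncated
+             * or corrupt */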
+ for (cluster_num = 0; cluster_num < end; cluster_num++) {
+ if (!vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "detected missing cluster %zd "
+ "for stream %s", cluster_num,
+ vmar->devinfo[i].devname);
+ return -1;
+ }
+ }
+ }
+ }
+
+ if (verbose) {
+ if (vmar->clusters_read) {
+ printf("total bytes read %zd, sparse bytes %zd (%.3g%%)\n",
+ vmar->clusters_read*VMA_CLUSTER_SIZE,
+ vmar->zero_cluster_data,
+ (double)(100.0*vmar->zero_cluster_data)/
+ (vmar->clusters_read*VMA_CLUSTER_SIZE));
+
+ int64_t datasize = vmar->clusters_read*VMA_CLUSTER_SIZE-vmar->zero_cluster_data;
+ if (datasize) { // this does not make sense for empty files
+ printf("space reduction due to 4K zero blocks %.3g%%\n",
+ (double)(100.0*vmar->partial_zero_cluster_data) / datasize);
+ }
+ } else {
+ printf("vma archive contains no image data\n");
+ }
+ }
+ return ret;
+}
+
+int vma_reader_restore(VmaReader *vmar, int vmstate_fd, bool verbose,
+ Error **errp)
+{
+ return vma_reader_restore_full(vmar, vmstate_fd, verbose, false, errp);
+}
+
+int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp)
+{
+ guint8 dev_id;
+
+ for (dev_id = 1; dev_id < 255; dev_id++) {
+ if (vma_reader_get_device_info(vmar, dev_id)) {
+ allocate_rstate(vmar, dev_id, NULL, false, false);
+ }
+ }
+
+ return vma_reader_restore_full(vmar, -1, verbose, true, errp);
+}
+
diff --git a/vma-writer.c b/vma-writer.c
new file mode 100644
index 0000000000..126b296647
--- /dev/null
+++ b/vma-writer.c
@@ -0,0 +1,818 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) 2012 Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include <glib.h>
+#include <linux/magic.h>
+#include <sys/vfs.h>
+#include <uuid/uuid.h>
+
+#include "vma.h"
+#include "block/block.h"
+#include "monitor/monitor.h"
+#include "qemu/main-loop.h"
+#include "qemu/coroutine.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/memalign.h"
+
+#define DEBUG_VMA 0
+
+#define DPRINTF(fmt, ...)\
+ do { if (DEBUG_VMA) { printf("vma: " fmt, ## __VA_ARGS__); } } while (0)
+
+#define WRITE_BUFFERS 5
+#define HEADER_CLUSTERS 8
+#define HEADERBUF_SIZE (VMA_CLUSTER_SIZE*HEADER_CLUSTERS)
+
+struct VmaWriter {
+ int fd;
+ FILE *cmd;
+ int status;
+ char errmsg[8192];
+ uuid_t uuid;
+ bool header_written;
+ bool closed;
+
+ /* we always write extents */
+ unsigned char *outbuf;
+ int outbuf_pos; /* in bytes */
+ int outbuf_count; /* in VMA_BLOCKS */
+ uint64_t outbuf_block_info[VMA_BLOCKS_PER_EXTENT];
+
+ unsigned char *headerbuf;
+
+ GChecksum *md5csum;
+ CoMutex flush_lock;
+ Coroutine *co_writer;
+
+    /* drive information */
+ VmaStreamInfo stream_info[256];
+ guint stream_count;
+
+ guint8 vmstate_stream;
+ uint32_t vmstate_clusters;
+
+ /* header blob table */
+ char *header_blob_table;
+ uint32_t header_blob_table_size;
+ uint32_t header_blob_table_pos;
+
+ /* store for config blobs */
+ uint32_t config_names[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+ uint32_t config_data[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+ uint32_t config_count;
+};
+
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...)
+{
+ va_list ap;
+
+ if (vmaw->status < 0) {
+ return;
+ }
+
+ vmaw->status = -1;
+
+ va_start(ap, fmt);
+ g_vsnprintf(vmaw->errmsg, sizeof(vmaw->errmsg), fmt, ap);
+ va_end(ap);
+
+ DPRINTF("vma_writer_set_error: %s\n", vmaw->errmsg);
+}
+
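+/* The header blob table stores device names and config blobs: each entry is
+ * a 2-byte little-endian length followed by the raw data and is referenced
+ * by its byte offset into the table (offset 0 means "not set"). */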
+static uint32_t allocate_header_blob(VmaWriter *vmaw, const char *data,
+ size_t len)
+{
+ if (len > 65535) {
+ return 0;
+ }
+
+ if (!vmaw->header_blob_table ||
+ (vmaw->header_blob_table_size <
+ (vmaw->header_blob_table_pos + len + 2))) {
+ int newsize = vmaw->header_blob_table_size + ((len + 2 + 511)/512)*512;
+
+ vmaw->header_blob_table = g_realloc(vmaw->header_blob_table, newsize);
+ memset(vmaw->header_blob_table + vmaw->header_blob_table_size,
+ 0, newsize - vmaw->header_blob_table_size);
+ vmaw->header_blob_table_size = newsize;
+ }
+
+ uint32_t cpos = vmaw->header_blob_table_pos;
+ vmaw->header_blob_table[cpos] = len & 255;
+ vmaw->header_blob_table[cpos+1] = (len >> 8) & 255;
+ memcpy(vmaw->header_blob_table + cpos + 2, data, len);
+ vmaw->header_blob_table_pos += len + 2;
+ return cpos;
+}
+
+static uint32_t allocate_header_string(VmaWriter *vmaw, const char *str)
+{
+ assert(vmaw);
+
+ size_t len = strlen(str) + 1;
+
+ return allocate_header_blob(vmaw, str, len);
+}
+
+int vma_writer_add_config(VmaWriter *vmaw, const char *name, gpointer data,
+ gsize len)
+{
+ assert(vmaw);
+ assert(!vmaw->header_written);
+ assert(vmaw->config_count < VMA_MAX_CONFIGS);
+ assert(name);
+ assert(data);
+
+ gchar *basename = g_path_get_basename(name);
+ uint32_t name_ptr = allocate_header_string(vmaw, basename);
+ g_free(basename);
+
+ if (!name_ptr) {
+ return -1;
+ }
+
+ uint32_t data_ptr = allocate_header_blob(vmaw, data, len);
+ if (!data_ptr) {
+ return -1;
+ }
+
+ vmaw->config_names[vmaw->config_count] = name_ptr;
+ vmaw->config_data[vmaw->config_count] = data_ptr;
+
+ vmaw->config_count++;
+
+ return 0;
+}
+
+int vma_writer_register_stream(VmaWriter *vmaw, const char *devname,
+ size_t size)
+{
+ assert(vmaw);
+ assert(devname);
+ assert(!vmaw->status);
+
+ if (vmaw->header_written) {
+ vma_writer_set_error(vmaw, "vma_writer_register_stream: header "
+ "already written");
+ return -1;
+ }
+
+ guint n = vmaw->stream_count + 1;
+
+    /* we can have dev_ids from 1 to 255 (0 reserved)
+     * 255 (-1) is reserved for safety
+     */
+ if (n > 254) {
+ vma_writer_set_error(vmaw, "vma_writer_register_stream: "
+ "too many drives");
+ return -1;
+ }
+
+ if (size <= 0) {
+ vma_writer_set_error(vmaw, "vma_writer_register_stream: "
+ "got strange size %zd", size);
+ return -1;
+ }
+
+ DPRINTF("vma_writer_register_stream %s %zu %d\n", devname, size, n);
+
+ vmaw->stream_info[n].devname = g_strdup(devname);
+ vmaw->stream_info[n].size = size;
+
+ vmaw->stream_info[n].cluster_count = (size + VMA_CLUSTER_SIZE - 1) /
+ VMA_CLUSTER_SIZE;
+
+ vmaw->stream_count = n;
+
+ if (strcmp(devname, "vmstate") == 0) {
+ vmaw->vmstate_stream = n;
+ }
+
+ return n;
+}
+
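+/* The archive fd is opened O_NONBLOCK, so writes are done from coroutine
+ * context: yield_until_fd_writable() suspends the coroutine until the fd
+ * becomes writable and vma_queue_write() loops until the whole buffer has
+ * been written or the backup was cancelled. */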
+static void coroutine_fn yield_until_fd_writable(int fd)
+{
+ assert(qemu_in_coroutine());
+ AioContext *ctx = qemu_get_current_aio_context();
+ aio_set_fd_handler(ctx, fd, NULL, (IOHandler *)qemu_coroutine_enter, NULL,
+ NULL, qemu_coroutine_self());
+ qemu_coroutine_yield();
+ aio_set_fd_handler(ctx, fd, NULL, NULL, NULL, NULL, NULL);
+}
+
+static ssize_t coroutine_fn
+vma_queue_write(VmaWriter *vmaw, const void *buf, size_t bytes)
+{
+ DPRINTF("vma_queue_write enter %zd\n", bytes);
+
+ assert(vmaw);
+ assert(buf);
+ assert(bytes <= VMA_MAX_EXTENT_SIZE);
+
+ size_t done = 0;
+ ssize_t ret;
+
+ assert(vmaw->co_writer == NULL);
+
+ vmaw->co_writer = qemu_coroutine_self();
+
+ while (done < bytes) {
+ if (vmaw->status < 0) {
+ DPRINTF("vma_queue_write detected canceled backup\n");
+ done = -1;
+ break;
+ }
+ yield_until_fd_writable(vmaw->fd);
+ ret = write(vmaw->fd, buf + done, bytes - done);
+ if (ret > 0) {
+ done += ret;
+ DPRINTF("vma_queue_write written %zd %zd\n", done, ret);
+ } else if (ret < 0) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ /* try again */
+ } else {
+ vma_writer_set_error(vmaw, "vma_queue_write: write error - %s",
+ g_strerror(errno));
+ done = -1; /* always return failure for partial writes */
+ break;
+ }
+ } else if (ret == 0) {
+ /* should not happen - simply try again */
+ }
+ }
+
+ vmaw->co_writer = NULL;
+
+ return (done == bytes) ? bytes : -1;
+}
+
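+/* tmpfs does not support O_DIRECT, so check the output directory's
+ * filesystem before enabling it for regular files */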
+static bool is_path_tmpfs(const char *path) {
+ struct statfs fs;
+ int ret;
+
+ do {
+ ret = statfs(path, &fs);
+ } while (ret != 0 && errno == EINTR);
+
+ if (ret != 0) {
+ warn_report("statfs call for %s failed, assuming not tmpfs - %s\n",
+ path, strerror(errno));
+ return false;
+ }
+
+ return fs.f_type == TMPFS_MAGIC;
+}
+
+VmaWriter *vma_writer_create(const char *filename, uuid_t uuid, Error **errp)
+{
+ const char *p;
+
+ assert(sizeof(VmaHeader) == (4096 + 8192));
+ assert(G_STRUCT_OFFSET(VmaHeader, config_names) == 2044);
+ assert(G_STRUCT_OFFSET(VmaHeader, config_data) == 3068);
+ assert(G_STRUCT_OFFSET(VmaHeader, dev_info) == 4096);
+ assert(sizeof(VmaExtentHeader) == 512);
+
+ VmaWriter *vmaw = g_new0(VmaWriter, 1);
+ vmaw->fd = -1;
+
+ vmaw->md5csum = g_checksum_new(G_CHECKSUM_MD5);
+ if (!vmaw->md5csum) {
+        error_setg(errp, "can't allocate checksum\n");
+ goto err;
+ }
+
+ if (strstart(filename, "exec:", &p)) {
+ vmaw->cmd = popen(p, "w");
+ if (vmaw->cmd == NULL) {
+ error_setg(errp, "can't popen command '%s' - %s\n", p,
+ g_strerror(errno));
+ goto err;
+ }
+ vmaw->fd = fileno(vmaw->cmd);
+
+ /* try to use O_NONBLOCK */
+ fcntl(vmaw->fd, F_SETFL, fcntl(vmaw->fd, F_GETFL)|O_NONBLOCK);
+
+ } else {
+ struct stat st;
+ int oflags;
+ const char *tmp_id_str;
+
+ if ((stat(filename, &st) == 0) && S_ISFIFO(st.st_mode)) {
+ oflags = O_NONBLOCK|O_WRONLY;
+ vmaw->fd = qemu_open(filename, oflags, errp);
+ } else if (strstart(filename, "/dev/fdset/", &tmp_id_str)) {
+ oflags = O_NONBLOCK|O_WRONLY;
+ vmaw->fd = qemu_open(filename, oflags, errp);
+ } else if (strstart(filename, "/dev/fdname/", &tmp_id_str)) {
+ vmaw->fd = monitor_get_fd(monitor_cur(), tmp_id_str, errp);
+ if (vmaw->fd < 0) {
+ goto err;
+ }
+ /* try to use O_NONBLOCK */
+ fcntl(vmaw->fd, F_SETFL, fcntl(vmaw->fd, F_GETFL)|O_NONBLOCK);
+ } else {
+ gchar *dirname = g_path_get_dirname(filename);
+ oflags = O_NONBLOCK|O_WRONLY|O_EXCL;
+ if (!is_path_tmpfs(dirname)) {
+ oflags |= O_DIRECT;
+ }
+ g_free(dirname);
+ vmaw->fd = qemu_create(filename, oflags, 0644, errp);
+ }
+
+ if (vmaw->fd < 0) {
+ error_free(*errp);
+ *errp = NULL;
+ error_setg(errp, "can't open file %s - %s\n", filename,
+ g_strerror(errno));
+ goto err;
+ }
+ }
+
+    /* we may use O_DIRECT, so we need to align IO buffers */
+
+ vmaw->outbuf = qemu_memalign(512, VMA_MAX_EXTENT_SIZE);
+ vmaw->headerbuf = qemu_memalign(512, HEADERBUF_SIZE);
+
+ vmaw->outbuf_count = 0;
+ vmaw->outbuf_pos = VMA_EXTENT_HEADER_SIZE;
+
+ vmaw->header_blob_table_pos = 1; /* start at pos 1 */
+
+ qemu_co_mutex_init(&vmaw->flush_lock);
+
+ uuid_copy(vmaw->uuid, uuid);
+
+ return vmaw;
+
+err:
+ if (vmaw) {
+ if (vmaw->cmd) {
+ pclose(vmaw->cmd);
+ } else if (vmaw->fd >= 0) {
+ close(vmaw->fd);
+ }
+
+ if (vmaw->md5csum) {
+ g_checksum_free(vmaw->md5csum);
+ }
+
+ g_free(vmaw);
+ }
+
+ return NULL;
+}
+
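+/* Build and write the archive header: magic, format version, creation time,
+ * the per-device name/size table and the blob table with the config entries;
+ * the MD5 digest of the header is stored in head->md5sum before writing. */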
+static int coroutine_fn vma_write_header(VmaWriter *vmaw)
+{
+ assert(vmaw);
+ unsigned char *buf = vmaw->headerbuf;
+ VmaHeader *head = (VmaHeader *)buf;
+
+ int i;
+
+ DPRINTF("VMA WRITE HEADER\n");
+
+ if (vmaw->status < 0) {
+ return vmaw->status;
+ }
+
+ memset(buf, 0, HEADERBUF_SIZE);
+
+ head->magic = VMA_MAGIC;
+ head->version = GUINT32_TO_BE(1); /* v1 */
+ memcpy(head->uuid, vmaw->uuid, 16);
+
+ time_t ctime = time(NULL);
+ head->ctime = GUINT64_TO_BE(ctime);
+
+ for (i = 0; i < VMA_MAX_CONFIGS; i++) {
+ head->config_names[i] = GUINT32_TO_BE(vmaw->config_names[i]);
+ head->config_data[i] = GUINT32_TO_BE(vmaw->config_data[i]);
+ }
+
+ /* 32 bytes per device (12 used currently) = 8192 bytes max */
+ for (i = 1; i <= 254; i++) {
+ VmaStreamInfo *si = &vmaw->stream_info[i];
+ if (si->size) {
+ assert(si->devname);
+ uint32_t devname_ptr = allocate_header_string(vmaw, si->devname);
+ if (!devname_ptr) {
+ return -1;
+ }
+ head->dev_info[i].devname_ptr = GUINT32_TO_BE(devname_ptr);
+ head->dev_info[i].size = GUINT64_TO_BE(si->size);
+ }
+ }
+
+ uint32_t header_size = sizeof(VmaHeader) + vmaw->header_blob_table_size;
+ head->header_size = GUINT32_TO_BE(header_size);
+
+ if (header_size > HEADERBUF_SIZE) {
+ return -1; /* just to be sure */
+ }
+
+ uint32_t blob_buffer_offset = sizeof(VmaHeader);
+ memcpy(buf + blob_buffer_offset, vmaw->header_blob_table,
+ vmaw->header_blob_table_size);
+ head->blob_buffer_offset = GUINT32_TO_BE(blob_buffer_offset);
+ head->blob_buffer_size = GUINT32_TO_BE(vmaw->header_blob_table_pos);
+
+ g_checksum_reset(vmaw->md5csum);
+ g_checksum_update(vmaw->md5csum, (const guchar *)buf, header_size);
+ gsize csize = 16;
+ g_checksum_get_digest(vmaw->md5csum, (guint8 *)(head->md5sum), &csize);
+
+ return vma_queue_write(vmaw, buf, header_size);
+}
+
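+/* Finalize the current extent: write the archive header on first use, fill
+ * in the extent header (block count, per-block info, UUID, MD5 digest) and
+ * queue the whole extent for writing, then reset the output buffer. */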
+static int coroutine_fn vma_writer_flush(VmaWriter *vmaw)
+{
+ assert(vmaw);
+
+ int ret;
+ int i;
+
+ if (vmaw->status < 0) {
+ return vmaw->status;
+ }
+
+ if (!vmaw->header_written) {
+ vmaw->header_written = true;
+ ret = vma_write_header(vmaw);
+ if (ret < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_flush: write header failed");
+ return ret;
+ }
+ }
+
+ DPRINTF("VMA WRITE FLUSH %d %d\n", vmaw->outbuf_count, vmaw->outbuf_pos);
+
+
+ VmaExtentHeader *ehead = (VmaExtentHeader *)vmaw->outbuf;
+
+ ehead->magic = VMA_EXTENT_MAGIC;
+ ehead->reserved1 = 0;
+
+ for (i = 0; i < VMA_BLOCKS_PER_EXTENT; i++) {
+ ehead->blockinfo[i] = GUINT64_TO_BE(vmaw->outbuf_block_info[i]);
+ }
+
+ guint16 block_count = (vmaw->outbuf_pos - VMA_EXTENT_HEADER_SIZE) /
+ VMA_BLOCK_SIZE;
+
+ ehead->block_count = GUINT16_TO_BE(block_count);
+
+ memcpy(ehead->uuid, vmaw->uuid, sizeof(ehead->uuid));
+ memset(ehead->md5sum, 0, sizeof(ehead->md5sum));
+
+ g_checksum_reset(vmaw->md5csum);
+ g_checksum_update(vmaw->md5csum, vmaw->outbuf, VMA_EXTENT_HEADER_SIZE);
+ gsize csize = 16;
+ g_checksum_get_digest(vmaw->md5csum, ehead->md5sum, &csize);
+
+ int bytes = vmaw->outbuf_pos;
+ ret = vma_queue_write(vmaw, vmaw->outbuf, bytes);
+ if (ret != bytes) {
+ vma_writer_set_error(vmaw, "vma_writer_flush: failed write");
+ }
+
+ vmaw->outbuf_count = 0;
+ vmaw->outbuf_pos = VMA_EXTENT_HEADER_SIZE;
+
+ for (i = 0; i < VMA_BLOCKS_PER_EXTENT; i++) {
+ vmaw->outbuf_block_info[i] = 0;
+ }
+
+ return vmaw->status;
+}
+
+static int vma_count_open_streams(VmaWriter *vmaw)
+{
+ g_assert(vmaw != NULL);
+
+ int i;
+ int open_drives = 0;
+ for (i = 0; i <= 255; i++) {
+ if (vmaw->stream_info[i].size && !vmaw->stream_info[i].finished) {
+ open_drives++;
+ }
+ }
+
+ return open_drives;
+}
+
+
+/**
+ * You need to call this if the vma archive does not contain
+ * any data stream.
+ */
+int coroutine_fn
+vma_writer_flush_output(VmaWriter *vmaw)
+{
+ qemu_co_mutex_lock(&vmaw->flush_lock);
+ int ret = vma_writer_flush(vmaw);
+ qemu_co_mutex_unlock(&vmaw->flush_lock);
+ if (ret < 0) {
+        vma_writer_set_error(vmaw, "vma_writer_flush_output: flush failed");
+ }
+ return ret;
+}
+
+/**
+ * All jobs should call this when there is no more data.
+ * Returns: number of remaining open streams (0 ==> finished)
+ */
+int coroutine_fn
+vma_writer_close_stream(VmaWriter *vmaw, uint8_t dev_id)
+{
+ g_assert(vmaw != NULL);
+
+ DPRINTF("vma_writer_set_status %d\n", dev_id);
+ if (!vmaw->stream_info[dev_id].size) {
+ vma_writer_set_error(vmaw, "vma_writer_close_stream: "
+ "no such stream %d", dev_id);
+ return -1;
+ }
+ if (vmaw->stream_info[dev_id].finished) {
+ vma_writer_set_error(vmaw, "vma_writer_close_stream: "
+ "stream already closed %d", dev_id);
+ return -1;
+ }
+
+ vmaw->stream_info[dev_id].finished = true;
+
+ int open_drives = vma_count_open_streams(vmaw);
+
+ if (open_drives <= 0) {
+ DPRINTF("vma_writer_set_status all drives completed\n");
+ vma_writer_flush_output(vmaw);
+ }
+
+ return open_drives;
+}
+
+int vma_writer_get_status(VmaWriter *vmaw, VmaStatus *status)
+{
+ int i;
+
+ g_assert(vmaw != NULL);
+
+ if (status) {
+ status->status = vmaw->status;
+ g_strlcpy(status->errmsg, vmaw->errmsg, sizeof(status->errmsg));
+ for (i = 0; i <= 255; i++) {
+ status->stream_info[i] = vmaw->stream_info[i];
+ }
+
+ uuid_unparse_lower(vmaw->uuid, status->uuid_str);
+ }
+
+ status->closed = vmaw->closed;
+
+ return vmaw->status;
+}
+
+static int vma_writer_get_buffer(VmaWriter *vmaw)
+{
+ int ret = 0;
+
+ qemu_co_mutex_lock(&vmaw->flush_lock);
+
+ /* wait until buffer is available */
+ while (vmaw->outbuf_count >= (VMA_BLOCKS_PER_EXTENT - 1)) {
+ ret = vma_writer_flush(vmaw);
+ if (ret < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_get_buffer: flush failed");
+ break;
+ }
+ }
+
+ qemu_co_mutex_unlock(&vmaw->flush_lock);
+
+ return ret;
+}
+
+
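+/* Append one 64K cluster of a stream to the archive. Zero 4K blocks are only
+ * recorded in the block mask and not written; a NULL buf marks the whole
+ * cluster as zero. Returns the number of bytes accounted as transferred. */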
+int64_t coroutine_fn
+vma_writer_write(VmaWriter *vmaw, uint8_t dev_id, int64_t cluster_num,
+ const unsigned char *buf, size_t *zero_bytes)
+{
+ g_assert(vmaw != NULL);
+ g_assert(zero_bytes != NULL);
+
+ *zero_bytes = 0;
+
+ if (vmaw->status < 0) {
+ return vmaw->status;
+ }
+
+ if (!dev_id || !vmaw->stream_info[dev_id].size) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "no such stream %d", dev_id);
+ return -1;
+ }
+
+ if (vmaw->stream_info[dev_id].finished) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "stream already closed %d", dev_id);
+ return -1;
+ }
+
+
+ if (cluster_num >= (((uint64_t)1)<<32)) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "cluster number out of range");
+ return -1;
+ }
+
+ if (dev_id == vmaw->vmstate_stream) {
+ if (cluster_num != vmaw->vmstate_clusters) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "non sequential vmstate write");
+ }
+ vmaw->vmstate_clusters++;
+ } else if (cluster_num >= vmaw->stream_info[dev_id].cluster_count) {
+ vma_writer_set_error(vmaw, "vma_writer_write: cluster number too big");
+ return -1;
+ }
+
+ /* wait until buffer is available */
+ if (vma_writer_get_buffer(vmaw) < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "vma_writer_get_buffer failed");
+ return -1;
+ }
+
+ DPRINTF("VMA WRITE %d %zd\n", dev_id, cluster_num);
+
+ uint64_t dev_size = vmaw->stream_info[dev_id].size;
+ uint16_t mask = 0;
+
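+ /* Each cluster is split into 16 sub-blocks of VMA_BLOCK_SIZE (4k) each;
+ * bit i of 'mask' marks sub-block i as containing non-zero data that was
+ * copied into the output buffer. */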
+ if (buf) {
+ int i;
+ int bit = 1;
+ uint64_t byte_offset = cluster_num * VMA_CLUSTER_SIZE;
+ for (i = 0; i < 16; i++) {
+ const unsigned char *vmablock = buf + (i*VMA_BLOCK_SIZE);
+
+ // Note: If the source is not 64k-aligned, we might reach 4k blocks
+ // after the end of the device. Always mark these as zero in the
+ // mask, so the restore handles them correctly.
+ if (byte_offset < dev_size &&
+ !buffer_is_zero(vmablock, VMA_BLOCK_SIZE))
+ {
+ mask |= bit;
+ memcpy(vmaw->outbuf + vmaw->outbuf_pos, vmablock,
+ VMA_BLOCK_SIZE);
+
+ // Zero the tail of an unaligned last block so stale buffer
+ // contents past the device end don't leak into the archive.
+ if (byte_offset + VMA_BLOCK_SIZE > dev_size) {
+ uint64_t real_data_in_block = dev_size - byte_offset;
+ memset(vmaw->outbuf + vmaw->outbuf_pos + real_data_in_block,
+ 0, VMA_BLOCK_SIZE - real_data_in_block);
+ }
+
+ vmaw->outbuf_pos += VMA_BLOCK_SIZE;
+ } else {
+ DPRINTF("VMA WRITE %zd ZERO BLOCK %d\n", cluster_num, i);
+ vmaw->stream_info[dev_id].zero_bytes += VMA_BLOCK_SIZE;
+ *zero_bytes += VMA_BLOCK_SIZE;
+ }
+
+ byte_offset += VMA_BLOCK_SIZE;
+ bit = bit << 1;
+ }
+ } else {
+ DPRINTF("VMA WRITE %zd ZERO CLUSTER\n", cluster_num);
+ vmaw->stream_info[dev_id].zero_bytes += VMA_CLUSTER_SIZE;
+ *zero_bytes += VMA_CLUSTER_SIZE;
+ }
+
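+ /* Pack the per-cluster extent entry: 16-bit zero/data mask in bits 48..63,
+ * device id in bits 32..39, cluster number in the low 32 bits. */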
+ uint64_t block_info = ((uint64_t)mask) << (32+16);
+ block_info |= ((uint64_t)dev_id) << 32;
+ block_info |= (cluster_num & 0xffffffff);
+ vmaw->outbuf_block_info[vmaw->outbuf_count] = block_info;
+
+ DPRINTF("VMA WRITE MASK %zd %zx\n", cluster_num, block_info);
+
+ vmaw->outbuf_count++;
+
+ /** NOTE: We always write whole clusters, but we correctly set
+ * transferred bytes. So transferred == size when everything
+ * went OK.
+ */
+ size_t transferred = VMA_CLUSTER_SIZE;
+
+ if (dev_id != vmaw->vmstate_stream) {
+ uint64_t last = (cluster_num + 1) * VMA_CLUSTER_SIZE;
+ if (last > dev_size) {
+ uint64_t diff = last - dev_size;
+ if (diff >= VMA_CLUSTER_SIZE) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "read after last cluster");
+ return -1;
+ }
+ transferred -= diff;
+ }
+ }
+
+ vmaw->stream_info[dev_id].transferred += transferred;
+
+ return transferred;
+}
+
+void vma_writer_error_propagate(VmaWriter *vmaw, Error **errp)
+{
+ if (vmaw->status < 0 && *errp == NULL) {
+ error_setg(errp, "%s", vmaw->errmsg);
+ }
+}
+
+int vma_writer_close(VmaWriter *vmaw, Error **errp)
+{
+ g_assert(vmaw != NULL);
+
+ int i;
+
+ qemu_co_mutex_lock(&vmaw->flush_lock); // wait for pending writes
+
+ assert(vmaw->co_writer == NULL);
+
+ if (vmaw->cmd) {
+ if (pclose(vmaw->cmd) < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "pclose failed - %s", g_strerror(errno));
+ }
+ } else {
+ if (close(vmaw->fd) < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "close failed - %s", g_strerror(errno));
+ }
+ }
+
+ for (i = 0; i <= 255; i++) {
+ VmaStreamInfo *si = &vmaw->stream_info[i];
+ if (si->size) {
+ if (!si->finished) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "detected open stream '%s'", si->devname);
+ } else if ((si->transferred != si->size) &&
+ (i != vmaw->vmstate_stream)) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "incomplete stream '%s' (%zd != %zd)",
+ si->devname, si->transferred, si->size);
+ }
+ }
+ }
+
+ for (i = 0; i <= 255; i++) {
+ vmaw->stream_info[i].finished = 1; /* mark as closed */
+ }
+
+ vmaw->closed = 1;
+
+ if (vmaw->status < 0 && *errp == NULL) {
+ error_setg(errp, "%s", vmaw->errmsg);
+ }
+
+ qemu_co_mutex_unlock(&vmaw->flush_lock);
+
+ return vmaw->status;
+}
+
+void vma_writer_destroy(VmaWriter *vmaw)
+{
+ assert(vmaw);
+
+ int i;
+
+ for (i = 0; i <= 255; i++) {
+ if (vmaw->stream_info[i].devname) {
+ g_free(vmaw->stream_info[i].devname);
+ }
+ }
+
+ if (vmaw->md5csum) {
+ g_checksum_free(vmaw->md5csum);
+ }
+
+ qemu_vfree(vmaw->headerbuf);
+ qemu_vfree(vmaw->outbuf);
+ g_free(vmaw);
+}
diff --git a/vma.c b/vma.c
new file mode 100644
index 0000000000..bb715e9061
--- /dev/null
+++ b/vma.c
@@ -0,0 +1,901 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) 2012-2013 Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include <glib.h>
+
+#include "vma.h"
+#include "qemu/module.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+#include "qemu/memalign.h"
+#include "qapi/qmp/qdict.h"
+#include "sysemu/block-backend.h"
+
+static void help(void)
+{
+ const char *help_msg =
+ "usage: vma command [command options]\n"
+ "\n"
+ "vma list <filename>\n"
+ "vma config <filename> [-c config]\n"
+ "vma create <filename> [-c config] pathname ...\n"
+ "vma extract <filename> [-d <drive-list>] [-r <fifo>] <targetdir>\n"
+ "vma verify <filename> [-v]\n"
+ ;
+
+ printf("%s", help_msg);
+ exit(1);
+}
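+
+/* Example invocations (file and drive names are illustrative only):
+ *   vma list vzdump-qemu-100.vma
+ *   vma extract vzdump-qemu-100.vma -d drive-scsi0 /var/tmp/restore
+ *   vma create backup.vma -c qemu-server.conf drive-scsi0=/path/to/disk.raw
+ */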
+
+static const char *extract_devname(const char *path, char **devname, int index)
+{
+ assert(path);
+
+ const char *sep = strchr(path, '=');
+
+ if (sep) {
+ *devname = g_strndup(path, sep - path);
+ path = sep + 1;
+ } else {
+ if (index >= 0) {
+ *devname = g_strdup_printf("disk%d", index);
+ } else {
+ *devname = NULL;
+ }
+ }
+
+ return path;
+}
+
+static void print_content(VmaReader *vmar)
+{
+ assert(vmar);
+
+ VmaHeader *head = vma_reader_get_header(vmar);
+
+ GList *l = vma_reader_get_config_data(vmar);
+ while (l && l->data) {
+ VmaConfigData *cdata = (VmaConfigData *)l->data;
+ l = g_list_next(l);
+ printf("CFG: size: %d name: %s\n", cdata->len, cdata->name);
+ }
+
+ int i;
+ VmaDeviceInfo *di;
+ for (i = 1; i < 255; i++) {
+ di = vma_reader_get_device_info(vmar, i);
+ if (di) {
+ if (strcmp(di->devname, "vmstate") == 0) {
+ printf("VMSTATE: dev_id=%d memory: %zd\n", i, di->size);
+ } else {
+ printf("DEV: dev_id=%d size: %zd devname: %s\n",
+ i, di->size, di->devname);
+ }
+ }
+ }
+ /* ctime is the last entry we print */
+ printf("CTIME: %s", ctime(&head->ctime));
+ fflush(stdout);
+}
+
+static int list_content(int argc, char **argv)
+{
+ int c, ret = 0;
+ const char *filename;
+
+ for (;;) {
+ c = getopt(argc, argv, "h");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+
+ /* Get the filename */
+ if ((optind + 1) != argc) {
+ help();
+ }
+ filename = argv[optind++];
+
+ Error *errp = NULL;
+ VmaReader *vmar = vma_reader_create(filename, &errp);
+
+ if (!vmar) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ print_content(vmar);
+
+ vma_reader_destroy(vmar);
+
+ return ret;
+}
+
+typedef struct RestoreMap {
+ char *devname;
+ char *path;
+ char *format;
+ uint64_t throttling_bps;
+ char *throttling_group;
+ char *cache;
+ bool write_zero;
+ bool skip;
+} RestoreMap;
+
+static bool try_parse_option(char **line, const char *optname, char **out, const char *inbuf) {
+ size_t optlen = strlen(optname);
+ if (strncmp(*line, optname, optlen) != 0 || (*line)[optlen] != '=') {
+ return false;
+ }
+ if (*out) {
+ g_error("read map failed - duplicate value for option '%s'", optname);
+ }
+ char *value = (*line) + optlen + 1; /* skip past the '=' */
+ char *colon = strchr(value, ':');
+ if (!colon) {
+ g_error("read map failed - option '%s' not terminated ('%s')",
+ optname, inbuf);
+ }
+ *line = colon+1;
+ *out = g_strndup(value, colon - value);
+ return true;
+}
+
+static uint64_t verify_u64(const char *text) {
+ uint64_t value;
+ const char *endptr = NULL;
+ if (qemu_strtou64(text, &endptr, 0, &value) != 0 || !endptr || *endptr) {
+ g_error("read map failed - not a number: %s", text);
+ }
+ return value;
+}
+
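+/* Each line read from the -r <fifo> map (list terminated by a "done" line)
+ * has one of the forms
+ *   skip=<devname>
+ *   [format=<f>:][throttling.bps=<n>:][throttling.group=<g>:][cache=<mode>:]<0|1>:<devname>=<path>
+ * where the leading 0/1 selects whether zero clusters are written to the
+ * restore target, e.g. "format=raw:1:drive-scsi0=/dev/some/target"
+ * (illustrative values).
+ */
+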
+static int extract_content(int argc, char **argv)
+{
+ int c, ret = 0;
+ int verbose = 0;
+ const char *filename;
+ const char *dirname;
+ const char *readmap = NULL;
+ gchar **drive_list = NULL;
+
+ for (;;) {
+ c = getopt(argc, argv, "hvd:r:");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ case 'd':
+ drive_list = g_strsplit(optarg, ",", 254);
+ break;
+ case 'r':
+ readmap = optarg;
+ break;
+ case 'v':
+ verbose = 1;
+ break;
+ default:
+ help();
+ }
+ }
+
+ /* Get the filename */
+ if ((optind + 2) != argc) {
+ help();
+ }
+ filename = argv[optind++];
+ dirname = argv[optind++];
+
+ Error *errp = NULL;
+ VmaReader *vmar = vma_reader_create(filename, &errp);
+
+ if (!vmar) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ if (mkdir(dirname, 0777) < 0) {
+ g_error("unable to create target directory %s - %s",
+ dirname, g_strerror(errno));
+ }
+
+ GList *l = vma_reader_get_config_data(vmar);
+ while (l && l->data) {
+ VmaConfigData *cdata = (VmaConfigData *)l->data;
+ l = g_list_next(l);
+ char *cfgfn = g_strdup_printf("%s/%s", dirname, cdata->name);
+ GError *err = NULL;
+ if (!g_file_set_contents(cfgfn, (gchar *)cdata->data, cdata->len,
+ &err)) {
+ g_error("unable to write file: %s", err->message);
+ }
+ }
+
+ GHashTable *devmap = g_hash_table_new(g_str_hash, g_str_equal);
+
+ if (readmap) {
+ print_content(vmar);
+
+ FILE *map = fopen(readmap, "r");
+ if (!map) {
+ g_error("unable to open fifo %s - %s", readmap, g_strerror(errno));
+ }
+
+ while (1) {
+ char inbuf[8192];
+ char *line = fgets(inbuf, sizeof(inbuf), map);
+ char *format = NULL;
+ char *bps = NULL;
+ char *group = NULL;
+ char *cache = NULL;
+ char *devname = NULL;
+ bool skip = false;
+ uint64_t bps_value = 0;
+ const char *path = NULL;
+ bool write_zero = true;
+
+ if (!line || line[0] == '\0' || !strcmp(line, "done\n")) {
+ break;
+ }
+ int len = strlen(line);
+ if (line[len - 1] == '\n') {
+ line[len - 1] = '\0';
+ len = len - 1;
+ if (len == 0) {
+ break;
+ }
+ }
+
+ if (strncmp(line, "skip", 4) == 0) {
+ if (len < 6 || line[4] != '=') {
+ g_error("read map failed - option 'skip' has no value ('%s')",
+ inbuf);
+ } else {
+ devname = line + 5;
+ skip = true;
+ }
+ } else {
+ while (1) {
+ if (!try_parse_option(&line, "format", &format, inbuf) &&
+ !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
+ !try_parse_option(&line, "throttling.group", &group, inbuf) &&
+ !try_parse_option(&line, "cache", &cache, inbuf))
+ {
+ break;
+ }
+ }
+
+ if (bps) {
+ bps_value = verify_u64(bps);
+ g_free(bps);
+ }
+
+ if (line[0] == '0' && line[1] == ':') {
+ path = line + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = line + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ }
+
+ path = extract_devname(path, &devname, -1);
+ }
+
+ if (!devname) {
+ g_error("read map failed - no dev name specified ('%s')",
+ inbuf);
+ }
+
+ RestoreMap *restore_map = g_new0(RestoreMap, 1);
+ restore_map->devname = g_strdup(devname);
+ restore_map->path = g_strdup(path);
+ restore_map->format = format;
+ restore_map->throttling_bps = bps_value;
+ restore_map->throttling_group = group;
+ restore_map->cache = cache;
+ restore_map->write_zero = write_zero;
+ restore_map->skip = skip;
+
+ g_hash_table_insert(devmap, restore_map->devname, restore_map);
+
+ }
+ }
+
+ int i;
+ int vmstate_fd = -1;
+ bool drive_rename_bitmap[255];
+ memset(drive_rename_bitmap, 0, sizeof(drive_rename_bitmap));
+
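+ /* Without a read map, each selected disk is extracted to
+ * tmp-disk-<devname>.raw and renamed to disk-<devname>.raw after a
+ * successful restore; drive_rename_bitmap tracks which devices this
+ * applies to. */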
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (strcmp(di->devname, "vmstate") == 0)) {
+ char *statefn = g_strdup_printf("%s/vmstate.bin", dirname);
+ vmstate_fd = open(statefn, O_WRONLY|O_CREAT|O_EXCL, 0644);
+ if (vmstate_fd < 0) {
+ g_error("create vmstate file '%s' failed - %s", statefn,
+ g_strerror(errno));
+ }
+ g_free(statefn);
+ } else if (di) {
+ char *devfn = NULL;
+ const char *format = NULL;
+ uint64_t throttling_bps = 0;
+ const char *throttling_group = NULL;
+ const char *cache = NULL;
+ int flags = BDRV_O_RDWR;
+ bool write_zero = true;
+ bool skip = false;
+
+ BlockBackend *blk = NULL;
+
+ if (drive_list) {
+ skip = true;
+ int j;
+ for (j = 0; drive_list[j]; j++) {
+ if (strcmp(drive_list[j], di->devname) == 0) {
+ skip = false;
+ drive_rename_bitmap[i] = true;
+ break;
+ }
+ }
+ } else {
+ drive_rename_bitmap[i] = true;
+ }
+
+ if (!skip && readmap) {
+ RestoreMap *map;
+ map = (RestoreMap *)g_hash_table_lookup(devmap, di->devname);
+ if (map == NULL) {
+ g_error("no device name mapping for %s", di->devname);
+ }
+ devfn = map->path;
+ format = map->format;
+ throttling_bps = map->throttling_bps;
+ throttling_group = map->throttling_group;
+ cache = map->cache;
+ write_zero = map->write_zero;
+ skip = map->skip;
+ } else if (!skip) {
2017-04-05 11:49:19 +03:00
+ devfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ printf("DEVINFO %s %zd\n", devfn, di->size);
+
+ bdrv_img_create(devfn, "raw", NULL, NULL, NULL, di->size,
+ flags, true, &errp);
+ if (errp) {
+ g_error("can't create file %s: %s", devfn,
+ error_get_pretty(errp));
+ }
+
+ /* Note: we created an empty file above, so there is no
+ * need to write zeroes (so we generate a sparse file)
+ */
+ write_zero = false;
+ }
+
+ if (!skip) {
+ size_t devlen = strlen(devfn);
+ QDict *options = NULL;
+ bool writethrough;
+ if (format) {
+ /* explicit format from commandline */
+ options = qdict_new();
+ qdict_put_str(options, "driver", format);
+ } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
+ strncmp(devfn, "/dev/", 5) == 0)
+ {
+ /* This part is now deprecated for PVE as well (just as qemu
+ * deprecated not specifying an explicit raw format).
+ */
+ /* explicit raw format */
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+ }
+
+ if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
+ g_error("invalid cache option: %s\n", cache);
+ }
+
+ if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
+ g_error("can't open file %s - %s", devfn,
+ error_get_pretty(errp));
+ }
+
+ if (cache) {
+ blk_set_enable_write_cache(blk, !writethrough);
+ }
+
+ if (throttling_group) {
+ blk_io_limits_enable(blk, throttling_group);
+ }
+
+ if (throttling_bps) {
+ if (!throttling_group) {
+ blk_io_limits_enable(blk, devfn);
+ }
+
+ ThrottleConfig cfg;
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
+ Error *err = NULL;
+ if (!throttle_is_valid(&cfg, &err)) {
+ error_report_err(err);
+ g_error("failed to apply throttling");
+ }
+ blk_set_io_limits(blk, &cfg);
+ }
+ }
+
+ if (vma_reader_register_bs(vmar, i, blk, write_zero, skip, &errp) < 0) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ if (!readmap) {
+ g_free(devfn);
+ }
+ }
+ }
+
+ if (drive_list) {
+ g_strfreev(drive_list);
+ }
+
+ if (vma_reader_restore(vmar, vmstate_fd, verbose, &errp) < 0) {
+ g_error("restore failed - %s", error_get_pretty(errp));
+ }
+
+ if (!readmap) {
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && drive_rename_bitmap[i]) {
+ char *tmpfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ char *fn = g_strdup_printf("%s/disk-%s.raw",
+ dirname, di->devname);
+ if (rename(tmpfn, fn) != 0) {
+ g_error("rename %s to %s failed - %s",
+ tmpfn, fn, g_strerror(errno));
+ }
+ }
+ }
+ }
+
+ vma_reader_destroy(vmar);
+
+ bdrv_close_all();
+
+ return ret;
+}
+
+static int verify_content(int argc, char **argv)
+{
+ int c, ret = 0;
+ int verbose = 0;
+ const char *filename;
+
+ for (;;) {
+ c = getopt(argc, argv, "hv");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ case 'v':
+ verbose = 1;
+ break;
+ default:
+ help();
+ }
+ }
+
+ /* Get the filename */
+ if ((optind + 1) != argc) {
+ help();
+ }
+ filename = argv[optind++];
+
+ Error *errp = NULL;
+ VmaReader *vmar = vma_reader_create(filename, &errp);
+
+ if (!vmar) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ if (verbose) {
+ print_content(vmar);
+ }
+
+ if (vma_reader_verify(vmar, verbose, &errp) < 0) {
+ g_error("verify failed - %s", error_get_pretty(errp));
+ }
+
+ vma_reader_destroy(vmar);
+
+ bdrv_close_all();
+
+ return ret;
+}
+
+typedef struct BackupJob {
+ BlockBackend *target;
+ int64_t len;
+ VmaWriter *vmaw;
+ uint8_t dev_id;
+} BackupJob;
+
+#define BACKUP_SECTORS_PER_CLUSTER (VMA_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void coroutine_fn backup_run_empty(void *opaque)
+{
+ VmaWriter *vmaw = (VmaWriter *)opaque;
+
+ vma_writer_flush_output(vmaw);
+
+ Error *err = NULL;
+ if (vma_writer_close(vmaw, &err) != 0) {
+ g_warning("vma_writer_close failed %s", error_get_pretty(err));
+ }
+}
+
+static void coroutine_fn backup_run(void *opaque)
+{
+ BackupJob *job = (BackupJob *)opaque;
+ struct iovec iov;
+ QEMUIOVector qiov;
+
+ int64_t start, end, readlen;
+ int ret = 0;
+
+ unsigned char *buf = blk_blockalign(job->target, VMA_CLUSTER_SIZE);
+
+ start = 0;
+ end = DIV_ROUND_UP(job->len / BDRV_SECTOR_SIZE,
+ BACKUP_SECTORS_PER_CLUSTER);
+
+ for (; start < end; start++) {
+ iov.iov_base = buf;
+ iov.iov_len = VMA_CLUSTER_SIZE;
+ qemu_iovec_init_external(&qiov, &iov, 1);
+
+ if (start + 1 == end) {
+ memset(buf, 0, VMA_CLUSTER_SIZE);
+ readlen = job->len - start * VMA_CLUSTER_SIZE;
+ assert(readlen > 0 && readlen <= VMA_CLUSTER_SIZE);
+ } else {
+ readlen = VMA_CLUSTER_SIZE;
+ }
+
+ ret = blk_co_preadv(job->target, start * VMA_CLUSTER_SIZE,
+ readlen, &qiov, 0);
+ if (ret < 0) {
+ vma_writer_set_error(job->vmaw, "read error");
+ goto out;
+ }
+
+ size_t zb = 0;
+ if (vma_writer_write(job->vmaw, job->dev_id, start, buf, &zb) < 0) {
+ vma_writer_set_error(job->vmaw, "backup_dump_cb vma_writer_write failed");
+ goto out;
+ }
+ }
+
+
+out:
+ if (vma_writer_close_stream(job->vmaw, job->dev_id) <= 0) {
+ Error *err = NULL;
+ if (vma_writer_close(job->vmaw, &err) != 0) {
+ g_warning("vma_writer_close failed %s", error_get_pretty(err));
+ }
+ }
+ qemu_vfree(buf);
+}
+
+static int create_archive(int argc, char **argv)
+{
+ int c;
+ int verbose = 0;
+ const char *archivename;
+ GList *backup_coroutines = NULL;
2017-04-05 11:49:19 +03:00
+ GList *config_files = NULL;
+
+ for (;;) {
+ c = getopt(argc, argv, "hvc:");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ case 'c':
+ config_files = g_list_append(config_files, optarg);
+ break;
+ case 'v':
+ verbose = 1;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+
+
+ /* make sure we have an archive name */
+ if ((optind + 1) > argc) {
2017-04-05 11:49:19 +03:00
+ help();
+ }
+
+ archivename = argv[optind++];
+
+ uuid_t uuid;
+ uuid_generate(uuid);
+
+ Error *local_err = NULL;
+ VmaWriter *vmaw = vma_writer_create(archivename, uuid, &local_err);
+
+ if (vmaw == NULL) {
+ g_error("%s", error_get_pretty(local_err));
+ }
+
+ GList *l = config_files;
+ while (l && l->data) {
+ char *name = l->data;
+ char *cdata = NULL;
+ gsize clen = 0;
+ GError *err = NULL;
+ if (!g_file_get_contents(name, &cdata, &clen, &err)) {
+ unlink(archivename);
+ g_error("Unable to read file: %s", err->message);
+ }
+
+ if (vma_writer_add_config(vmaw, name, cdata, clen) != 0) {
+ unlink(archivename);
+ g_error("Unable to append config data %s (len = %zd)",
+ name, clen);
+ }
+ l = g_list_next(l);
+ }
+
+ int devcount = 0;
+ while (optind < argc) {
+ const char *path = argv[optind++];
+ char *devname = NULL;
+ path = extract_devname(path, &devname, devcount++);
+
+ Error *errp = NULL;
+ BlockBackend *target;
+
+ target = blk_new_open(path, NULL, NULL, 0, &errp);
+ if (!target) {
2017-04-05 11:49:19 +03:00
+ unlink(archivename);
+ g_error("bdrv_open '%s' failed - %s", path, error_get_pretty(errp));
+ }
+ int64_t size = blk_getlength(target);
+ int dev_id = vma_writer_register_stream(vmaw, devname, size);
+ if (dev_id <= 0) {
+ unlink(archivename);
+ g_error("vma_writer_register_stream '%s' failed", devname);
+ }
+
+ BackupJob *job = g_new0(BackupJob, 1);
+ job->len = size;
+ job->target = target;
+ job->vmaw = vmaw;
+ job->dev_id = dev_id;
+
+ Coroutine *co = qemu_coroutine_create(backup_run, job);
+ // Don't enter coroutine yet, because it might write the header before
+ // all streams can be registered.
+ backup_coroutines = g_list_append(backup_coroutines, co);
+ }
+
+ VmaStatus vmastat;
+ int percent = 0;
+ int last_percent = -1;
+
+ if (devcount) {
+ GList *entry = backup_coroutines;
+ while (entry && entry->data) {
+ Coroutine *co = entry->data;
+ qemu_coroutine_enter(co);
+ entry = g_list_next(entry);
+ }
+
+ while (1) {
+ main_loop_wait(false);
+ vma_writer_get_status(vmaw, &vmastat);
+
+ if (verbose) {
+
+ uint64_t total = 0;
+ uint64_t transferred = 0;
+ uint64_t zero_bytes = 0;
+
+ int i;
+ for (i = 0; i < 256; i++) {
+ if (vmastat.stream_info[i].size) {
+ total += vmastat.stream_info[i].size;
+ transferred += vmastat.stream_info[i].transferred;
+ zero_bytes += vmastat.stream_info[i].zero_bytes;
+ }
+ }
+ percent = (transferred*100)/total;
+ if (percent != last_percent) {
+ fprintf(stderr, "progress %d%% %zd/%zd %zd\n", percent,
+ transferred, total, zero_bytes);
+ fflush(stderr);
+
+ last_percent = percent;
+ }
+ }
+
+ if (vmastat.closed) {
+ break;
+ }
+ }
+ } else {
+ Coroutine *co = qemu_coroutine_create(backup_run_empty, vmaw);
+ qemu_coroutine_enter(co);
+ while (1) {
+ main_loop_wait(false);
+ vma_writer_get_status(vmaw, &vmastat);
+ if (vmastat.closed) {
+ break;
+ }
+ }
+ }
+
+ bdrv_drain_all();
+
+ vma_writer_get_status(vmaw, &vmastat);
+
+ if (verbose) {
+ int i;
+ for (i = 0; i < 256; i++) {
+ VmaStreamInfo *si = &vmastat.stream_info[i];
+ if (si->size) {
+ fprintf(stderr, "image %s: size=%zd zeros=%zd saved=%zd\n",
+ si->devname, si->size, si->zero_bytes,
+ si->size - si->zero_bytes);
+ }
+ }
+ }
+
+ if (vmastat.status < 0) {
+ unlink(archivename);
+ g_error("creating vma archive failed");
+ }
+
+ g_list_free(backup_coroutines);
+ g_list_free(config_files);
+ vma_writer_destroy(vmaw);
+ return 0;
+}
+
+static int dump_config(int argc, char **argv)
+{
+ int c, ret = 0;
+ const char *filename;
+ const char *config_name = "qemu-server.conf";
+
+ for (;;) {
+ c = getopt(argc, argv, "hc:");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ case 'c':
+ config_name = optarg;
+ break;
+ default:
+ help();
+ }
+ }
+
+ /* Get the filename */
+ if ((optind + 1) != argc) {
+ help();
+ }
+ filename = argv[optind++];
+
+ Error *errp = NULL;
+ VmaReader *vmar = vma_reader_create(filename, &errp);
+
+ if (!vmar) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ int found = 0;
+ GList *l = vma_reader_get_config_data(vmar);
+ while (l && l->data) {
+ VmaConfigData *cdata = (VmaConfigData *)l->data;
+ l = g_list_next(l);
+ if (strcmp(cdata->name, config_name) == 0) {
+ found = 1;
+ fwrite(cdata->data, cdata->len, 1, stdout);
+ break;
+ }
+ }
+
+ vma_reader_destroy(vmar);
+
+ bdrv_close_all();
+
+ if (!found) {
+ fprintf(stderr, "unable to find configuration data '%s'\n", config_name);
+ return -1;
+ }
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ const char *cmdname;
+ Error *main_loop_err = NULL;
+
+ error_init(argv[0]);
+ module_call_init(MODULE_INIT_TRACE);
+ qemu_init_exec_dir(argv[0]);
+
+ if (qemu_init_main_loop(&main_loop_err)) {
+ g_error("%s", error_get_pretty(main_loop_err));
+ }
+
+ bdrv_init();
+ module_call_init(MODULE_INIT_QOM);
+
+ if (argc < 2) {
+ help();
+ }
+
+ cmdname = argv[1];
+ argc--; argv++;
+
+
+ if (!strcmp(cmdname, "list")) {
+ return list_content(argc, argv);
+ } else if (!strcmp(cmdname, "create")) {
+ return create_archive(argc, argv);
+ } else if (!strcmp(cmdname, "extract")) {
+ return extract_content(argc, argv);
+ } else if (!strcmp(cmdname, "verify")) {
+ return verify_content(argc, argv);
+ } else if (!strcmp(cmdname, "config")) {
+ return dump_config(argc, argv);
+ }
+
+ help();
+ return 0;
+}
diff --git a/vma.h b/vma.h
new file mode 100644
index 0000000000..86d2873aa5
--- /dev/null
+++ b/vma.h
@@ -0,0 +1,150 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef BACKUP_VMA_H
+#define BACKUP_VMA_H
+
+#include <uuid/uuid.h>
+#include "qapi/error.h"
+#include "block/block.h"
+
+#define VMA_BLOCK_BITS 12
+#define VMA_BLOCK_SIZE (1<<VMA_BLOCK_BITS)
+#define VMA_CLUSTER_BITS (VMA_BLOCK_BITS+4)
+#define VMA_CLUSTER_SIZE (1<<VMA_CLUSTER_BITS)
+
+#if VMA_CLUSTER_SIZE != 65536
+#error unexpected cluster size
+#endif
+
+#define VMA_EXTENT_HEADER_SIZE 512
+#define VMA_BLOCKS_PER_EXTENT 59
+#define VMA_MAX_CONFIGS 256
+
+#define VMA_MAX_EXTENT_SIZE \
+ (VMA_EXTENT_HEADER_SIZE+VMA_CLUSTER_SIZE*VMA_BLOCKS_PER_EXTENT)
+#if VMA_MAX_EXTENT_SIZE != 3867136
+#error unexpected VMA_EXTENT_SIZE
+#endif
+
+/* File Format Definitions */
+
+#define VMA_MAGIC (GUINT32_TO_BE(('V'<<24)|('M'<<16)|('A'<<8)|0x00))
+#define VMA_EXTENT_MAGIC (GUINT32_TO_BE(('V'<<24)|('M'<<16)|('A'<<8)|'E'))
+
+typedef struct VmaDeviceInfoHeader {
+ uint32_t devname_ptr; /* offset into blob_buffer table */
+ uint32_t reserved0;
+ uint64_t size; /* device size in bytes */
+ uint64_t reserved1;
+ uint64_t reserved2;
+} VmaDeviceInfoHeader;
+
+typedef struct VmaHeader {
+ uint32_t magic;
+ uint32_t version;
+ unsigned char uuid[16];
+ int64_t ctime;
+ unsigned char md5sum[16];
+
+ uint32_t blob_buffer_offset;
+ uint32_t blob_buffer_size;
+ uint32_t header_size;
+
+ unsigned char reserved[1984];
+
+ uint32_t config_names[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+ uint32_t config_data[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+
+ uint32_t reserved1;
+
+ VmaDeviceInfoHeader dev_info[256];
+} VmaHeader;
+
+typedef struct VmaExtentHeader {
+ uint32_t magic;
+ uint16_t reserved1;
+ uint16_t block_count;
+ unsigned char uuid[16];
+ unsigned char md5sum[16];
+ uint64_t blockinfo[VMA_BLOCKS_PER_EXTENT];
+} VmaExtentHeader;
+
+/* functions/definitions to read/write vma files */
+
+typedef struct VmaReader VmaReader;
+
+typedef struct VmaWriter VmaWriter;
+
+typedef struct VmaConfigData {
+ const char *name;
+ const void *data;
+ uint32_t len;
+} VmaConfigData;
+
+typedef struct VmaStreamInfo {
+ uint64_t size;
+ uint64_t cluster_count;
+ uint64_t transferred;
+ uint64_t zero_bytes;
+ int finished;
+ char *devname;
+} VmaStreamInfo;
+
+typedef struct VmaStatus {
+ int status;
+ bool closed;
+ char errmsg[8192];
+ char uuid_str[37];
+ VmaStreamInfo stream_info[256];
+} VmaStatus;
+
+typedef struct VmaDeviceInfo {
+ uint64_t size; /* device size in bytes */
+ const char *devname;
+} VmaDeviceInfo;
+
+VmaWriter *vma_writer_create(const char *filename, uuid_t uuid, Error **errp);
+int vma_writer_close(VmaWriter *vmaw, Error **errp);
+void vma_writer_error_propagate(VmaWriter *vmaw, Error **errp);
+void vma_writer_destroy(VmaWriter *vmaw);
+int vma_writer_add_config(VmaWriter *vmaw, const char *name, gpointer data,
+ size_t len);
+int vma_writer_register_stream(VmaWriter *vmaw, const char *devname,
+ size_t size);
+
+int64_t coroutine_fn vma_writer_write(VmaWriter *vmaw, uint8_t dev_id,
+ int64_t cluster_num,
+ const unsigned char *buf,
+ size_t *zero_bytes);
+
+int coroutine_fn vma_writer_close_stream(VmaWriter *vmaw, uint8_t dev_id);
+int coroutine_fn vma_writer_flush_output(VmaWriter *vmaw);
+
+int vma_writer_get_status(VmaWriter *vmaw, VmaStatus *status);
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...) G_GNUC_PRINTF(2, 3);
+
+
+VmaReader *vma_reader_create(const char *filename, Error **errp);
+void vma_reader_destroy(VmaReader *vmar);
+VmaHeader *vma_reader_get_header(VmaReader *vmar);
+GList *vma_reader_get_config_data(VmaReader *vmar);
+VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id);
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id,
+ BlockBackend *target, bool write_zeroes,
+ bool skip, Error **errp);
+int vma_reader_restore(VmaReader *vmar, int vmstate_fd, bool verbose,
+ Error **errp);
+int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp);
+
+#endif /* BACKUP_VMA_H */