Compare commits

...

14 Commits

Author SHA1 Message Date
b2520b3757 Update driver: add qemu_file_mirror_path 2025-08-26 03:08:13 +03:00
85cdd3a24f Add vitastor support 2025-07-20 17:15:17 +03:00
Thomas Lamprecht
245689b9ae bump version to 9.2.0-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-17 22:54:36 +02:00
Thomas Lamprecht
a2ee4f403e savevm-async: reuse migration blocker check for snapshots/hibernation
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-07-17 22:54:36 +02:00
Fiona Ebner
8fcd89fba8 bump version to 9.2.0-6
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-07-02 14:17:24 +02:00
Fiona Ebner
ec2acae08d gluster: provide more context for deprecation warning
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-07-02 14:17:24 +02:00
Fiona Ebner
50717bf6df d/rules: remove outdated workaround against historic changelog file
There is no top-level 'Changelog' file in the QEMU submodule
repository anymore since QEMU v5.2, to be precise commit e83029fa60
("CHANGELOG: remove disused file").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-04-04 18:46:52 +02:00
Thomas Lamprecht
e0969989ac bump version to 9.2.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-04 16:16:02 +02:00
Thomas Lamprecht
b30e898392 PVE backup: backup-access api: simplify bitmap logic
See inner patch for details.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-04 16:14:24 +02:00
Thomas Lamprecht
053f68a7ac bump version to 9.2.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-03 17:00:39 +02:00
Thomas Lamprecht
cb662eb2c1 pve backup: implement features for external backup providers
This series of patches extends our existing QEMU backup integration
for PVE such that it can be used for a in-development external-backup
plugin API.

This includes among other things an API for backup access and teardown
as well as dirty-bitmap management. See the separate patches for more
details and check [v7] and [v8] of this series on the list. The former
is the last revision by Fiona which was already very advanced,
Wolfgang picked that up and extended/adapted it slightly to match
needs that he found during implementation of a test provider and from
feedback of (future) external backup provider like Bareos and Veritas.
This was mainly done to get the QEMU build out for broad QA, we can
still change things here as long as the rest of the series is not yet
applied.

[v7]: https://lore.proxmox.com/pve-devel/20250401173435.221892-1-f.ebner@proxmox.com/
[v8]: https://lore.proxmox.com/pve-devel/20250403123118.264974-1-w.bumiller@proxmox.com/

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-03 16:48:42 +02:00
Wolfgang Bumiller
a6ee093f1c merge async snapshot improvements
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2025-04-02 17:23:05 +02:00
Thomas Lamprecht
76dd2fb90b bump version to 9.2.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-03-26 09:57:04 +01:00
Fiona Ebner
3da02409af revert hpet changes to fix performance regression
As reported in the community forum [0][1], QEMU processes for Linux
guests would consume more CPU on the host after an update to QEMU 9.2.
The issue was reproduced and bisecting pointed to QEMU commit
f0ccf77078 ("hpet: fix and cleanup persistence of interrupt status").

Some quick experimentation suggests that in particular the last part
 is responsible for the issue:
> - the timer must be kept running even if not enabled, in
>   order to set the ISR flag, so writes to HPET_TN_CFG must
>   not call hpet_del_timer()

Users confirmed that setting the hpet=off machine flag works around
the issue[0]. For Windows (7 or later) guests, the flag is already
disabled, because of issues in the past [2].

Upstream suggested reverting the relevant patches for now [3], because
other issues were reported too. All except commit 5895879aca ("hpet:
remove unnecessary variable "index"") are actually dependent on each
other for cleanly reverting f0ccf77078, and while not strictly
required, that one was reverted too for completeness.

[0]: https://forum.proxmox.com/threads/163694/
[1]: https://forum.proxmox.com/threads/161849/post-756793
[2]: https://lists.proxmox.com/pipermail/pve-devel/2012-December/004958.html
[3]: https://lore.kernel.org/qemu-devel/CABgObfaKJ5NFVKmYLFmu4C0iZZLJJtcWksLCzyA0tBoz0koZ4A@mail.gmail.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-03-26 08:42:59 +01:00
31 changed files with 4731 additions and 3 deletions

View File

@@ -58,7 +58,7 @@ $(BUILDDIR): submodule
deb kvm: $(DEBS)
$(DEB_DBG): $(DEB)
$(DEB): $(BUILDDIR)
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc -j32
lintian $(DEBS)
sbuild: $(DSC)

47
debian/changelog vendored
View File

@@ -1,3 +1,50 @@
pve-qemu-kvm (9.2.0-7+vitastor2) bookworm; urgency=medium
* Add Vitastor support
-- Vitaliy Filippov <vitalif@yourcmc.ru> Sun, 20 Jul 2025 16:39:14 +0300
pve-qemu-kvm (9.2.0-7) bookworm; urgency=medium
* savevm-async: reuse migration blocker check for snapshots/hibernation to
avoid crashing a VM when on these actions if its configuration does not
support them.
-- Proxmox Support Team <support@proxmox.com> Thu, 17 Jul 2025 22:34:53 +0200
pve-qemu-kvm (9.2.0-6) bookworm; urgency=medium
* gluster: provide more context for deprecation warning.
-- Proxmox Support Team <support@proxmox.com> Wed, 02 Jul 2025 12:30:27 +0200
pve-qemu-kvm (9.2.0-5) bookworm; urgency=medium
* pve backup: backup-access api: simplify bitmap logic
-- Proxmox Support Team <support@proxmox.com> Fri, 04 Apr 2025 16:15:58 +0200
pve-qemu-kvm (9.2.0-4) bookworm; urgency=medium
* various async snapshot improvements, inclduing using a dedicated IO thread
for the state file when doing a live snapshot. That should reduce load on
the main thread and for it to get stuck on IO, i.e. same benefits as using
a dedicated IO thread for regular drives. This is particularly interesting
when the VM state storage is a network storage like NFS. It should also
address #6262.
* pve backup: implement basic features and API in preperation for external
backup provider storage plugins.
-- Proxmox Support Team <support@proxmox.com> Thu, 03 Apr 2025 17:00:34 +0200
pve-qemu-kvm (9.2.0-3) bookworm; urgency=medium
* revert changes to the High Precision Event Timer (HPET) to fix performance
regression
-- Proxmox Support Team <support@proxmox.com> Wed, 26 Mar 2025 09:56:01 +0100
pve-qemu-kvm (9.2.0-2) bookworm; urgency=medium
* fix assertion failure when migrating a VM with multiple disks on a

1
debian/control vendored
View File

@@ -59,6 +59,7 @@ Depends: ceph-common (>= 0.48),
libspice-server1 (>= 0.14.0~),
libusb-1.0-0 (>= 1.0.17-1),
libusbredirparser1 (>= 0.6-2),
vitastor-client (>= 0.9.4),
libuuid1,
${misc:Depends},
${shlibs:Depends},

1410
debian/patches/pve-qemu-9.2-vitastor.patch vendored Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,50 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 19 Mar 2025 17:31:05 +0100
Subject: [PATCH] Revert "hpet: avoid timer storms on periodic timers"
This reverts commit 7c912ffb59e8137091894d767433e65c3df8b0bf.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/timer/hpet.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 5399f1b2a3..8ccc421cbb 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -59,7 +59,6 @@ typedef struct HPETTimer { /* timers */
uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit
* mode. Next pop will be actual timer expiration.
*/
- uint64_t last; /* last value armed, to avoid timer storms */
} HPETTimer;
struct HPETState {
@@ -267,7 +266,6 @@ static int hpet_post_load(void *opaque, int version_id)
for (i = 0; i < s->num_timers; i++) {
HPETTimer *t = &s->timer[i];
t->cmp64 = hpet_calculate_cmp64(t, s->hpet_counter, t->cmp);
- t->last = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - NANOSECONDS_PER_SECOND;
}
/* Recalculate the offset between the main counter and guest time */
if (!s->hpet_offset_saved) {
@@ -366,15 +364,8 @@ static const VMStateDescription vmstate_hpet = {
static void hpet_arm(HPETTimer *t, uint64_t tick)
{
- uint64_t ns = hpet_get_ns(t->state, tick);
-
- /* Clamp period to reasonable min value (1 us) */
- if (timer_is_periodic(t) && ns - t->last < 1000) {
- ns = t->last + 1000;
- }
-
- t->last = ns;
- timer_mod(t->qemu_timer, ns);
+ /* FIXME: Clamp period to reasonable min value? */
+ timer_mod(t->qemu_timer, hpet_get_ns(t->state, tick));
}
/*

View File

@@ -0,0 +1,203 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 19 Mar 2025 17:31:08 +0100
Subject: [PATCH] Revert "hpet: store full 64-bit target value of the counter"
This reverts commit 242d665396407f83a6acbffc804882eeb21cfdad.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/timer/hpet.c | 111 +++++++++++++++++++++++++++---------------------
1 file changed, 62 insertions(+), 49 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 8ccc421cbb..415a9433f1 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -54,7 +54,6 @@ typedef struct HPETTimer { /* timers */
uint64_t cmp; /* comparator */
uint64_t fsb; /* FSB route */
/* Hidden register state */
- uint64_t cmp64; /* comparator (extended to counter width) */
uint64_t period; /* Last value written to comparator */
uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit
* mode. Next pop will be actual timer expiration.
@@ -116,6 +115,11 @@ static uint32_t timer_enabled(HPETTimer *t)
}
static uint32_t hpet_time_after(uint64_t a, uint64_t b)
+{
+ return ((int32_t)(b - a) < 0);
+}
+
+static uint32_t hpet_time_after64(uint64_t a, uint64_t b)
{
return ((int64_t)(b - a) < 0);
}
@@ -152,32 +156,27 @@ static uint64_t hpet_get_ticks(HPETState *s)
return ns_to_ticks(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + s->hpet_offset);
}
-static uint64_t hpet_get_ns(HPETState *s, uint64_t tick)
-{
- return ticks_to_ns(tick) - s->hpet_offset;
-}
-
/*
- * calculate next value of the general counter that matches the
- * target (either entirely, or the low 32-bit only depending on
- * the timer mode).
+ * calculate diff between comparator value and current ticks
*/
-static uint64_t hpet_calculate_cmp64(HPETTimer *t, uint64_t cur_tick, uint64_t target)
+static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current)
{
+
if (t->config & HPET_TN_32BIT) {
- uint64_t result = deposit64(cur_tick, 0, 32, target);
- if (result < cur_tick) {
- result += 0x100000000ULL;
- }
- return result;
+ uint32_t diff, cmp;
+
+ cmp = (uint32_t)t->cmp;
+ diff = cmp - (uint32_t)current;
+ diff = (int32_t)diff > 0 ? diff : (uint32_t)1;
+ return (uint64_t)diff;
} else {
- return target;
- }
-}
+ uint64_t diff, cmp;
-static uint64_t hpet_next_wrap(uint64_t cur_tick)
-{
- return (cur_tick | 0xffffffffU) + 1;
+ cmp = t->cmp;
+ diff = cmp - current;
+ diff = (int64_t)diff > 0 ? diff : (uint64_t)1;
+ return diff;
+ }
}
static void update_irq(struct HPETTimer *timer, int set)
@@ -261,12 +260,7 @@ static bool hpet_validate_num_timers(void *opaque, int version_id)
static int hpet_post_load(void *opaque, int version_id)
{
HPETState *s = opaque;
- int i;
- for (i = 0; i < s->num_timers; i++) {
- HPETTimer *t = &s->timer[i];
- t->cmp64 = hpet_calculate_cmp64(t, s->hpet_counter, t->cmp);
- }
/* Recalculate the offset between the main counter and guest time */
if (!s->hpet_offset_saved) {
s->hpet_offset = ticks_to_ns(s->hpet_counter)
@@ -362,10 +356,14 @@ static const VMStateDescription vmstate_hpet = {
}
};
-static void hpet_arm(HPETTimer *t, uint64_t tick)
+static void hpet_arm(HPETTimer *t, uint64_t ticks)
{
- /* FIXME: Clamp period to reasonable min value? */
- timer_mod(t->qemu_timer, hpet_get_ns(t->state, tick));
+ if (ticks < ns_to_ticks(INT64_MAX / 2)) {
+ timer_mod(t->qemu_timer,
+ qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + ticks_to_ns(ticks));
+ } else {
+ timer_del(t->qemu_timer);
+ }
}
/*
@@ -374,44 +372,54 @@ static void hpet_arm(HPETTimer *t, uint64_t tick)
static void hpet_timer(void *opaque)
{
HPETTimer *t = opaque;
+ uint64_t diff;
+
uint64_t period = t->period;
uint64_t cur_tick = hpet_get_ticks(t->state);
if (timer_is_periodic(t) && period != 0) {
- while (hpet_time_after(cur_tick, t->cmp64)) {
- t->cmp64 += period;
- }
if (t->config & HPET_TN_32BIT) {
- t->cmp = (uint32_t)t->cmp64;
+ while (hpet_time_after(cur_tick, t->cmp)) {
+ t->cmp = (uint32_t)(t->cmp + t->period);
+ }
} else {
- t->cmp = t->cmp64;
+ while (hpet_time_after64(cur_tick, t->cmp)) {
+ t->cmp += period;
+ }
+ }
+ diff = hpet_calculate_diff(t, cur_tick);
+ hpet_arm(t, diff);
+ } else if (t->config & HPET_TN_32BIT && !timer_is_periodic(t)) {
+ if (t->wrap_flag) {
+ diff = hpet_calculate_diff(t, cur_tick);
+ hpet_arm(t, diff);
+ t->wrap_flag = 0;
}
- hpet_arm(t, t->cmp64);
- } else if (t->wrap_flag) {
- t->wrap_flag = 0;
- hpet_arm(t, t->cmp64);
}
update_irq(t, 1);
}
static void hpet_set_timer(HPETTimer *t)
{
+ uint64_t diff;
+ uint32_t wrap_diff; /* how many ticks until we wrap? */
uint64_t cur_tick = hpet_get_ticks(t->state);
+ /* whenever new timer is being set up, make sure wrap_flag is 0 */
t->wrap_flag = 0;
- t->cmp64 = hpet_calculate_cmp64(t, cur_tick, t->cmp);
- if (t->config & HPET_TN_32BIT) {
-
- /* hpet spec says in one-shot 32-bit mode, generate an interrupt when
- * counter wraps in addition to an interrupt with comparator match.
- */
- if (!timer_is_periodic(t) && t->cmp64 > hpet_next_wrap(cur_tick)) {
+ diff = hpet_calculate_diff(t, cur_tick);
+
+ /* hpet spec says in one-shot 32-bit mode, generate an interrupt when
+ * counter wraps in addition to an interrupt with comparator match.
+ */
+ if (t->config & HPET_TN_32BIT && !timer_is_periodic(t)) {
+ wrap_diff = 0xffffffff - (uint32_t)cur_tick;
+ if (wrap_diff < (uint32_t)diff) {
+ diff = wrap_diff;
t->wrap_flag = 1;
- hpet_arm(t, hpet_next_wrap(cur_tick));
- return;
}
}
- hpet_arm(t, t->cmp64);
+ hpet_arm(t, diff);
}
static void hpet_del_timer(HPETTimer *t)
@@ -542,7 +550,12 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
timer->cmp = deposit64(timer->cmp, shift, len, value);
}
if (timer_is_periodic(timer)) {
- timer->period = deposit64(timer->period, shift, len, value);
+ /*
+ * FIXME: Clamp period to reasonable min value?
+ * Clamp period to reasonable max value
+ */
+ new_val = deposit64(timer->period, shift, len, value);
+ timer->period = MIN(new_val, (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1);
}
timer->config &= ~HPET_TN_SETVAL;
if (hpet_enabled(s)) {

View File

@@ -0,0 +1,281 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 19 Mar 2025 17:31:09 +0100
Subject: [PATCH] Revert "hpet: accept 64-bit reads and writes"
This reverts commit c2366567378dd8fb89329816003801f54e30e6f3.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/timer/hpet.c | 137 +++++++++++++++++++++++++++++-------------
hw/timer/trace-events | 3 +-
2 files changed, 96 insertions(+), 44 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 415a9433f1..e1ac877759 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -437,7 +437,6 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
unsigned size)
{
HPETState *s = opaque;
- int shift = (addr & 4) * 8;
uint64_t cur_tick;
trace_hpet_ram_read(addr);
@@ -452,33 +451,52 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
return 0;
}
- switch (addr & 0x18) {
- case HPET_TN_CFG: // including interrupt capabilities
- return timer->config >> shift;
+ switch ((addr - 0x100) % 0x20) {
+ case HPET_TN_CFG:
+ return timer->config;
+ case HPET_TN_CFG + 4: // Interrupt capabilities
+ return timer->config >> 32;
case HPET_TN_CMP: // comparator register
- return timer->cmp >> shift;
+ return timer->cmp;
+ case HPET_TN_CMP + 4:
+ return timer->cmp >> 32;
case HPET_TN_ROUTE:
- return timer->fsb >> shift;
+ return timer->fsb;
+ case HPET_TN_ROUTE + 4:
+ return timer->fsb >> 32;
default:
trace_hpet_ram_read_invalid();
break;
}
} else {
- switch (addr & ~4) {
- case HPET_ID: // including HPET_PERIOD
- return s->capability >> shift;
+ switch (addr) {
+ case HPET_ID:
+ return s->capability;
+ case HPET_PERIOD:
+ return s->capability >> 32;
case HPET_CFG:
- return s->config >> shift;
+ return s->config;
+ case HPET_CFG + 4:
+ trace_hpet_invalid_hpet_cfg(4);
+ return 0;
case HPET_COUNTER:
if (hpet_enabled(s)) {
cur_tick = hpet_get_ticks(s);
} else {
cur_tick = s->hpet_counter;
}
- trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
- return cur_tick >> shift;
+ trace_hpet_ram_read_reading_counter(0, cur_tick);
+ return cur_tick;
+ case HPET_COUNTER + 4:
+ if (hpet_enabled(s)) {
+ cur_tick = hpet_get_ticks(s);
+ } else {
+ cur_tick = s->hpet_counter;
+ }
+ trace_hpet_ram_read_reading_counter(4, cur_tick);
+ return cur_tick >> 32;
case HPET_STATUS:
- return s->isr >> shift;
+ return s->isr;
default:
trace_hpet_ram_read_invalid();
break;
@@ -492,11 +510,11 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
{
int i;
HPETState *s = opaque;
- int shift = (addr & 4) * 8;
- int len = MIN(size * 8, 64 - shift);
uint64_t old_val, new_val, cleared;
trace_hpet_ram_write(addr, value);
+ old_val = hpet_ram_read(opaque, addr, 4);
+ new_val = value;
/*address range of all TN regs*/
if (addr >= 0x100 && addr <= 0x3ff) {
@@ -508,12 +526,9 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
trace_hpet_timer_id_out_of_range(timer_id);
return;
}
- switch (addr & 0x18) {
+ switch ((addr - 0x100) % 0x20) {
case HPET_TN_CFG:
- trace_hpet_ram_write_tn_cfg(addr & 4);
- old_val = timer->config;
- new_val = deposit64(old_val, shift, len, value);
- new_val = hpet_fixup_reg(new_val, old_val, HPET_TN_CFG_WRITE_MASK);
+ trace_hpet_ram_write_tn_cfg();
if (deactivating_bit(old_val, new_val, HPET_TN_TYPE_LEVEL)) {
/*
* Do this before changing timer->config; otherwise, if
@@ -521,7 +536,8 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
*/
update_irq(timer, 0);
}
- timer->config = new_val;
+ new_val = hpet_fixup_reg(new_val, old_val, HPET_TN_CFG_WRITE_MASK);
+ timer->config = (timer->config & 0xffffffff00000000ULL) | new_val;
if (activating_bit(old_val, new_val, HPET_TN_ENABLE)
&& (s->isr & (1 << timer_id))) {
update_irq(timer, 1);
@@ -534,28 +550,56 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
hpet_set_timer(timer);
}
break;
+ case HPET_TN_CFG + 4: // Interrupt capabilities
+ trace_hpet_ram_write_invalid_tn_cfg(4);
+ break;
case HPET_TN_CMP: // comparator register
+ trace_hpet_ram_write_tn_cmp(0);
if (timer->config & HPET_TN_32BIT) {
- /* High 32-bits are zero, leave them untouched. */
- if (shift) {
- trace_hpet_ram_write_invalid_tn_cmp();
- break;
+ new_val = (uint32_t)new_val;
+ }
+ if (!timer_is_periodic(timer)
+ || (timer->config & HPET_TN_SETVAL)) {
+ timer->cmp = (timer->cmp & 0xffffffff00000000ULL) | new_val;
+ }
+ if (timer_is_periodic(timer)) {
+ /*
+ * FIXME: Clamp period to reasonable min value?
+ * Clamp period to reasonable max value
+ */
+ if (timer->config & HPET_TN_32BIT) {
+ new_val = MIN(new_val, ~0u >> 1);
}
- len = 64;
- value = (uint32_t) value;
+ timer->period =
+ (timer->period & 0xffffffff00000000ULL) | new_val;
+ }
+ /*
+ * FIXME: on a 64-bit write, HPET_TN_SETVAL should apply to the
+ * high bits part as well.
+ */
+ timer->config &= ~HPET_TN_SETVAL;
+ if (hpet_enabled(s)) {
+ hpet_set_timer(timer);
}
- trace_hpet_ram_write_tn_cmp(addr & 4);
+ break;
+ case HPET_TN_CMP + 4: // comparator register high order
+ if (timer->config & HPET_TN_32BIT) {
+ trace_hpet_ram_write_invalid_tn_cmp();
+ break;
+ }
+ trace_hpet_ram_write_tn_cmp(4);
if (!timer_is_periodic(timer)
|| (timer->config & HPET_TN_SETVAL)) {
- timer->cmp = deposit64(timer->cmp, shift, len, value);
+ timer->cmp = (timer->cmp & 0xffffffffULL) | new_val << 32;
}
if (timer_is_periodic(timer)) {
/*
* FIXME: Clamp period to reasonable min value?
* Clamp period to reasonable max value
*/
- new_val = deposit64(timer->period, shift, len, value);
- timer->period = MIN(new_val, (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1);
+ new_val = MIN(new_val, ~0u >> 1);
+ timer->period =
+ (timer->period & 0xffffffffULL) | new_val << 32;
}
timer->config &= ~HPET_TN_SETVAL;
if (hpet_enabled(s)) {
@@ -563,7 +607,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
}
break;
case HPET_TN_ROUTE:
- timer->fsb = deposit64(timer->fsb, shift, len, value);
+ timer->fsb = (timer->fsb & 0xffffffff00000000ULL) | new_val;
+ break;
+ case HPET_TN_ROUTE + 4:
+ timer->fsb = (new_val << 32) | (timer->fsb & 0xffffffff);
break;
default:
trace_hpet_ram_write_invalid();
@@ -571,14 +618,12 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
}
return;
} else {
- switch (addr & ~4) {
+ switch (addr) {
case HPET_ID:
return;
case HPET_CFG:
- old_val = s->config;
- new_val = deposit64(old_val, shift, len, value);
new_val = hpet_fixup_reg(new_val, old_val, HPET_CFG_WRITE_MASK);
- s->config = new_val;
+ s->config = (s->config & 0xffffffff00000000ULL) | new_val;
if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
/* Enable main counter and interrupt generation. */
s->hpet_offset =
@@ -608,8 +653,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
qemu_set_irq(s->irqs[RTC_ISA_IRQ], s->rtc_irq_level);
}
break;
+ case HPET_CFG + 4:
+ trace_hpet_invalid_hpet_cfg(4);
+ break;
case HPET_STATUS:
- new_val = value << shift;
cleared = new_val & s->isr;
for (i = 0; i < s->num_timers; i++) {
if (cleared & (1 << i)) {
@@ -621,7 +668,15 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
if (hpet_enabled(s)) {
trace_hpet_ram_write_counter_write_while_enabled();
}
- s->hpet_counter = deposit64(s->hpet_counter, shift, len, value);
+ s->hpet_counter =
+ (s->hpet_counter & 0xffffffff00000000ULL) | value;
+ trace_hpet_ram_write_counter_written(0, value, s->hpet_counter);
+ break;
+ case HPET_COUNTER + 4:
+ trace_hpet_ram_write_counter_write_while_enabled();
+ s->hpet_counter =
+ (s->hpet_counter & 0xffffffffULL) | (((uint64_t)value) << 32);
+ trace_hpet_ram_write_counter_written(4, value, s->hpet_counter);
break;
default:
trace_hpet_ram_write_invalid();
@@ -635,11 +690,7 @@ static const MemoryRegionOps hpet_ram_ops = {
.write = hpet_ram_write,
.valid = {
.min_access_size = 4,
- .max_access_size = 8,
- },
- .impl = {
- .min_access_size = 4,
- .max_access_size = 8,
+ .max_access_size = 4,
},
.endianness = DEVICE_NATIVE_ENDIAN,
};
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index 5cfc369fba..219747df2f 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -114,7 +114,8 @@ hpet_ram_read_reading_counter(uint8_t reg_off, uint64_t cur_tick) "reading count
hpet_ram_read_invalid(void) "invalid hpet_ram_readl"
hpet_ram_write(uint64_t addr, uint64_t value) "enter hpet_ram_writel at 0x%" PRIx64 " = 0x%" PRIx64
hpet_ram_write_timer_id(uint64_t timer_id) "hpet_ram_writel timer_id = 0x%" PRIx64
-hpet_ram_write_tn_cfg(uint8_t reg_off) "hpet_ram_writel HPET_TN_CFG + %" PRIu8
+hpet_ram_write_tn_cfg(void) "hpet_ram_writel HPET_TN_CFG"
+hpet_ram_write_invalid_tn_cfg(uint8_t reg_off) "invalid HPET_TN_CFG + %" PRIu8 " write"
hpet_ram_write_tn_cmp(uint8_t reg_off) "hpet_ram_writel HPET_TN_CMP + %" PRIu8
hpet_ram_write_invalid_tn_cmp(void) "invalid HPET_TN_CMP + 4 write"
hpet_ram_write_invalid(void) "invalid hpet_ram_writel"

View File

@@ -0,0 +1,64 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 19 Mar 2025 17:31:10 +0100
Subject: [PATCH] Revert "hpet: place read-only bits directly in "new_val""
This reverts commit ba88935b0fac2588b0a739f810b58dfabf7f92c8.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/timer/hpet.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index e1ac877759..b12bbaf10d 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -510,7 +510,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
{
int i;
HPETState *s = opaque;
- uint64_t old_val, new_val, cleared;
+ uint64_t old_val, new_val, val;
trace_hpet_ram_write(addr, value);
old_val = hpet_ram_read(opaque, addr, 4);
@@ -536,12 +536,13 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
*/
update_irq(timer, 0);
}
- new_val = hpet_fixup_reg(new_val, old_val, HPET_TN_CFG_WRITE_MASK);
- timer->config = (timer->config & 0xffffffff00000000ULL) | new_val;
+ val = hpet_fixup_reg(new_val, old_val, HPET_TN_CFG_WRITE_MASK);
+ timer->config = (timer->config & 0xffffffff00000000ULL) | val;
if (activating_bit(old_val, new_val, HPET_TN_ENABLE)
&& (s->isr & (1 << timer_id))) {
update_irq(timer, 1);
}
+
if (new_val & HPET_TN_32BIT) {
timer->cmp = (uint32_t)timer->cmp;
timer->period = (uint32_t)timer->period;
@@ -622,8 +623,8 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
case HPET_ID:
return;
case HPET_CFG:
- new_val = hpet_fixup_reg(new_val, old_val, HPET_CFG_WRITE_MASK);
- s->config = (s->config & 0xffffffff00000000ULL) | new_val;
+ val = hpet_fixup_reg(new_val, old_val, HPET_CFG_WRITE_MASK);
+ s->config = (s->config & 0xffffffff00000000ULL) | val;
if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
/* Enable main counter and interrupt generation. */
s->hpet_offset =
@@ -657,9 +658,9 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
trace_hpet_invalid_hpet_cfg(4);
break;
case HPET_STATUS:
- cleared = new_val & s->isr;
+ val = new_val & s->isr;
for (i = 0; i < s->num_timers; i++) {
- if (cleared & (1 << i)) {
+ if (val & (1 << i)) {
update_irq(&s->timer[i], 0);
}
}

View File

@@ -0,0 +1,68 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 19 Mar 2025 17:31:11 +0100
Subject: [PATCH] Revert "hpet: remove unnecessary variable "index""
This reverts commit 5895879aca252f4ebb2d1078eaf836c61ec54e9b.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/timer/hpet.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index b12bbaf10d..6f83d88516 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -437,12 +437,12 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
unsigned size)
{
HPETState *s = opaque;
- uint64_t cur_tick;
+ uint64_t cur_tick, index;
trace_hpet_ram_read(addr);
-
+ index = addr;
/*address range of all TN regs*/
- if (addr >= 0x100 && addr <= 0x3ff) {
+ if (index >= 0x100 && index <= 0x3ff) {
uint8_t timer_id = (addr - 0x100) / 0x20;
HPETTimer *timer = &s->timer[timer_id];
@@ -469,7 +469,7 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
break;
}
} else {
- switch (addr) {
+ switch (index) {
case HPET_ID:
return s->capability;
case HPET_PERIOD:
@@ -510,14 +510,15 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
{
int i;
HPETState *s = opaque;
- uint64_t old_val, new_val, val;
+ uint64_t old_val, new_val, val, index;
trace_hpet_ram_write(addr, value);
+ index = addr;
old_val = hpet_ram_read(opaque, addr, 4);
new_val = value;
/*address range of all TN regs*/
- if (addr >= 0x100 && addr <= 0x3ff) {
+ if (index >= 0x100 && index <= 0x3ff) {
uint8_t timer_id = (addr - 0x100) / 0x20;
HPETTimer *timer = &s->timer[timer_id];
@@ -619,7 +620,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
}
return;
} else {
- switch (addr) {
+ switch (index) {
case HPET_ID:
return;
case HPET_CFG:

View File

@@ -0,0 +1,40 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 19 Mar 2025 17:31:12 +0100
Subject: [PATCH] Revert "hpet: ignore high bits of comparator in 32-bit mode"
This reverts commit 9eb7fad3546a89ee7cf0e90f5b1daccf89725cea.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/timer/hpet.c | 4 ----
hw/timer/trace-events | 1 -
2 files changed, 5 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 6f83d88516..509986c0a9 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -585,10 +585,6 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
}
break;
case HPET_TN_CMP + 4: // comparator register high order
- if (timer->config & HPET_TN_32BIT) {
- trace_hpet_ram_write_invalid_tn_cmp();
- break;
- }
trace_hpet_ram_write_tn_cmp(4);
if (!timer_is_periodic(timer)
|| (timer->config & HPET_TN_SETVAL)) {
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index 219747df2f..b86870fb22 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -117,7 +117,6 @@ hpet_ram_write_timer_id(uint64_t timer_id) "hpet_ram_writel timer_id = 0x%" PRIx
hpet_ram_write_tn_cfg(void) "hpet_ram_writel HPET_TN_CFG"
hpet_ram_write_invalid_tn_cfg(uint8_t reg_off) "invalid HPET_TN_CFG + %" PRIu8 " write"
hpet_ram_write_tn_cmp(uint8_t reg_off) "hpet_ram_writel HPET_TN_CMP + %" PRIu8
-hpet_ram_write_invalid_tn_cmp(void) "invalid HPET_TN_CMP + 4 write"
hpet_ram_write_invalid(void) "invalid hpet_ram_writel"
hpet_ram_write_counter_write_while_enabled(void) "Writing counter while HPET enabled!"
hpet_ram_write_counter_written(uint8_t reg_off, uint64_t value, uint64_t counter) "HPET counter + %" PRIu8 "written. crt = 0x%" PRIx64 " -> 0x%" PRIx64

View File

@@ -0,0 +1,120 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 19 Mar 2025 17:31:13 +0100
Subject: [PATCH] Revert "hpet: fix and cleanup persistence of interrupt
status"
This reverts commit f0ccf770789e48b7a73497b465fdc892d28c1339.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/timer/hpet.c | 60 ++++++++++++++++---------------------------------
1 file changed, 19 insertions(+), 41 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 509986c0a9..402cc960f0 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -196,31 +196,21 @@ static void update_irq(struct HPETTimer *timer, int set)
}
s = timer->state;
mask = 1 << timer->tn;
-
- if (set && (timer->config & HPET_TN_TYPE_LEVEL)) {
- /*
- * If HPET_TN_ENABLE bit is 0, "the timer will still operate and
- * generate appropriate status bits, but will not cause an interrupt"
- */
- s->isr |= mask;
- } else {
+ if (!set || !timer_enabled(timer) || !hpet_enabled(timer->state)) {
s->isr &= ~mask;
- }
-
- if (set && timer_enabled(timer) && hpet_enabled(s)) {
- if (timer_fsb_route(timer)) {
- address_space_stl_le(&address_space_memory, timer->fsb >> 32,
- timer->fsb & 0xffffffff, MEMTXATTRS_UNSPECIFIED,
- NULL);
- } else if (timer->config & HPET_TN_TYPE_LEVEL) {
- qemu_irq_raise(s->irqs[route]);
- } else {
- qemu_irq_pulse(s->irqs[route]);
- }
- } else {
if (!timer_fsb_route(timer)) {
qemu_irq_lower(s->irqs[route]);
}
+ } else if (timer_fsb_route(timer)) {
+ address_space_stl_le(&address_space_memory, timer->fsb >> 32,
+ timer->fsb & 0xffffffff, MEMTXATTRS_UNSPECIFIED,
+ NULL);
+ } else if (timer->config & HPET_TN_TYPE_LEVEL) {
+ s->isr |= mask;
+ qemu_irq_raise(s->irqs[route]);
+ } else {
+ s->isr &= ~mask;
+ qemu_irq_pulse(s->irqs[route]);
}
}
@@ -424,13 +414,8 @@ static void hpet_set_timer(HPETTimer *t)
static void hpet_del_timer(HPETTimer *t)
{
- HPETState *s = t->state;
timer_del(t->qemu_timer);
-
- if (s->isr & (1 << t->tn)) {
- /* For level-triggered interrupt, this leaves ISR set but lowers irq. */
- update_irq(t, 1);
- }
+ update_irq(t, 0);
}
static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
@@ -530,26 +515,20 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
switch ((addr - 0x100) % 0x20) {
case HPET_TN_CFG:
trace_hpet_ram_write_tn_cfg();
- if (deactivating_bit(old_val, new_val, HPET_TN_TYPE_LEVEL)) {
- /*
- * Do this before changing timer->config; otherwise, if
- * HPET_TN_FSB is set, update_irq will not lower the qemu_irq.
- */
+ if (activating_bit(old_val, new_val, HPET_TN_FSB_ENABLE)) {
update_irq(timer, 0);
}
val = hpet_fixup_reg(new_val, old_val, HPET_TN_CFG_WRITE_MASK);
timer->config = (timer->config & 0xffffffff00000000ULL) | val;
- if (activating_bit(old_val, new_val, HPET_TN_ENABLE)
- && (s->isr & (1 << timer_id))) {
- update_irq(timer, 1);
- }
-
if (new_val & HPET_TN_32BIT) {
timer->cmp = (uint32_t)timer->cmp;
timer->period = (uint32_t)timer->period;
}
- if (hpet_enabled(s)) {
+ if (activating_bit(old_val, new_val, HPET_TN_ENABLE) &&
+ hpet_enabled(s)) {
hpet_set_timer(timer);
+ } else if (deactivating_bit(old_val, new_val, HPET_TN_ENABLE)) {
+ hpet_del_timer(timer);
}
break;
case HPET_TN_CFG + 4: // Interrupt capabilities
@@ -627,10 +606,9 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
s->hpet_offset =
ticks_to_ns(s->hpet_counter) - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
for (i = 0; i < s->num_timers; i++) {
- if (timer_enabled(&s->timer[i]) && (s->isr & (1 << i))) {
- update_irq(&s->timer[i], 1);
+ if ((&s->timer[i])->cmp != ~0ULL) {
+ hpet_set_timer(&s->timer[i]);
}
- hpet_set_timer(&s->timer[i]);
}
} else if (deactivating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
/* Halt main counter and disable interrupt generation. */

View File

@@ -0,0 +1,81 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 31 Mar 2025 16:55:02 +0200
Subject: [PATCH] savevm-async: improve setting state of snapshot operation in
savevm-end handler
One of the callers of wait_for_close_co() already sets the state to
SAVE_STATE_DONE before, but that is not fully correct, because at that
moment, the operation is not fully done. In particular, if closing the
target later fails, the state would even be set to SAVE_STATE_ERROR
afterwards. DONE -> ERROR is not a valid state transition. Although,
it should not matter in practice as long as the relevant QMP commands
are sequential.
The other caller does not set the state and so there seems to be a
race that could lead to the state not getting set at all. This is
because before this commit, the wait_for_close_co() function could
return early when there is no target file anymore. This can only
happen when canceling and needs to happen right around the time when
the snapshot is already finishing and closing the target.
Simply avoid the early return and always set the state within the
wait_for_close_co() function rather than at the call site.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
migration/savevm-async.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 1e79fce9ba..e63dc6d8a3 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -450,23 +450,22 @@ static void coroutine_fn wait_for_close_co(void *opaque)
{
int64_t timeout;
- if (!snap_state.target) {
- DPRINTF("savevm-end: no target file open\n");
- return;
- }
-
- /* wait until cleanup is done before returning, this ensures that after this
- * call exits the statefile will be closed and can be removed immediately */
- DPRINTF("savevm-end: waiting for cleanup\n");
- timeout = 30L * 1000 * 1000 * 1000;
- qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait,
- QEMU_CLOCK_REALTIME, timeout);
if (snap_state.target) {
- save_snapshot_error("timeout waiting for target file close in "
- "qmp_savevm_end");
- /* we cannot assume the snapshot finished in this case, so leave the
- * state alone - caller has to figure something out */
- return;
+ /* wait until cleanup is done before returning, this ensures that after this
+ * call exits the statefile will be closed and can be removed immediately */
+ DPRINTF("savevm-end: waiting for cleanup\n");
+ timeout = 30L * 1000 * 1000 * 1000;
+ qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait,
+ QEMU_CLOCK_REALTIME, timeout);
+ if (snap_state.target) {
+ save_snapshot_error("timeout waiting for target file close in "
+ "qmp_savevm_end");
+ /* we cannot assume the snapshot finished in this case, so leave the
+ * state alone - caller has to figure something out */
+ return;
+ }
+ } else {
+ DPRINTF("savevm-end: no target file open\n");
}
// File closed and no other error, so ensure next snapshot can be started.
@@ -497,8 +496,6 @@ void qmp_savevm_end(Error **errp)
snap_state.saved_vm_running = false;
}
- snap_state.state = SAVE_STATE_DONE;
-
qemu_coroutine_enter(wait_for_close);
}

View File

@@ -0,0 +1,71 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 31 Mar 2025 16:55:03 +0200
Subject: [PATCH] savevm-async: rename saved_vm_running to vm_needs_start
This is what the variable actually expresses. Otherwise, setting it
to false after starting the VM doesn't make sense.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
migration/savevm-async.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index e63dc6d8a3..1e34c31e8b 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -51,7 +51,7 @@ static struct SnapshotState {
int state;
Error *error;
Error *blocker;
- int saved_vm_running;
+ int vm_needs_start;
QEMUFile *file;
int64_t total_time;
QEMUBH *finalize_bh;
@@ -224,9 +224,9 @@ static void process_savevm_finalize(void *opaque)
save_snapshot_error("process_savevm_cleanup: invalid state: %d",
snap_state.state);
}
- if (snap_state.saved_vm_running) {
+ if (snap_state.vm_needs_start) {
vm_start();
- snap_state.saved_vm_running = false;
+ snap_state.vm_needs_start = false;
}
DPRINTF("timing: process_savevm_finalize (full) took %ld ms\n",
@@ -352,7 +352,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
}
/* initialize snapshot info */
- snap_state.saved_vm_running = runstate_is_running();
+ snap_state.vm_needs_start = runstate_is_running();
snap_state.bs_pos = 0;
snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
snap_state.blocker = NULL;
@@ -440,9 +440,9 @@ restart:
save_snapshot_error("setup failed");
- if (snap_state.saved_vm_running) {
+ if (snap_state.vm_needs_start) {
vm_start();
- snap_state.saved_vm_running = false;
+ snap_state.vm_needs_start = false;
}
}
@@ -491,9 +491,9 @@ void qmp_savevm_end(Error **errp)
return;
}
- if (snap_state.saved_vm_running) {
+ if (snap_state.vm_needs_start) {
vm_start();
- snap_state.saved_vm_running = false;
+ snap_state.vm_needs_start = false;
}
qemu_coroutine_enter(wait_for_close);

View File

@@ -0,0 +1,120 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 31 Mar 2025 16:55:04 +0200
Subject: [PATCH] savevm-async: improve runstate preservation, cleanup error
handling
Determine if VM needs to be started after finishing right before
actually stopping the VM instead of at the beginning.
In qmp_savevm_start(), the only path stopping the VM returns right
aftwards, so there is no need for the vm_start() handling after
errors.
Lastly, improve the code style for checking whether migrate_init()
failed by explicitly comparing against 0.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[WB: squashed error handling commits, rename goto branch instead of
inlining it]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
migration/savevm-async.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 1e34c31e8b..d8d2c80475 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -178,6 +178,7 @@ static void process_savevm_finalize(void *opaque)
*/
blk_set_aio_context(snap_state.target, qemu_get_aio_context(), NULL);
+ snap_state.vm_needs_start = runstate_is_running();
ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
if (ret < 0) {
save_snapshot_error("vm_stop_force_state error %d", ret);
@@ -352,7 +353,6 @@ void qmp_savevm_start(const char *statefile, Error **errp)
}
/* initialize snapshot info */
- snap_state.vm_needs_start = runstate_is_running();
snap_state.bs_pos = 0;
snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
snap_state.blocker = NULL;
@@ -364,13 +364,14 @@ void qmp_savevm_start(const char *statefile, Error **errp)
}
if (!statefile) {
+ snap_state.vm_needs_start = runstate_is_running();
vm_stop(RUN_STATE_SAVE_VM);
snap_state.state = SAVE_STATE_COMPLETED;
return;
}
if (qemu_savevm_state_blocked(errp)) {
- return;
+ goto fail;
}
/* Open the image */
@@ -380,12 +381,12 @@ void qmp_savevm_start(const char *statefile, Error **errp)
snap_state.target = blk_new_open(statefile, NULL, options, bdrv_oflags, &local_err);
if (!snap_state.target) {
error_setg(errp, "failed to open '%s'", statefile);
- goto restart;
+ goto fail;
}
target_bs = blk_bs(snap_state.target);
if (!target_bs) {
error_setg(errp, "failed to open '%s' - no block driver state", statefile);
- goto restart;
+ goto fail;
}
QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
@@ -394,7 +395,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
if (!snap_state.file) {
error_setg(errp, "failed to open '%s'", statefile);
- goto restart;
+ goto fail;
}
/*
@@ -402,8 +403,8 @@ void qmp_savevm_start(const char *statefile, Error **errp)
* State is cleared in process_savevm_co, but has to be initialized
* here (blocking main thread, from QMP) to avoid race conditions.
*/
- if (migrate_init(ms, errp)) {
- return;
+ if (migrate_init(ms, errp) != 0) {
+ goto fail;
}
memset(&mig_stats, 0, sizeof(mig_stats));
ms->to_dst_file = snap_state.file;
@@ -418,7 +419,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
if (ret != 0) {
error_setg_errno(errp, -ret, "savevm state setup failed: %s",
local_err ? error_get_pretty(local_err) : "unknown error");
- return;
+ goto fail;
}
/* Async processing from here on out happens in iohandler context, so let
@@ -436,14 +437,8 @@ void qmp_savevm_start(const char *statefile, Error **errp)
return;
-restart:
-
+fail:
save_snapshot_error("setup failed");
-
- if (snap_state.vm_needs_start) {
- vm_start();
- snap_state.vm_needs_start = false;
- }
}
static void coroutine_fn wait_for_close_co(void *opaque)

View File

@@ -0,0 +1,185 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 31 Mar 2025 16:55:06 +0200
Subject: [PATCH] savevm-async: use dedicated iothread for state file
Having the state file be in the iohandler context means that a
blk_drain_all() call in the main thread or vCPU thread that happens
while the snapshot is running will result in a deadlock.
For example, the main thread might be stuck in:
> 0 0x00007300ac9552d6 in __ppoll (fds=0x64bd5a411a50, nfds=2, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> 1 0x000064bd51af3cad in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:64
> 2 0x000064bd51ad8799 in fdmon_poll_wait (ctx=0x64bd58d968a0, ready_list=0x7ffcfcc15558, timeout=-1) at ../util/fdmon-poll.c:79
> 3 0x000064bd51ad7c3d in aio_poll (ctx=0x64bd58d968a0, blocking=blocking@entry=true) at ../util/aio-posix.c:671
> 4 0x000064bd519a0b5d in bdrv_drain_all_begin () at ../block/io.c:531
> 5 bdrv_drain_all_begin () at ../block/io.c:510
> 6 0x000064bd519943c4 in blk_drain_all () at ../block/block-backend.c:2085
> 7 0x000064bd5160fc5a in virtio_scsi_dataplane_stop (vdev=0x64bd5a215190) at ../hw/scsi/virtio-scsi-dataplane.c:213
> 8 0x000064bd51664e90 in virtio_bus_stop_ioeventfd (bus=0x64bd5a215110) at ../hw/virtio/virtio-bus.c:259
> 9 0x000064bd5166511b in virtio_bus_stop_ioeventfd (bus=<optimized out>) at ../hw/virtio/virtio-bus.c:251
> 10 virtio_bus_reset (bus=<optimized out>) at ../hw/virtio/virtio-bus.c:107
> 11 0x000064bd51667431 in virtio_pci_reset (qdev=<optimized out>) at ../hw/virtio/virtio-pci.c:2296
...
> 34 0x000064bd517aa951 in pc_machine_reset (machine=<optimized out>, type=<optimized out>) at ../hw/i386/pc.c:1722
> 35 0x000064bd516aa4c4 in qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_GUEST_RESET) at ../system/runstate.c:525
> 36 0x000064bd516aaeb9 in main_loop_should_exit (status=<synthetic pointer>) at ../system/runstate.c:801
> 37 qemu_main_loop () at ../system/runstate.c:834
which is in block/io.c:
> /* Now poll the in-flight requests */
> AIO_WAIT_WHILE_UNLOCKED(NULL, bdrv_drain_all_poll());
The working theory is: The deadlock happens because the IO is issued
from the process_savevm_co() coroutine, which doesn't get scheduled
again to complete in-flight requests when the main thread is stuck
there polling. The main thread itself is the one that would need to
schedule it. In case of a vCPU triggering the VirtIO SCSI dataplane
stop, which happens during (Linux) boot, the vCPU thread will hold the
big QEMU lock (BQL) blocking the main thread from making progress
scheduling the process_savevm_co() coroutine.
This change should also help in general to reduce load on the main
thread and for it to get stuck on IO, i.e. same benefits as using a
dedicated IO thread for regular drives. This is particularly
interesting when the VM state storage is a network storage like NFS.
With some luck, it could also help with bug #6262 [0]. The failure
there happens while issuing/right after the savevm-start QMP command,
so the most likely coroutine is the process_savevm_co() that was
previously scheduled to the iohandler context. Likely someone polls
the iohandler context and wants to enter the already scheduled
coroutine leading to the abort():
> qemu_aio_coroutine_enter: Co-routine was already scheduled in 'aio_co_schedule'
With a dedicated iothread, there hopefully is no such race.
The comment above querying the pending bytes wrongly talked about the
"iothread lock", but should've been "iohandler lock". This was even
renamed to BQL (big QEMU lock) a few releases ago. Even if that was
not a typo to begin with, there are no AioContext locks anymore.
[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=6262
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[WB: update to the changed error handling in the previous commit]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
migration/savevm-async.c | 42 ++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index d8d2c80475..11ea4c601d 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -25,6 +25,7 @@
#include "qemu/main-loop.h"
#include "qemu/rcu.h"
#include "qemu/yank.h"
+#include "sysemu/iothread.h"
/* #define DEBUG_SAVEVM_STATE */
@@ -57,6 +58,7 @@ static struct SnapshotState {
QEMUBH *finalize_bh;
Coroutine *co;
QemuCoSleep target_close_wait;
+ IOThread *iothread;
} snap_state;
static bool savevm_aborted(void)
@@ -256,16 +258,13 @@ static void coroutine_fn process_savevm_co(void *opaque)
uint64_t threshold = 400 * 1000;
/*
- * pending_{estimate,exact} are expected to be called without iothread
- * lock. Similar to what is done in migration.c, call the exact variant
- * only once pend_precopy in the estimate is below the threshold.
+ * Similar to what is done in migration.c, call the exact variant only
+ * once pend_precopy in the estimate is below the threshold.
*/
- bql_unlock();
qemu_savevm_state_pending_estimate(&pend_precopy, &pend_postcopy);
if (pend_precopy <= threshold) {
qemu_savevm_state_pending_exact(&pend_precopy, &pend_postcopy);
}
- bql_lock();
pending_size = pend_precopy + pend_postcopy;
/*
@@ -332,11 +331,17 @@ static void coroutine_fn process_savevm_co(void *opaque)
qemu_bh_schedule(snap_state.finalize_bh);
}
+static void savevm_cleanup_iothread(void) {
+ if (snap_state.iothread) {
+ iothread_destroy(snap_state.iothread);
+ snap_state.iothread = NULL;
+ }
+}
+
void qmp_savevm_start(const char *statefile, Error **errp)
{
Error *local_err = NULL;
MigrationState *ms = migrate_get_current();
- AioContext *iohandler_ctx = iohandler_get_aio_context();
BlockDriverState *target_bs = NULL;
int ret = 0;
@@ -374,6 +379,19 @@ void qmp_savevm_start(const char *statefile, Error **errp)
goto fail;
}
+ if (snap_state.iothread) {
+ /* This is not expected, so warn about it, but no point in re-creating a new iothread. */
+ warn_report("iothread for snapshot already exists - re-using");
+ } else {
+ snap_state.iothread =
+ iothread_create("__proxmox_savevm_async_iothread__", &local_err);
+ if (!snap_state.iothread) {
+ error_setg(errp, "creating iothread failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ goto fail;
+ }
+ }
+
/* Open the image */
QDict *options = NULL;
options = qdict_new();
@@ -422,22 +440,20 @@ void qmp_savevm_start(const char *statefile, Error **errp)
goto fail;
}
- /* Async processing from here on out happens in iohandler context, so let
- * the target bdrv have its home there.
- */
- ret = blk_set_aio_context(snap_state.target, iohandler_ctx, &local_err);
+ ret = blk_set_aio_context(snap_state.target, snap_state.iothread->ctx, &local_err);
if (ret != 0) {
- warn_report("failed to set iohandler context for VM state target: %s %s",
+ warn_report("failed to set iothread context for VM state target: %s %s",
local_err ? error_get_pretty(local_err) : "unknown error",
strerror(-ret));
}
snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
- aio_co_schedule(iohandler_ctx, snap_state.co);
+ aio_co_schedule(snap_state.iothread->ctx, snap_state.co);
return;
fail:
+ savevm_cleanup_iothread();
save_snapshot_error("setup failed");
}
@@ -463,6 +479,8 @@ static void coroutine_fn wait_for_close_co(void *opaque)
DPRINTF("savevm-end: no target file open\n");
}
+ savevm_cleanup_iothread();
+
// File closed and no other error, so ensure next snapshot can be started.
if (snap_state.state != SAVE_STATE_ERROR) {
snap_state.state = SAVE_STATE_DONE;

View File

@@ -0,0 +1,33 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 31 Mar 2025 16:55:07 +0200
Subject: [PATCH] savevm-async: treat failure to set iothread context as a hard
failure
This is not expected to ever fail and there might be assumptions about
having the expected context down the line.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[WB: update to changed error handling]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
migration/savevm-async.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 11ea4c601d..f2b10b5519 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -442,9 +442,9 @@ void qmp_savevm_start(const char *statefile, Error **errp)
ret = blk_set_aio_context(snap_state.target, snap_state.iothread->ctx, &local_err);
if (ret != 0) {
- warn_report("failed to set iothread context for VM state target: %s %s",
- local_err ? error_get_pretty(local_err) : "unknown error",
- strerror(-ret));
+ error_setg_errno(errp, -ret, "failed to set iothread context for VM state target: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ goto fail;
}
snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);

View File

@@ -0,0 +1,41 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:41 +0200
Subject: [PATCH] PVE backup: clean up directly in setup_snapshot_access() when
it fails
The only thing that might need to be cleaned up after
setup_snapshot_access() failed is dropping the cbw filter. Do so in
the single branch it matters inside setup_snapshot_access() itself.
This avoids the need that callers of setup_snapshot_access() use
cleanup_snapshot_access() when the call failed.
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/pve-backup.c b/pve-backup.c
index 32352fb5ec..2408f182bc 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -576,6 +576,9 @@ static int setup_snapshot_access(PVEBackupDevInfo *di, Error **errp)
di->fleecing.snapshot_access =
bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
if (!di->fleecing.snapshot_access) {
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+
error_setg(errp, "setting up snapshot access for fleecing failed: %s",
local_err ? error_get_pretty(local_err) : "unknown error");
return -1;
@@ -629,7 +632,6 @@ static void create_backup_jobs_bh(void *opaque) {
error_setg(errp, "%s - setting up snapshot access for fleecing failed: %s",
di->device_name,
local_err ? error_get_pretty(local_err) : "unknown error");
- cleanup_snapshot_access(di);
bdrv_drained_end(di->bs);
break;
}

View File

@@ -0,0 +1,59 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:42 +0200
Subject: [PATCH] PVE backup: factor out helper to clear backup state's bitmap
list
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 2408f182bc..915649b5f9 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -811,6 +811,23 @@ err:
return di_list;
}
+/*
+ * To be called with the backup_state.stat mutex held.
+ */
+static void clear_backup_state_bitmap_list(void) {
+
+ if (backup_state.stat.bitmap_list) {
+ GList *bl = backup_state.stat.bitmap_list;
+ while (bl) {
+ g_free(((PBSBitmapInfo *)bl->data)->drive);
+ g_free(bl->data);
+ bl = g_list_next(bl);
+ }
+ g_list_free(backup_state.stat.bitmap_list);
+ backup_state.stat.bitmap_list = NULL;
+ }
+}
+
UuidInfo coroutine_fn *qmp_backup(
const char *backup_file,
const char *password,
@@ -898,16 +915,7 @@ UuidInfo coroutine_fn *qmp_backup(
backup_state.stat.reused = 0;
/* clear previous backup's bitmap_list */
- if (backup_state.stat.bitmap_list) {
- GList *bl = backup_state.stat.bitmap_list;
- while (bl) {
- g_free(((PBSBitmapInfo *)bl->data)->drive);
- g_free(bl->data);
- bl = g_list_next(bl);
- }
- g_list_free(backup_state.stat.bitmap_list);
- backup_state.stat.bitmap_list = NULL;
- }
+ clear_backup_state_bitmap_list();
if (format == BACKUP_FORMAT_PBS) {
if (!password) {

View File

@@ -0,0 +1,95 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:43 +0200
Subject: [PATCH] PVE backup: factor out helper to initialize backup state stat
struct
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 62 ++++++++++++++++++++++++++++++++--------------------
1 file changed, 38 insertions(+), 24 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 915649b5f9..88a981f81c 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -828,6 +828,43 @@ static void clear_backup_state_bitmap_list(void) {
}
}
+/*
+ * Initializes most of the backup state 'stat' struct. Note that 'reused' and
+ * 'bitmap_list' are not changed by this function and need to be handled by
+ * the caller. In particular, 'reused' needs to be set before calling this
+ * function.
+ *
+ * To be called with the backup_state.stat mutex held.
+ */
+static void initialize_backup_state_stat(
+ const char *backup_file,
+ uuid_t uuid,
+ size_t total)
+{
+ if (backup_state.stat.error) {
+ error_free(backup_state.stat.error);
+ backup_state.stat.error = NULL;
+ }
+
+ backup_state.stat.start_time = time(NULL);
+ backup_state.stat.end_time = 0;
+
+ if (backup_state.stat.backup_file) {
+ g_free(backup_state.stat.backup_file);
+ }
+ backup_state.stat.backup_file = g_strdup(backup_file);
+
+ uuid_copy(backup_state.stat.uuid, uuid);
+ uuid_unparse_lower(uuid, backup_state.stat.uuid_str);
+
+ backup_state.stat.total = total;
+ backup_state.stat.dirty = total - backup_state.stat.reused;
+ backup_state.stat.transferred = 0;
+ backup_state.stat.zero_bytes = 0;
+ backup_state.stat.finishing = false;
+ backup_state.stat.starting = true;
+}
+
UuidInfo coroutine_fn *qmp_backup(
const char *backup_file,
const char *password,
@@ -1070,32 +1107,9 @@ UuidInfo coroutine_fn *qmp_backup(
}
}
/* initialize global backup_state now */
- /* note: 'reused' and 'bitmap_list' are initialized earlier */
-
- if (backup_state.stat.error) {
- error_free(backup_state.stat.error);
- backup_state.stat.error = NULL;
- }
-
- backup_state.stat.start_time = time(NULL);
- backup_state.stat.end_time = 0;
-
- if (backup_state.stat.backup_file) {
- g_free(backup_state.stat.backup_file);
- }
- backup_state.stat.backup_file = g_strdup(backup_file);
-
- uuid_copy(backup_state.stat.uuid, uuid);
- uuid_unparse_lower(uuid, backup_state.stat.uuid_str);
+ initialize_backup_state_stat(backup_file, uuid, total);
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
- backup_state.stat.total = total;
- backup_state.stat.dirty = total - backup_state.stat.reused;
- backup_state.stat.transferred = 0;
- backup_state.stat.zero_bytes = 0;
- backup_state.stat.finishing = false;
- backup_state.stat.starting = true;
-
qemu_mutex_unlock(&backup_state.stat.lock);
backup_state.speed = (has_speed && speed > 0) ? speed : 0;

View File

@@ -0,0 +1,63 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:44 +0200
Subject: [PATCH] PVE backup: add target ID in backup state
In preparation for allowing multiple backup providers and potentially
multiple targets for a given provider. Each backup target can then
have its own dirty bitmap and there can be additional checks that the
current backup state is actually associated to the expected target.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/pve-backup.c b/pve-backup.c
index 88a981f81c..8789a0667a 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -70,6 +70,7 @@ static struct PVEBackupState {
JobTxn *txn;
CoMutex backup_mutex;
CoMutex dump_callback_mutex;
+ char *target_id;
} backup_state;
static void pvebackup_init(void)
@@ -865,6 +866,16 @@ static void initialize_backup_state_stat(
backup_state.stat.starting = true;
}
+/*
+ * To be called with the backup_state mutex held.
+ */
+static void backup_state_set_target_id(const char *target_id) {
+ if (backup_state.target_id) {
+ g_free(backup_state.target_id);
+ }
+ backup_state.target_id = g_strdup(target_id);
+}
+
UuidInfo coroutine_fn *qmp_backup(
const char *backup_file,
const char *password,
@@ -904,7 +915,7 @@ UuidInfo coroutine_fn *qmp_backup(
if (backup_state.di_list) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR,
- "previous backup not finished");
+ "previous backup for target '%s' not finished", backup_state.target_id);
qemu_co_mutex_unlock(&backup_state.backup_mutex);
return NULL;
}
@@ -1122,6 +1133,8 @@ UuidInfo coroutine_fn *qmp_backup(
backup_state.vmaw = vmaw;
backup_state.pbs = pbs;
+ backup_state_set_target_id("Proxmox");
+
backup_state.di_list = di_list;
uuid_info = g_malloc0(sizeof(*uuid_info));

View File

@@ -0,0 +1,57 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:45 +0200
Subject: [PATCH] PVE backup: get device info: allow caller to specify filter
for which devices use fleecing
For providing snapshot-access to external backup providers, EFI and
TPM also need an associated fleecing image. The new caller will thus
need a different filter.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 8789a0667a..755f1abcf1 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -719,7 +719,7 @@ static void create_backup_jobs_bh(void *opaque) {
/*
* EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
*/
-static bool device_uses_fleecing(const char *device_id)
+static bool fleecing_no_efi_tpm(const char *device_id)
{
return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
}
@@ -731,7 +731,7 @@ static bool device_uses_fleecing(const char *device_id)
*/
static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
const char *devlist,
- bool fleecing,
+ bool (*device_uses_fleecing)(const char*),
Error **errp)
{
gchar **devs = NULL;
@@ -757,7 +757,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
di->bs = bs;
di->device_name = g_strdup(bdrv_get_device_name(bs));
- if (fleecing && device_uses_fleecing(*d)) {
+ if (device_uses_fleecing && device_uses_fleecing(*d)) {
g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
if (!fleecing_blk) {
@@ -924,7 +924,8 @@ UuidInfo coroutine_fn *qmp_backup(
format = has_format ? format : BACKUP_FORMAT_VMA;
bdrv_graph_co_rdlock();
- di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
+ di_list = get_device_info(devlist, (has_fleecing && fleecing) ? fleecing_no_efi_tpm : NULL,
+ &local_err);
bdrv_graph_co_rdunlock();
if (local_err) {
error_propagate(errp, local_err);

View File

@@ -0,0 +1,495 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:46 +0200
Subject: [PATCH] PVE backup: implement backup access setup and teardown API
for external providers
For external backup providers, the state of the VM's disk images at
the time the backup is started is preserved via a snapshot-access
block node. Old data is moved to the fleecing image when new guest
writes come in. The snapshot-access block node, as well as the
associated bitmap in case of incremental backup, will be exported via
NBD to the external provider. The NBD export will be done by the
management layer, the missing functionality is setting up and tearing
down the snapshot-access block nodes, which this patch adds.
It is necessary to also set up fleecing for EFI and TPM disks, so that
old data can be moved out of the way when a new guest write comes in.
There can only be one regular backup or one active backup access at
a time, because both require replacing the original block node of the
drive. Thus the backup state is re-used, and checks are added to
prohibit regular backup while snapshot access is active and vice
versa.
The block nodes added by the backup-access-setup QMP call are not
tracked anywhere else (there is no job they are associated to like for
regular backup). This requires adding a callback for teardown when
QEMU exits, i.e. in qemu_cleanup(). Otherwise, there will be an
assertion failure that the block graph is not empty when QEMU exits
before the backup-access-teardown QMP command is called.
The code for the qmp_backup_access_setup() was based on the existing
qmp_backup() routine.
The return value for the setup QMP command contains information about
the snapshot-access block nodes that can be used by the management
layer to set up the NBD exports.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 264 ++++++++++++++++++++++++++++++++++++++++++-
pve-backup.h | 16 +++
qapi/block-core.json | 49 ++++++++
system/runstate.c | 6 +
4 files changed, 329 insertions(+), 6 deletions(-)
create mode 100644 pve-backup.h
diff --git a/pve-backup.c b/pve-backup.c
index 755f1abcf1..091b5bd231 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1,4 +1,5 @@
#include "proxmox-backup-client.h"
+#include "pve-backup.h"
#include "vma.h"
#include "qemu/osdep.h"
@@ -588,6 +589,36 @@ static int setup_snapshot_access(PVEBackupDevInfo *di, Error **errp)
return 0;
}
+static void setup_all_snapshot_access_bh(void *opaque)
+{
+ assert(!qemu_in_coroutine());
+
+ CoCtxData *data = (CoCtxData*)opaque;
+ Error **errp = (Error**)data->data;
+
+ Error *local_err = NULL;
+
+ GList *l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ bdrv_drained_begin(di->bs);
+
+ if (setup_snapshot_access(di, &local_err) < 0) {
+ bdrv_drained_end(di->bs);
+ error_setg(errp, "%s - setting up snapshot access failed: %s", di->device_name,
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ break;
+ }
+
+ bdrv_drained_end(di->bs);
+ }
+
+ /* return */
+ aio_co_enter(data->ctx, data->co);
+}
+
/*
* backup_job_create can *not* be run from a coroutine, so this can't either.
* The caller is responsible that backup_mutex is held nonetheless.
@@ -724,6 +755,11 @@ static bool fleecing_no_efi_tpm(const char *device_id)
return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
}
+static bool fleecing_all(const char *device_id)
+{
+ return true;
+}
+
/*
* Returns a list of device infos, which needs to be freed by the caller. In
* case of an error, errp will be set, but the returned value might still be a
@@ -839,8 +875,9 @@ static void clear_backup_state_bitmap_list(void) {
*/
static void initialize_backup_state_stat(
const char *backup_file,
- uuid_t uuid,
- size_t total)
+ uuid_t *uuid,
+ size_t total,
+ bool starting)
{
if (backup_state.stat.error) {
error_free(backup_state.stat.error);
@@ -855,15 +892,19 @@ static void initialize_backup_state_stat(
}
backup_state.stat.backup_file = g_strdup(backup_file);
- uuid_copy(backup_state.stat.uuid, uuid);
- uuid_unparse_lower(uuid, backup_state.stat.uuid_str);
+ if (uuid) {
+ uuid_copy(backup_state.stat.uuid, *uuid);
+ uuid_unparse_lower(*uuid, backup_state.stat.uuid_str);
+ } else {
+ backup_state.stat.uuid_str[0] = '\0';
+ }
backup_state.stat.total = total;
backup_state.stat.dirty = total - backup_state.stat.reused;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
backup_state.stat.finishing = false;
- backup_state.stat.starting = true;
+ backup_state.stat.starting = starting;
}
/*
@@ -876,6 +917,216 @@ static void backup_state_set_target_id(const char *target_id) {
backup_state.target_id = g_strdup(target_id);
}
+BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
+ const char *target_id,
+ const char *devlist,
+ Error **errp)
+{
+ assert(qemu_in_coroutine());
+
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+ Error *local_err = NULL;
+ GList *di_list = NULL;
+ GList *l;
+
+ if (backup_state.di_list) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "previous backup for target '%s' not finished", backup_state.target_id);
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return NULL;
+ }
+
+ bdrv_graph_co_rdlock();
+ di_list = get_device_info(devlist, fleecing_all, &local_err);
+ bdrv_graph_co_rdunlock();
+ if (local_err) {
+ error_propagate(errp, local_err);
+ goto err;
+ }
+ assert(di_list);
+
+ size_t total = 0;
+
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ ssize_t size = bdrv_getlength(di->bs);
+ if (size < 0) {
+ error_setg_errno(errp, -size, "bdrv_getlength failed");
+ goto err;
+ }
+ di->size = size;
+ total += size;
+
+ di->completed_ret = INT_MAX;
+ }
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.reused = 0;
+
+ /* clear previous backup's bitmap_list */
+ clear_backup_state_bitmap_list();
+
+ /* starting=false, because there is no associated QEMU job */
+ initialize_backup_state_stat(NULL, NULL, total, false);
+
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ backup_state_set_target_id(target_id);
+
+ backup_state.vmaw = NULL;
+ backup_state.pbs = NULL;
+
+ backup_state.di_list = di_list;
+
+ /* Run setup_all_snapshot_access_bh outside of coroutine (in BH) but keep
+ * backup_mutex locked. This is fine, a CoMutex can be held across yield
+ * points, and we'll release it as soon as the BH reschedules us.
+ */
+ CoCtxData waker = {
+ .co = qemu_coroutine_self(),
+ .ctx = qemu_get_current_aio_context(),
+ .data = &local_err,
+ };
+ aio_bh_schedule_oneshot(waker.ctx, setup_all_snapshot_access_bh, &waker);
+ qemu_coroutine_yield();
+
+ if (local_err) {
+ error_propagate(errp, local_err);
+ goto err;
+ }
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+
+ BackupAccessInfoList *bai_head = NULL, **p_bai_next = &bai_head;
+
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ BackupAccessInfoList *info = g_malloc0(sizeof(*info));
+ info->value = g_malloc0(sizeof(*info->value));
+ info->value->node_name = g_strdup(bdrv_get_node_name(di->fleecing.snapshot_access));
+ info->value->device = g_strdup(di->device_name);
+ info->value->size = di->size;
+
+ *p_bai_next = info;
+ p_bai_next = &info->next;
+ }
+
+ return bai_head;
+
+err:
+
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ g_free(di->device_name);
+ di->device_name = NULL;
+
+ g_free(di);
+ }
+ g_list_free(di_list);
+ backup_state.di_list = NULL;
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return NULL;
+}
+
+/*
+ * Caller needs to hold the backup mutex or the BQL.
+ */
+void backup_access_teardown(void)
+{
+ GList *l = backup_state.di_list;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.finishing = true;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (di->fleecing.snapshot_access) {
+ bdrv_unref(di->fleecing.snapshot_access);
+ di->fleecing.snapshot_access = NULL;
+ }
+ if (di->fleecing.cbw) {
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+ }
+
+ g_free(di->device_name);
+ di->device_name = NULL;
+
+ g_free(di);
+ }
+ g_list_free(backup_state.di_list);
+ backup_state.di_list = NULL;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+}
+
+// Not done in a coroutine, because bdrv_co_unref() and cbw_drop() would just spawn BHs anyways.
+// Caller needs to hold the backup_state.backup_mutex lock
+static void backup_access_teardown_bh(void *opaque)
+{
+ CoCtxData *data = (CoCtxData*)opaque;
+
+ backup_access_teardown();
+
+ /* return */
+ aio_co_enter(data->ctx, data->co);
+}
+
+void coroutine_fn qmp_backup_access_teardown(const char *target_id, Error **errp)
+{
+ assert(qemu_in_coroutine());
+
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+ if (!backup_state.target_id) { // nothing to do
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return;
+ }
+
+ /*
+ * Continue with target_id == NULL, used by the callback registered for qemu_cleanup()
+ */
+ if (target_id && strcmp(backup_state.target_id, target_id)) {
+ error_setg(errp, "cannot teardown backup access - got target %s instead of %s",
+ target_id, backup_state.target_id);
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return;
+ }
+
+ if (!strcmp(backup_state.target_id, "Proxmox VE")) {
+ error_setg(errp, "cannot teardown backup access for PVE - use backup-cancel instead");
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return;
+ }
+
+ CoCtxData waker = {
+ .co = qemu_coroutine_self(),
+ .ctx = qemu_get_current_aio_context(),
+ };
+ aio_bh_schedule_oneshot(waker.ctx, backup_access_teardown_bh, &waker);
+ qemu_coroutine_yield();
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return;
+}
+
UuidInfo coroutine_fn *qmp_backup(
const char *backup_file,
const char *password,
@@ -1119,7 +1370,7 @@ UuidInfo coroutine_fn *qmp_backup(
}
}
/* initialize global backup_state now */
- initialize_backup_state_stat(backup_file, uuid, total);
+ initialize_backup_state_stat(backup_file, &uuid, total, true);
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -1298,5 +1549,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->pbs_masterkey = true;
ret->backup_max_workers = true;
ret->backup_fleecing = true;
+ ret->backup_access_api = true;
return ret;
}
diff --git a/pve-backup.h b/pve-backup.h
new file mode 100644
index 0000000000..4033bc848f
--- /dev/null
+++ b/pve-backup.h
@@ -0,0 +1,16 @@
+/*
+ * Bacup code used by Proxmox VE
+ *
+ * Copyright (C) Proxmox Server Solutions
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef PVE_BACKUP_H
+#define PVE_BACKUP_H
+
+void backup_access_teardown(void);
+
+#endif /* PVE_BACKUP_H */
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c581f1f238..3f092221ce 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1019,6 +1019,9 @@
#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
#
+# @backup-access-api: Whether backup access API for external providers is
+# supported or not.
+#
# @backup-fleecing: Whether backup fleecing is supported or not.
#
# @backup-max-workers: Whether the 'max-workers' @BackupPerf setting is
@@ -1032,6 +1035,7 @@
'pbs-dirty-bitmap-migration': 'bool',
'pbs-masterkey': 'bool',
'pbs-library-version': 'str',
+ 'backup-access-api': 'bool',
'backup-fleecing': 'bool',
'backup-max-workers': 'bool' } }
@@ -1098,6 +1102,51 @@
##
{ 'command': 'query-pbs-bitmap-info', 'returns': ['PBSBitmapInfo'] }
+##
+# @BackupAccessInfo:
+#
+# Info associated to a snapshot access for backup. For more information about
+# the bitmap see @BackupAccessBitmapMode.
+#
+# @node-name: the block node name of the snapshot-access node.
+#
+# @device: the device on top of which the snapshot access was created.
+#
+# @size: the size of the block device in bytes.
+#
+##
+{ 'struct': 'BackupAccessInfo',
+ 'data': { 'node-name': 'str', 'device': 'str', 'size': 'size' } }
+
+##
+# @backup-access-setup:
+#
+# Set up snapshot access to VM drives for an external backup provider. No other
+# backup or backup access can be done before tearing down the backup access.
+#
+# @target-id: the unique ID of the backup target.
+#
+# @devlist: list of block device names (separated by ',', ';' or ':'). By
+# default the backup includes all writable block devices.
+#
+# Returns: a list of @BackupAccessInfo, one for each device.
+#
+##
+{ 'command': 'backup-access-setup',
+ 'data': { 'target-id': 'str', '*devlist': 'str' },
+ 'returns': [ 'BackupAccessInfo' ], 'coroutine': true }
+
+##
+# @backup-access-teardown:
+#
+# Tear down previously setup snapshot access for the same target.
+#
+# @target-id: the ID of the backup target.
+#
+##
+{ 'command': 'backup-access-teardown', 'data': { 'target-id': 'str' },
+ 'coroutine': true }
+
##
# @BlockDeviceTimedStats:
#
diff --git a/system/runstate.c b/system/runstate.c
index c2c9afa905..6f93d7c2fb 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -60,6 +60,7 @@
#include "sysemu/sysemu.h"
#include "sysemu/tpm.h"
#include "trace.h"
+#include "pve-backup.h"
static NotifierList exit_notifiers =
NOTIFIER_LIST_INITIALIZER(exit_notifiers);
@@ -920,6 +921,11 @@ void qemu_cleanup(int status)
* requests happening from here on anyway.
*/
bdrv_drain_all_begin();
+ /*
+ * The backup access is set up by a QMP command, but is neither owned by a monitor nor
+ * associated to a BlockBackend. Need to tear it down manually here.
+ */
+ backup_access_teardown();
job_cancel_sync_all();
bdrv_close_all();

View File

@@ -0,0 +1,122 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:47 +0200
Subject: [PATCH] PVE backup: factor out get_single_device_info() helper
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[WB: free di and di->device_name on error]
Sigend-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 90 +++++++++++++++++++++++++++++++---------------------
1 file changed, 53 insertions(+), 37 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 091b5bd231..8b7414f057 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -760,6 +760,57 @@ static bool fleecing_all(const char *device_id)
return true;
}
+static PVEBackupDevInfo coroutine_fn GRAPH_RDLOCK *get_single_device_info(
+ const char *device,
+ bool (*device_uses_fleecing)(const char*),
+ Error **errp)
+{
+ BlockBackend *blk = blk_by_name(device);
+ if (!blk) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", device);
+ return NULL;
+ }
+ BlockDriverState *bs = blk_bs(blk);
+ if (!bdrv_co_is_inserted(bs)) {
+ error_setg(errp, "Device '%s' has no medium", device);
+ return NULL;
+ }
+ PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
+ di->bs = bs;
+ di->device_name = g_strdup(bdrv_get_device_name(bs));
+
+ if (device_uses_fleecing && device_uses_fleecing(device)) {
+ g_autofree gchar *fleecing_devid = g_strconcat(device, "-fleecing", NULL);
+ BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
+ if (!fleecing_blk) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", fleecing_devid);
+ goto fail;
+ }
+ BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
+ if (!bdrv_co_is_inserted(fleecing_bs)) {
+ error_setg(errp, "Device '%s' has no medium", fleecing_devid);
+ goto fail;
+ }
+ /*
+ * Fleecing image needs to be the same size to act as a cbw target.
+ */
+ if (bs->total_sectors != fleecing_bs->total_sectors) {
+ error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
+ fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
+ goto fail;
+ }
+ di->fleecing.bs = fleecing_bs;
+ }
+
+ return di;
+fail:
+ g_free(di->device_name);
+ g_free(di);
+ return NULL;
+}
+
/*
* Returns a list of device infos, which needs to be freed by the caller. In
* case of an error, errp will be set, but the returned value might still be a
@@ -778,45 +829,10 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
gchar **d = devs;
while (d && *d) {
- BlockBackend *blk = blk_by_name(*d);
- if (!blk) {
- error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
- "Device '%s' not found", *d);
+ PVEBackupDevInfo *di = get_single_device_info(*d, device_uses_fleecing, errp);
+ if (!di) {
goto err;
}
- BlockDriverState *bs = blk_bs(blk);
- if (!bdrv_co_is_inserted(bs)) {
- error_setg(errp, "Device '%s' has no medium", *d);
- goto err;
- }
- PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
- di->bs = bs;
- di->device_name = g_strdup(bdrv_get_device_name(bs));
-
- if (device_uses_fleecing && device_uses_fleecing(*d)) {
- g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
- BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
- if (!fleecing_blk) {
- error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
- "Device '%s' not found", fleecing_devid);
- goto err;
- }
- BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
- if (!bdrv_co_is_inserted(fleecing_bs)) {
- error_setg(errp, "Device '%s' has no medium", fleecing_devid);
- goto err;
- }
- /*
- * Fleecing image needs to be the same size to act as a cbw target.
- */
- if (bs->total_sectors != fleecing_bs->total_sectors) {
- error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
- fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
- goto err;
- }
- di->fleecing.bs = fleecing_bs;
- }
-
di_list = g_list_append(di_list, di);
d++;
}

View File

@@ -0,0 +1,470 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:48 +0200
Subject: [PATCH] PVE backup: implement bitmap support for external backup
access
There can be one dirty bitmap for each backup target ID for each
device (which are tracked in the backup_access_bitmaps hash table).
The QMP user can specify the ID of the bitmap it likes to use. This ID
is then compared to the current one for the given target and device.
If they match, the bitmap is re-used (should it still exist on the
drive, otherwise re-created). If there is a mismatch, the old bitmap
is removed and a new one is created.
The return value of the QMP command includes information about what
bitmap action was taken. Similar to what the query-backup QMP command
returns for regular backup. It also includes the bitmap name and
associated block node, so the management layer can then set up an NBD
export with the bitmap.
While the backup access is active, a background bitmap is also
required. This is necessary to implement bitmap handling according to
the original reference [0]. In particular:
- in the error case, new writes since the backup access was set up are
in the background bitmap. Because of failure, the previously tracked
writes from the backup access bitmap are still required too. Thus,
the bitmap is merged with the background bitmap to get all new
writes since the last backup.
- in the success case, continue tracking for the next incremental
backup in the backup access bitmap. New writes since the backup
access was set up are in the background bitmap. Because the backup
was successfully, clear the backup access bitmap and merge back the
background bitmap to get only the new writes.
Since QEMU cannot know if the backup was successful or not (except if
failure already happens during the setup QMP command), the management
layer needs to tell it via the teardown QMP command.
The bitmap action is also recorded in the device info now.
[0]: https://lore.kernel.org/qemu-devel/b68833dd-8864-4d72-7c61-c134a9835036@ya.ru/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 196 +++++++++++++++++++++++++++++++++++++++++--
pve-backup.h | 2 +-
qapi/block-core.json | 36 ++++++--
system/runstate.c | 2 +-
4 files changed, 220 insertions(+), 16 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 8b7414f057..0490d1f421 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -15,6 +15,7 @@
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qerror.h"
#include "qemu/cutils.h"
+#include "qemu/error-report.h"
#if defined(CONFIG_MALLOC_TRIM)
#include <malloc.h>
@@ -41,6 +42,7 @@
*/
const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+const char *BACKGROUND_BITMAP_NAME = "backup-access-background-bitmap";
static struct PVEBackupState {
struct {
@@ -72,6 +74,7 @@ static struct PVEBackupState {
CoMutex backup_mutex;
CoMutex dump_callback_mutex;
char *target_id;
+ GHashTable *backup_access_bitmaps; // key=target_id, value=bitmap_name
} backup_state;
static void pvebackup_init(void)
@@ -99,8 +102,11 @@ typedef struct PVEBackupDevInfo {
char* device_name;
int completed_ret; // INT_MAX if not completed
BdrvDirtyBitmap *bitmap;
+ BdrvDirtyBitmap *background_bitmap; // used for external backup access
+ PBSBitmapAction bitmap_action;
BlockDriverState *target;
BlockJob *job;
+ char *requested_bitmap_name; // used by external backup access during initialization
} PVEBackupDevInfo;
static void pvebackup_propagate_error(Error *err)
@@ -362,6 +368,67 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
+/*
+ * New writes since the backup access was set up are in the background bitmap. Because of failure,
+ * the previously tracked writes in di->bitmap are still required too. Thus, merge with the
+ * background bitmap to get all new writes since the last backup.
+ */
+static void handle_backup_access_bitmaps_in_error_case(PVEBackupDevInfo *di)
+{
+ Error *local_err = NULL;
+
+ if (di->bs && di->background_bitmap) {
+ bdrv_drained_begin(di->bs);
+ if (di->bitmap) {
+ bdrv_enable_dirty_bitmap(di->bitmap);
+ if (!bdrv_merge_dirty_bitmap(di->bitmap, di->background_bitmap, NULL, &local_err)) {
+ warn_report("backup access: %s - could not merge bitmaps in error path - %s",
+ di->device_name,
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ /*
+ * Could not merge, drop original bitmap too.
+ */
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+ } else {
+ warn_report("backup access: %s - expected bitmap not present", di->device_name);
+ }
+ bdrv_release_dirty_bitmap(di->background_bitmap);
+ bdrv_drained_end(di->bs);
+ }
+}
+
+/*
+ * Continue tracking for next incremental backup in di->bitmap. New writes since the backup access
+ * was set up are in the background bitmap. Because the backup was successful, clear di->bitmap and
+ * merge back the background bitmap to get only the new writes.
+ */
+static void handle_backup_access_bitmaps_after_success(PVEBackupDevInfo *di)
+{
+ Error *local_err = NULL;
+
+ if (di->bs && di->background_bitmap) {
+ bdrv_drained_begin(di->bs);
+ if (di->bitmap) {
+ bdrv_enable_dirty_bitmap(di->bitmap);
+ bdrv_clear_dirty_bitmap(di->bitmap, NULL);
+ if (!bdrv_merge_dirty_bitmap(di->bitmap, di->background_bitmap, NULL, &local_err)) {
+ warn_report("backup access: %s - could not merge bitmaps after backup - %s",
+ di->device_name,
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ /*
+ * Could not merge, drop original bitmap too.
+ */
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+ } else {
+ warn_report("backup access: %s - expected bitmap not present", di->device_name);
+ }
+ bdrv_release_dirty_bitmap(di->background_bitmap);
+ bdrv_drained_end(di->bs);
+ }
+}
+
static void cleanup_snapshot_access(PVEBackupDevInfo *di)
{
if (di->fleecing.snapshot_access) {
@@ -605,6 +672,21 @@ static void setup_all_snapshot_access_bh(void *opaque)
bdrv_drained_begin(di->bs);
+ if (di->bitmap) {
+ BdrvDirtyBitmap *background_bitmap =
+ bdrv_create_dirty_bitmap(di->bs, PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE,
+ BACKGROUND_BITMAP_NAME, &local_err);
+ if (!background_bitmap) {
+ error_setg(errp, "%s - creating background bitmap for backup access failed: %s",
+ di->device_name,
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ bdrv_drained_end(di->bs);
+ break;
+ }
+ di->background_bitmap = background_bitmap;
+ bdrv_disable_dirty_bitmap(di->bitmap);
+ }
+
if (setup_snapshot_access(di, &local_err) < 0) {
bdrv_drained_end(di->bs);
error_setg(errp, "%s - setting up snapshot access failed: %s", di->device_name,
@@ -935,7 +1017,7 @@ static void backup_state_set_target_id(const char *target_id) {
BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
const char *target_id,
- const char *devlist,
+ BackupAccessSourceDeviceList *devices,
Error **errp)
{
assert(qemu_in_coroutine());
@@ -954,12 +1036,17 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
}
bdrv_graph_co_rdlock();
- di_list = get_device_info(devlist, fleecing_all, &local_err);
- bdrv_graph_co_rdunlock();
- if (local_err) {
- error_propagate(errp, local_err);
- goto err;
+ for (BackupAccessSourceDeviceList *it = devices; it; it = it->next) {
+ PVEBackupDevInfo *di = get_single_device_info(it->value->device, fleecing_all, &local_err);
+ if (!di) {
+ bdrv_graph_co_rdunlock();
+ error_propagate(errp, local_err);
+ goto err;
+ }
+ di->requested_bitmap_name = g_strdup(it->value->bitmap_name);
+ di_list = g_list_append(di_list, di);
}
+ bdrv_graph_co_rdunlock();
assert(di_list);
size_t total = 0;
@@ -986,6 +1073,78 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
/* clear previous backup's bitmap_list */
clear_backup_state_bitmap_list();
+ if (!backup_state.backup_access_bitmaps) {
+ backup_state.backup_access_bitmaps =
+ g_hash_table_new_full(g_str_hash, g_str_equal, free, free);
+ }
+
+ /* create bitmaps if requested */
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ di->block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE;
+
+ PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
+ size_t dirty = di->size;
+
+ const char *old_bitmap_name =
+ (const char*)g_hash_table_lookup(backup_state.backup_access_bitmaps, target_id);
+
+ bool same_bitmap_name = old_bitmap_name
+ && di->requested_bitmap_name
+ && strcmp(di->requested_bitmap_name, old_bitmap_name) == 0;
+
+ if (old_bitmap_name && !same_bitmap_name) {
+ BdrvDirtyBitmap *old_bitmap = bdrv_find_dirty_bitmap(di->bs, old_bitmap_name);
+ if (!old_bitmap) {
+ warn_report("setup backup access: expected old bitmap '%s' not found for drive "
+ "'%s'", old_bitmap_name, di->device_name);
+ } else {
+ g_hash_table_remove(backup_state.backup_access_bitmaps, target_id);
+ bdrv_release_dirty_bitmap(old_bitmap);
+ action = PBS_BITMAP_ACTION_NOT_USED_REMOVED;
+ }
+ }
+
+ BdrvDirtyBitmap *bitmap = NULL;
+ if (di->requested_bitmap_name) {
+ bitmap = bdrv_find_dirty_bitmap(di->bs, di->requested_bitmap_name);
+ if (!bitmap) {
+ bitmap = bdrv_create_dirty_bitmap(di->bs, PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE,
+ di->requested_bitmap_name, errp);
+ if (!bitmap) {
+ qemu_mutex_unlock(&backup_state.stat.lock);
+ goto err;
+ }
+ bdrv_set_dirty_bitmap(bitmap, 0, di->size);
+ action = PBS_BITMAP_ACTION_NEW;
+ } else {
+ /* track clean chunks as reused */
+ dirty = MIN(bdrv_get_dirty_count(bitmap), di->size);
+ backup_state.stat.reused += di->size - dirty;
+ action = PBS_BITMAP_ACTION_USED;
+ }
+
+ if (!same_bitmap_name) {
+ g_hash_table_insert(backup_state.backup_access_bitmaps,
+ strdup(target_id), strdup(di->requested_bitmap_name));
+ }
+
+ }
+
+ PBSBitmapInfo *info = g_malloc(sizeof(*info));
+ info->drive = g_strdup(di->device_name);
+ info->action = action;
+ info->size = di->size;
+ info->dirty = dirty;
+ backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
+
+ di->bitmap = bitmap;
+ di->bitmap_action = action;
+ }
+
/* starting=false, because there is no associated QEMU job */
initialize_backup_state_stat(NULL, NULL, total, false);
@@ -1029,6 +1188,12 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
info->value->node_name = g_strdup(bdrv_get_node_name(di->fleecing.snapshot_access));
info->value->device = g_strdup(di->device_name);
info->value->size = di->size;
+ if (di->requested_bitmap_name) {
+ info->value->bitmap_node_name = g_strdup(bdrv_get_node_name(di->bs));
+ info->value->bitmap_name = g_strdup(di->requested_bitmap_name);
+ info->value->bitmap_action = di->bitmap_action;
+ info->value->has_bitmap_action = true;
+ }
*p_bai_next = info;
p_bai_next = &info->next;
@@ -1043,6 +1208,8 @@ err:
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
+ handle_backup_access_bitmaps_in_error_case(di);
+
g_free(di->device_name);
di->device_name = NULL;
@@ -1058,7 +1225,7 @@ err:
/*
* Caller needs to hold the backup mutex or the BQL.
*/
-void backup_access_teardown(void)
+void backup_access_teardown(bool success)
{
GList *l = backup_state.di_list;
@@ -1079,9 +1246,18 @@ void backup_access_teardown(void)
di->fleecing.cbw = NULL;
}
+ if (success) {
+ handle_backup_access_bitmaps_after_success(di);
+ } else {
+ handle_backup_access_bitmaps_in_error_case(di);
+ }
+
g_free(di->device_name);
di->device_name = NULL;
+ g_free(di->requested_bitmap_name);
+ di->requested_bitmap_name = NULL;
+
g_free(di);
}
g_list_free(backup_state.di_list);
@@ -1099,13 +1275,13 @@ static void backup_access_teardown_bh(void *opaque)
{
CoCtxData *data = (CoCtxData*)opaque;
- backup_access_teardown();
+ backup_access_teardown(*((bool*)data->data));
/* return */
aio_co_enter(data->ctx, data->co);
}
-void coroutine_fn qmp_backup_access_teardown(const char *target_id, Error **errp)
+void coroutine_fn qmp_backup_access_teardown(const char *target_id, bool success, Error **errp)
{
assert(qemu_in_coroutine());
@@ -1135,6 +1311,7 @@ void coroutine_fn qmp_backup_access_teardown(const char *target_id, Error **errp
CoCtxData waker = {
.co = qemu_coroutine_self(),
.ctx = qemu_get_current_aio_context(),
+ .data = &success,
};
aio_bh_schedule_oneshot(waker.ctx, backup_access_teardown_bh, &waker);
qemu_coroutine_yield();
@@ -1335,6 +1512,7 @@ UuidInfo coroutine_fn *qmp_backup(
}
di->dev_id = dev_id;
+ di->bitmap_action = action;
PBSBitmapInfo *info = g_malloc(sizeof(*info));
info->drive = g_strdup(di->device_name);
diff --git a/pve-backup.h b/pve-backup.h
index 4033bc848f..9ebeef7c8f 100644
--- a/pve-backup.h
+++ b/pve-backup.h
@@ -11,6 +11,6 @@
#ifndef PVE_BACKUP_H
#define PVE_BACKUP_H
-void backup_access_teardown(void);
+void backup_access_teardown(bool success);
#endif /* PVE_BACKUP_H */
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3f092221ce..873db3f276 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1114,9 +1114,33 @@
#
# @size: the size of the block device in bytes.
#
+# @bitmap-node-name: the block node name the dirty bitmap is associated to.
+#
+# @bitmap-name: the name of the dirty bitmap associated to the backup access.
+#
+# @bitmap-action: the action taken on the dirty bitmap.
+#
##
{ 'struct': 'BackupAccessInfo',
- 'data': { 'node-name': 'str', 'device': 'str', 'size': 'size' } }
+ 'data': { 'node-name': 'str', 'device': 'str', 'size': 'size',
+ '*bitmap-node-name': 'str', '*bitmap-name': 'str',
+ '*bitmap-action': 'PBSBitmapAction' } }
+
+##
+# @BackupAccessSourceDevice:
+#
+# Source block device information for creating a backup access.
+#
+# @device: the block device name.
+#
+# @bitmap-name: use/create a bitmap with this name for the device. Re-using the
+# same name allows for making incremental backups. Check the @bitmap-action
+# in the result to see if you can actually re-use the bitmap or if it had to
+# be newly created.
+#
+##
+{ 'struct': 'BackupAccessSourceDevice',
+ 'data': { 'device': 'str', '*bitmap-name': 'str' } }
##
# @backup-access-setup:
@@ -1126,14 +1150,13 @@
#
# @target-id: the unique ID of the backup target.
#
-# @devlist: list of block device names (separated by ',', ';' or ':'). By
-# default the backup includes all writable block devices.
+# @devices: list of devices for which to create the backup access.
#
# Returns: a list of @BackupAccessInfo, one for each device.
#
##
{ 'command': 'backup-access-setup',
- 'data': { 'target-id': 'str', '*devlist': 'str' },
+ 'data': { 'target-id': 'str', 'devices': [ 'BackupAccessSourceDevice' ] },
'returns': [ 'BackupAccessInfo' ], 'coroutine': true }
##
@@ -1143,8 +1166,11 @@
#
# @target-id: the ID of the backup target.
#
+# @success: whether the backup done by the external provider was successful.
+#
##
-{ 'command': 'backup-access-teardown', 'data': { 'target-id': 'str' },
+{ 'command': 'backup-access-teardown',
+ 'data': { 'target-id': 'str', 'success': 'bool' },
'coroutine': true }
##
diff --git a/system/runstate.c b/system/runstate.c
index 6f93d7c2fb..ef3277930f 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -925,7 +925,7 @@ void qemu_cleanup(int status)
* The backup access is set up by a QMP command, but is neither owned by a monitor nor
* associated to a BlockBackend. Need to tear it down manually here.
*/
- backup_access_teardown();
+ backup_access_teardown(false);
job_cancel_sync_all();
bdrv_close_all();

View File

@@ -0,0 +1,57 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:49 +0200
Subject: [PATCH] PVE backup: backup-access api: indicate situation where a
bitmap was recreated
The backup-access api keeps track of what bitmap names got used for
which devices and thus knows when a bitmap went missing. Propagate
this information to the QMP user with a new 'missing-recreated'
variant for the taken bitmap action.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 6 +++++-
qapi/block-core.json | 9 ++++++++-
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 0490d1f421..8909842292 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1119,7 +1119,11 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
goto err;
}
bdrv_set_dirty_bitmap(bitmap, 0, di->size);
- action = PBS_BITMAP_ACTION_NEW;
+ if (same_bitmap_name) {
+ action = PBS_BITMAP_ACTION_MISSING_RECREATED;
+ } else {
+ action = PBS_BITMAP_ACTION_NEW;
+ }
} else {
/* track clean chunks as reused */
dirty = MIN(bdrv_get_dirty_count(bitmap), di->size);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 873db3f276..58586170d9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1067,9 +1067,16 @@
# base snapshot did not match the base given for the current job or
# the crypt mode has changed.
#
+# @missing-recreated: A bitmap for incremental backup was expected to be
+# present, but was missing and thus got recreated. For example, this can
+# happen if the drive was re-attached or if the bitmap was deleted for some
+# other reason. PBS does not currently keep track of this; the backup-access
+# mechanism does.
+#
##
{ 'enum': 'PBSBitmapAction',
- 'data': ['not-used', 'not-used-removed', 'new', 'used', 'invalid'] }
+ 'data': ['not-used', 'not-used-removed', 'new', 'used', 'invalid',
+ 'missing-recreated'] }
##
# @PBSBitmapInfo:

View File

@@ -0,0 +1,84 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Thu, 3 Apr 2025 14:30:50 +0200
Subject: [PATCH] PVE backup: backup-access-api: explicit bitmap-mode parameter
This allows to explicitly request to re-create a bitmap under the same
name.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[TL: fix trailing whitespace error]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
pve-backup.c | 17 ++++++++++++++++-
qapi/block-core.json | 20 +++++++++++++++++++-
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 8909842292..18bcf29533 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1043,7 +1043,16 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
error_propagate(errp, local_err);
goto err;
}
- di->requested_bitmap_name = g_strdup(it->value->bitmap_name);
+ if (it->value->bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NONE) {
+ di->bitmap_action = PBS_BITMAP_ACTION_NOT_USED;
+ } else {
+ di->requested_bitmap_name = g_strdup(it->value->bitmap_name);
+ if (it->value->bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NEW) {
+ di->bitmap_action = PBS_BITMAP_ACTION_NEW;
+ } else if (it->value->bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_USE) {
+ di->bitmap_action = PBS_BITMAP_ACTION_USED;
+ }
+ }
di_list = g_list_append(di_list, di);
}
bdrv_graph_co_rdunlock();
@@ -1096,6 +1105,12 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
&& di->requested_bitmap_name
&& strcmp(di->requested_bitmap_name, old_bitmap_name) == 0;
+ /* special case: if we explicitly requested a *new* bitmap, treat an
+ * existing bitmap as having a different name */
+ if (di->bitmap_action == PBS_BITMAP_ACTION_NEW) {
+ same_bitmap_name = false;
+ }
+
if (old_bitmap_name && !same_bitmap_name) {
BdrvDirtyBitmap *old_bitmap = bdrv_find_dirty_bitmap(di->bs, old_bitmap_name);
if (!old_bitmap) {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 58586170d9..09beb3217c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1145,9 +1145,27 @@
# in the result to see if you can actually re-use the bitmap or if it had to
# be newly created.
#
+# @bitmap-mode: used to control whether the bitmap should be reused or
+# recreated.
+#
##
{ 'struct': 'BackupAccessSourceDevice',
- 'data': { 'device': 'str', '*bitmap-name': 'str' } }
+ 'data': { 'device': 'str', '*bitmap-name': 'str',
+ '*bitmap-mode': 'BackupAccessSetupBitmapMode' } }
+
+##
+# @BackupAccessSetupBitmapMode:
+#
+# How to setup a bitmap for a device for @backup-access-setup.
+#
+# @none: do not use a bitmap. Removes an existing bitmap if present.
+#
+# @new: create and use a new bitmap.
+#
+# @use: try to re-use an existing bitmap. Create a new one if it doesn't exist.
+##
+{ 'enum': 'BackupAccessSetupBitmapMode',
+ 'data': ['none', 'new', 'use' ] }
##
# @backup-access-setup:

View File

@@ -0,0 +1,206 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Fri, 4 Apr 2025 15:31:36 +0200
Subject: [PATCH] PVE backup: backup-access api: simplify bitmap logic
Currently, only one bitmap name per target is planned to be used.
Simply use the target ID itself as the bitmap name. This allows to
simplify the logic quite a bit and there also is no need for the
backup_access_bitmaps hash table anymore.
For the return value, the bitmap names are still passed along for
convenience in the caller.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 72 ++++++++++++--------------------------------
qapi/block-core.json | 15 ++++-----
2 files changed, 26 insertions(+), 61 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 18bcf29533..0ea0343b22 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -74,7 +74,6 @@ static struct PVEBackupState {
CoMutex backup_mutex;
CoMutex dump_callback_mutex;
char *target_id;
- GHashTable *backup_access_bitmaps; // key=target_id, value=bitmap_name
} backup_state;
static void pvebackup_init(void)
@@ -106,7 +105,7 @@ typedef struct PVEBackupDevInfo {
PBSBitmapAction bitmap_action;
BlockDriverState *target;
BlockJob *job;
- char *requested_bitmap_name; // used by external backup access during initialization
+ BackupAccessSetupBitmapMode requested_bitmap_mode;
} PVEBackupDevInfo;
static void pvebackup_propagate_error(Error *err)
@@ -1043,16 +1042,7 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
error_propagate(errp, local_err);
goto err;
}
- if (it->value->bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NONE) {
- di->bitmap_action = PBS_BITMAP_ACTION_NOT_USED;
- } else {
- di->requested_bitmap_name = g_strdup(it->value->bitmap_name);
- if (it->value->bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NEW) {
- di->bitmap_action = PBS_BITMAP_ACTION_NEW;
- } else if (it->value->bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_USE) {
- di->bitmap_action = PBS_BITMAP_ACTION_USED;
- }
- }
+ di->requested_bitmap_mode = it->value->bitmap_mode;
di_list = g_list_append(di_list, di);
}
bdrv_graph_co_rdunlock();
@@ -1082,10 +1072,7 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
/* clear previous backup's bitmap_list */
clear_backup_state_bitmap_list();
- if (!backup_state.backup_access_bitmaps) {
- backup_state.backup_access_bitmaps =
- g_hash_table_new_full(g_str_hash, g_str_equal, free, free);
- }
+ const char *bitmap_name = target_id;
/* create bitmaps if requested */
l = di_list;
@@ -1098,59 +1085,43 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
size_t dirty = di->size;
- const char *old_bitmap_name =
- (const char*)g_hash_table_lookup(backup_state.backup_access_bitmaps, target_id);
-
- bool same_bitmap_name = old_bitmap_name
- && di->requested_bitmap_name
- && strcmp(di->requested_bitmap_name, old_bitmap_name) == 0;
-
- /* special case: if we explicitly requested a *new* bitmap, treat an
- * existing bitmap as having a different name */
- if (di->bitmap_action == PBS_BITMAP_ACTION_NEW) {
- same_bitmap_name = false;
- }
-
- if (old_bitmap_name && !same_bitmap_name) {
- BdrvDirtyBitmap *old_bitmap = bdrv_find_dirty_bitmap(di->bs, old_bitmap_name);
- if (!old_bitmap) {
- warn_report("setup backup access: expected old bitmap '%s' not found for drive "
- "'%s'", old_bitmap_name, di->device_name);
- } else {
- g_hash_table_remove(backup_state.backup_access_bitmaps, target_id);
+ if (di->requested_bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NONE ||
+ di->requested_bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NEW) {
+ BdrvDirtyBitmap *old_bitmap = bdrv_find_dirty_bitmap(di->bs, bitmap_name);
+ if (old_bitmap) {
bdrv_release_dirty_bitmap(old_bitmap);
- action = PBS_BITMAP_ACTION_NOT_USED_REMOVED;
+ action = PBS_BITMAP_ACTION_NOT_USED_REMOVED; // set below for new
}
}
BdrvDirtyBitmap *bitmap = NULL;
- if (di->requested_bitmap_name) {
- bitmap = bdrv_find_dirty_bitmap(di->bs, di->requested_bitmap_name);
+ if (di->requested_bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NEW ||
+ di->requested_bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_USE) {
+ bitmap = bdrv_find_dirty_bitmap(di->bs, bitmap_name);
if (!bitmap) {
bitmap = bdrv_create_dirty_bitmap(di->bs, PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE,
- di->requested_bitmap_name, errp);
+ bitmap_name, errp);
if (!bitmap) {
qemu_mutex_unlock(&backup_state.stat.lock);
goto err;
}
bdrv_set_dirty_bitmap(bitmap, 0, di->size);
- if (same_bitmap_name) {
+ if (di->requested_bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_USE) {
action = PBS_BITMAP_ACTION_MISSING_RECREATED;
} else {
action = PBS_BITMAP_ACTION_NEW;
}
} else {
+ if (di->requested_bitmap_mode == BACKUP_ACCESS_SETUP_BITMAP_MODE_NEW) {
+ qemu_mutex_unlock(&backup_state.stat.lock);
+ error_setg(errp, "internal error - removed old bitmap still present");
+ goto err;
+ }
/* track clean chunks as reused */
dirty = MIN(bdrv_get_dirty_count(bitmap), di->size);
backup_state.stat.reused += di->size - dirty;
action = PBS_BITMAP_ACTION_USED;
}
-
- if (!same_bitmap_name) {
- g_hash_table_insert(backup_state.backup_access_bitmaps,
- strdup(target_id), strdup(di->requested_bitmap_name));
- }
-
}
PBSBitmapInfo *info = g_malloc(sizeof(*info));
@@ -1207,9 +1178,9 @@ BackupAccessInfoList *coroutine_fn qmp_backup_access_setup(
info->value->node_name = g_strdup(bdrv_get_node_name(di->fleecing.snapshot_access));
info->value->device = g_strdup(di->device_name);
info->value->size = di->size;
- if (di->requested_bitmap_name) {
+ if (di->bitmap) {
info->value->bitmap_node_name = g_strdup(bdrv_get_node_name(di->bs));
- info->value->bitmap_name = g_strdup(di->requested_bitmap_name);
+ info->value->bitmap_name = g_strdup(bitmap_name);
info->value->bitmap_action = di->bitmap_action;
info->value->has_bitmap_action = true;
}
@@ -1274,9 +1245,6 @@ void backup_access_teardown(bool success)
g_free(di->device_name);
di->device_name = NULL;
- g_free(di->requested_bitmap_name);
- di->requested_bitmap_name = NULL;
-
g_free(di);
}
g_list_free(backup_state.di_list);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 09beb3217c..02c043f0f7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1140,18 +1140,12 @@
#
# @device: the block device name.
#
-# @bitmap-name: use/create a bitmap with this name for the device. Re-using the
-# same name allows for making incremental backups. Check the @bitmap-action
-# in the result to see if you can actually re-use the bitmap or if it had to
-# be newly created.
-#
# @bitmap-mode: used to control whether the bitmap should be reused or
-# recreated.
+# recreated or not used. Default is not using a bitmap.
#
##
{ 'struct': 'BackupAccessSourceDevice',
- 'data': { 'device': 'str', '*bitmap-name': 'str',
- '*bitmap-mode': 'BackupAccessSetupBitmapMode' } }
+ 'data': { 'device': 'str', '*bitmap-mode': 'BackupAccessSetupBitmapMode' } }
##
# @BackupAccessSetupBitmapMode:
@@ -1175,7 +1169,10 @@
#
# @target-id: the unique ID of the backup target.
#
-# @devices: list of devices for which to create the backup access.
+# @devices: list of devices for which to create the backup access. Also
+# controls whether to use/create a bitmap for the device. Check the
+# @bitmap-action in the result to see what action was actually taken for the
+# bitmap. Each target controls its own bitmaps.
#
# Returns: a list of @BackupAccessInfo, one for each device.
#

View File

@@ -0,0 +1,30 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 2 Jun 2025 12:56:16 +0200
Subject: [PATCH] block/gluster: provide more context for deprecation warning
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/gluster.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/gluster.c b/block/gluster.c
index c3a9555591..71c4a60b02 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -816,7 +816,11 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
goto out;
}
- warn_report_once("'gluster' is deprecated");
+ warn_report_once("Native 'gluster' support will be deprecated and removed"
+ " in Proxmox VE 9.");
+ warn_report_once("Before upgrading, switch to accessing your Gluster"
+ " storage as a directory storage or migrate to another"
+ " storage type.");
filename = qemu_opt_get(opts, GLUSTER_OPT_FILENAME);

View File

@@ -0,0 +1,150 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 18 Jun 2025 12:25:31 +0200
Subject: [PATCH] savevm-async: reuse migration blocker check for
snapshots/hibernation
Same rationale as with upstream QEMU commit 5aaac46793 ("migration:
savevm: consult migration blockers"), migration and (async) snapshot
are essentially the same operation and thus snapshot also needs to
check for migration blockers. For example, this catches passed-through
PCI devices, where the driver does not support migration and VirtIO-GL
display, which also does not support migration yet.
In the case of VirtIO-GL, there were crashes [0].
However, the commit notes:
> There is really no difference between live migration and savevm, except
> that savevm does not require bdrv_invalidate_cache to be implemented
> by all disks. However, it is unlikely that savevm is used with anything
> except qcow2 disks, so the penalty is small and worth the improvement
> in catching bad usage of savevm.
and for Proxmox VE, suspend-to-disk with VMDK does use savevm-async
and would be broken by simply using migration_is_blocked(). To keep
this working, introduce a new helper that filters blockers with the
prefix used by the VMDK migration blocker.
The function qemu_savevm_state_blocked() is called as part of
savevm_async_is_blocked() so no check is lost with this
patch. The helper is declared in migration/migration.c to be able to
access the 'migration_blockers'.
The VMDK blocker message is declared via a '#define', because using a
'const char*' led to the linker to complain about multiple
declarations. The message does not include the reference to the block
node anymore, but users can still easily find a VMDK disk in the VM
configuration.
Note, this also "breaks" snapshot and hibernate with VNC clipboard by
preventing it. Previously, this would "work", because the Proxmox VE
API has no check yet, but the clipboard will be broken after rollback,
in the sense that it cannot be used anymore, not just lost contents.
So some users might consider adding the check here a breaking change
even if it's technically correct to prevent snapshot and hibernate
with VNC clipboard. But other users might rightfully complain about
broken clipboard. And again, the check also prevents blockers from
passed-through PCI devices, etc. so it seems worth tolerating that
breakage.
[0]: https://forum.proxmox.com/threads/136976/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250618102531.57444-1-f.ebner@proxmox.com
---
block/vmdk.c | 4 +---
include/migration/blocker.h | 2 ++
migration/migration.c | 24 ++++++++++++++++++++++++
migration/migration.h | 1 +
migration/savevm-async.c | 2 +-
5 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/block/vmdk.c b/block/vmdk.c
index 78f6433607..f15cdaf0e7 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1402,9 +1402,7 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
qemu_co_mutex_init(&s->lock);
/* Disable migration when VMDK images are used */
- error_setg(&s->migration_blocker, "The vmdk format used by node '%s' "
- "does not support live migration",
- bdrv_get_device_or_node_name(bs));
+ error_setg(&s->migration_blocker, "%s", MIGRATION_BLOCKER_VMDK);
ret = migrate_add_blocker_normal(&s->migration_blocker, errp);
if (ret < 0) {
goto fail;
diff --git a/include/migration/blocker.h b/include/migration/blocker.h
index a687ac0efe..f36bfb2df1 100644
--- a/include/migration/blocker.h
+++ b/include/migration/blocker.h
@@ -18,6 +18,8 @@
#define MIG_MODE_ALL MIG_MODE__MAX
+#define MIGRATION_BLOCKER_VMDK "The vmdk format used by a disk does not support live migration"
+
/**
* @migrate_add_blocker - prevent all modes of migration from proceeding
*
diff --git a/migration/migration.c b/migration/migration.c
index 491d9aa017..0b77e2a611 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1940,6 +1940,30 @@ bool migration_is_blocked(Error **errp)
return false;
}
+bool savevm_async_is_blocked(Error **errp)
+{
+ GSList *blockers = migration_blockers[migrate_mode()];
+
+ if (qemu_savevm_state_blocked(errp)) {
+ return true;
+ }
+
+ /*
+ * The limitation for VMDK images only applies to live-migration, not
+ * snapshots, see commit 5aaac46793 ("migration: savevm: consult migration
+ * blockers").
+ */
+ while (blockers) {
+ if (strcmp(error_get_pretty(blockers->data), MIGRATION_BLOCKER_VMDK)) {
+ error_propagate(errp, error_copy(blockers->data));
+ return true;
+ }
+ blockers = g_slist_next(blockers);
+ }
+
+ return false;
+}
+
/* Returns true if continue to migrate, or false if error detected */
static bool migrate_prepare(MigrationState *s, bool resume, Error **errp)
{
diff --git a/migration/migration.h b/migration/migration.h
index 3857905c0e..38e315f03f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -492,6 +492,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
int migrate_init(MigrationState *s, Error **errp);
bool migration_is_blocked(Error **errp);
+bool savevm_async_is_blocked(Error **errp);
/* True if outgoing migration has entered postcopy phase */
bool migration_in_postcopy(void);
bool migration_postcopy_is_alive(MigrationStatus state);
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index f2b10b5519..bfedae9692 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -375,7 +375,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
return;
}
- if (qemu_savevm_state_blocked(errp)) {
+ if (savevm_async_is_blocked(errp)) {
goto fail;
}

26
debian/patches/series vendored
View File

@@ -65,3 +65,29 @@ pve/0047-PVE-backup-factor-out-setting-up-snapshot-access-for.patch
pve/0048-PVE-backup-save-device-name-in-device-info-structure.patch
pve/0049-PVE-backup-include-device-name-in-error-when-setting.patch
pve/0050-adapt-machine-version-deprecation-for-Proxmox-VE.patch
pve/0051-Revert-hpet-avoid-timer-storms-on-periodic-timers.patch
pve/0052-Revert-hpet-store-full-64-bit-target-value-of-the-co.patch
pve/0053-Revert-hpet-accept-64-bit-reads-and-writes.patch
pve/0054-Revert-hpet-place-read-only-bits-directly-in-new_val.patch
pve/0055-Revert-hpet-remove-unnecessary-variable-index.patch
pve/0056-Revert-hpet-ignore-high-bits-of-comparator-in-32-bit.patch
pve/0057-Revert-hpet-fix-and-cleanup-persistence-of-interrupt.patch
pve/0058-savevm-async-improve-setting-state-of-snapshot-opera.patch
pve/0059-savevm-async-rename-saved_vm_running-to-vm_needs_sta.patch
pve/0060-savevm-async-improve-runstate-preservation-cleanup-e.patch
pve/0061-savevm-async-use-dedicated-iothread-for-state-file.patch
pve/0062-savevm-async-treat-failure-to-set-iothread-context-a.patch
pve/0063-PVE-backup-clean-up-directly-in-setup_snapshot_acces.patch
pve/0064-PVE-backup-factor-out-helper-to-clear-backup-state-s.patch
pve/0065-PVE-backup-factor-out-helper-to-initialize-backup-st.patch
pve/0066-PVE-backup-add-target-ID-in-backup-state.patch
pve/0067-PVE-backup-get-device-info-allow-caller-to-specify-f.patch
pve/0068-PVE-backup-implement-backup-access-setup-and-teardow.patch
pve/0069-PVE-backup-factor-out-get_single_device_info-helper.patch
pve/0070-PVE-backup-implement-bitmap-support-for-external-bac.patch
pve/0071-PVE-backup-backup-access-api-indicate-situation-wher.patch
pve/0072-PVE-backup-backup-access-api-explicit-bitmap-mode-pa.patch
pve/0073-PVE-backup-backup-access-api-simplify-bitmap-logic.patch
pve/0074-block-gluster-provide-more-context-for-deprecation-w.patch
pve/0075-savevm-async-reuse-migration-blocker-check-for-snaps.patch
pve-qemu-9.2-vitastor.patch

3
debian/rules vendored
View File

@@ -131,8 +131,7 @@ binary-indep: build install
binary-arch: build install
dh_testdir
dh_testroot
# exclude historic Changelog file, which stops at release 0.14
dh_installchangelogs --exclude=Changelog
dh_installchangelogs
dh_installdocs
dh_installexamples
dh_install