Compare commits

..

13 Commits

Author SHA1 Message Date
Thomas Lamprecht
f55fd376da bump version to 9.0.2-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-01-24 14:59:21 +01:00
Fiona Ebner
9b31b9a603 vma: allow specifying disk formats for create operation
Previously, the format would be auto-detected which can lead to
confusing raw images as a different format and being unable to detect
corrupted images of non-raw formats.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-01-24 14:57:03 +01:00
Fiona Ebner
3f36e3b77c async snapshot: explicitly specify raw format when loading the VM state file
Proxmox VE always writes state files in raw format.

Suggested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-01-24 14:57:03 +01:00
Thomas Lamprecht
d1386d3f15 Merge commit '4bc8223' into stable-8
To update to 9.0.2-4

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-01-24 14:56:02 +01:00
Thomas Lamprecht
4bc8223ac9 bump version to 9.0.2-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-11-10 11:23:14 +01:00
Fiona Ebner
fd53092e9b async snapshot: stop vCPU throttling after finishing
In the community forum, users reported issues about RCU stalls and
sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
was also experiencing similar issues from time to time and recently,
obtained a GDB stacktrace. The stacktrace showed that, in his case,
the vCPU threads were waiting in cpu_throttle_thread(). It is a good
guess that the issues in the forum could also be because of that.

From searching in the source code, it seems that migration is the only
user of the vCPU throttling functions in QEMU relevant for Proxmox VE
(the only other place where it is used is the Cocoa UI). In
particular, RAM migration will begin throttling vCPUs for
auto-converge.

In migration_iteration_finish() there is an unconditional call to
cpu_throttle_stop(), so do the same in the async snapshot code
specific to Proxmox VE.

It's not clear why the issue began to surface more prominently only
now, since the vCPU throttling was there since commit 070afca258
("migration: Dynamic cpu throttling for auto-converge") in QEMU
v2.10.0. However, there were a lot of changes in the migration code
between v8.1.5 and v9.0.2 and a few of them might have affected the
likelihood of cpu_throttle_set() being called, for example, 4e1871c450
("migration: Don't serialize devices in qemu_savevm_state_iterate()")

[0]: https://forum.proxmox.com/threads/153483

Reported-by: Mario Loderer <m.loderer@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Mario Loderer <m.loderer@proxmox.com>
2024-11-10 11:20:39 +01:00
Thomas Lamprecht
7446610389 bump version to 9.0.2-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-09-06 16:22:13 +02:00
Fiona Ebner
903a63402e pick up stable fixes for 9.0
Includes fixes for VirtIO-net, ARM and x86(_64) emulation, CVEs to
harden NBD server against malicious clients, as well as a few others
(VNC, physmem, Intel IOMMU, ...).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-09-05 14:44:54 +02:00
Fiona Ebner
441072fc57 pick up fix for VirtIO PCI regressions
Commit f06b222 ("fixes for QEMU 9.0") included a revert for the QEMU
commit 2ce6cff94d ("virtio-pci: fix use of a released vector"). That
commit caused some regressions which sounded just as bad as the fix.
Those regressions have now been addressed upstream, so pick up the fix
and drop the revert. Dropping the revert fixes the original issue that
commit 2ce6cff94d ("virtio-pci: fix use of a released vector")
addressed.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-09-05 14:44:54 +02:00
Fiona Ebner
582fd47901 bump version to 9.0.2-2
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-08-07 10:17:15 +02:00
Fiona Ebner
356bc2483a actually bump submodule to v9.0.2
Fixes: cf40e92 ("update submodule and patches to QEMU 9.0.2")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-08-07 09:43:16 +02:00
Thomas Lamprecht
e62423e615 bump version to 8.1.5-6
note: this was branched-off the last 8.1 release to provide a fix for
current QEMU on the enterprise repo, for future Proxmox VE 8 stable
updates one can just merge this in the latest commit from master that
is still applicable for Proxmox VE 8.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 16:39:52 +02:00
Fiona Ebner
c93a5bfd4f fix #5409: backup: fix copy-before-write timeout
The type for the copy-before-write timeout in nanoseconds was wrong.
By being just uint32_t, a maximum of slightly over 4 seconds was
possible. Larger values would overflow and thus the 45 seconds set by
Proxmox's backup with fleecing, resulted in effectively 2 seconds
timeout for copy-before-write operations.

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-29 16:38:25 +02:00
44 changed files with 2125 additions and 1407 deletions

View File

@@ -58,7 +58,7 @@ $(BUILDDIR): submodule
deb kvm: $(DEBS)
$(DEB_DBG): $(DEB)
$(DEB): $(BUILDDIR)
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc -j32
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc
lintian $(DEBS)
sbuild: $(DSC)

40
debian/changelog vendored
View File

@@ -1,8 +1,36 @@
pve-qemu-kvm (9.0.2-1+vitastor1) bookworm; urgency=medium
pve-qemu-kvm (9.0.2-5) bookworm; urgency=medium
* Add Vitastor support
* async snapshot: explicitly specify raw format when loading the VM state
file
-- Vitaliy Filippov <vitalif@yourcmc.ru> Fri, 02 Aug 2024 00:03:15 +0300
* vma create: rework CLI parameters for passing disk to a more structured
style and use that to allow explicitly specifying the format
-- Proxmox Support Team <support@proxmox.com> Fri, 24 Jan 2025 14:59:17 +0100
pve-qemu-kvm (9.0.2-4) bookworm; urgency=medium
* async snapshot: ensure any dynamic vCPU-throttling applied for
auto-converge gets always disabled again after finishing the snapshot.
-- Proxmox Support Team <support@proxmox.com> Sun, 10 Nov 2024 11:23:09 +0100
pve-qemu-kvm (9.0.2-3) bookworm; urgency=medium
* pick up fix for VirtIO PCI regressions
* pick up stable fixes for 9.0, including fixes for VirtIO-net, ARM and
x86(_64) emulation, CVEs to harden NBD server against malicious clients,
as well as a few others (VNC, physmem, Intel IOMMU, ...).
-- Proxmox Support Team <support@proxmox.com> Fri, 06 Sep 2024 16:21:42 +0200
pve-qemu-kvm (9.0.2-2) bookworm; urgency=medium
* actually update submodule to QEMU 9.0.2. The previous release was still
based on 9.0.0 by mistake.
-- Proxmox Support Team <support@proxmox.com> Wed, 07 Aug 2024 10:16:01 +0200
pve-qemu-kvm (9.0.2-1) bookworm; urgency=medium
@@ -79,6 +107,12 @@ pve-qemu-kvm (9.0.0-2) bookworm; urgency=medium
-- Proxmox Support Team <support@proxmox.com> Fri, 17 May 2024 17:04:52 +0200
pve-qemu-kvm (8.1.5-6) bookworm; urgency=medium
* fix #5409: backup: fix copy-before-write timeout
-- Proxmox Support Team <support@proxmox.com> Mon, 29 Apr 2024 16:39:38 +0200
pve-qemu-kvm (9.0.0-1) bookworm; urgency=medium
* update submodule and patches to QEMU 9.0.0

1
debian/control vendored
View File

@@ -59,7 +59,6 @@ Depends: ceph-common (>= 0.48),
libspice-server1 (>= 0.14.0~),
libusb-1.0-0 (>= 1.0.17-1),
libusbredirparser1 (>= 0.6-2),
vitastor-client (>= 0.9.4),
libuuid1,
${misc:Depends},
${shlibs:Depends},

View File

@@ -1,87 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 16 May 2024 12:59:52 +0200
Subject: [PATCH] Revert "virtio-pci: fix use of a released vector"
This reverts commit 2ce6cff94df2650c460f809e5ad263f1d22507c0.
The fix causes some issues:
https://gitlab.com/qemu-project/qemu/-/issues/2321
https://gitlab.com/qemu-project/qemu/-/issues/2334
The CVE fixed by commit 2ce6cff94d ("virtio-pci: fix use of a released
vector") is CVE-2024-4693 [0] and allows a malicious guest that
controls the boot process in the guest to crash its QEMU process.
The issues sound worse than the CVE, so revert until there is a proper
fix.
[0]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-4693
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/virtio/virtio-pci.c | 37 ++-----------------------------------
1 file changed, 2 insertions(+), 35 deletions(-)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index e04218a9fb..fd66713848 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1410,38 +1410,6 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
return offset;
}
-static void virtio_pci_set_vector(VirtIODevice *vdev,
- VirtIOPCIProxy *proxy,
- int queue_no, uint16_t old_vector,
- uint16_t new_vector)
-{
- bool kvm_irqfd = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
- msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled();
-
- if (new_vector == old_vector) {
- return;
- }
-
- /*
- * If the device uses irqfd and the vector changes after DRIVER_OK is
- * set, we need to release the old vector and set up the new one.
- * Otherwise just need to set the new vector on the device.
- */
- if (kvm_irqfd && old_vector != VIRTIO_NO_VECTOR) {
- kvm_virtio_pci_vector_release_one(proxy, queue_no);
- }
- /* Set the new vector on the device. */
- if (queue_no == VIRTIO_CONFIG_IRQ_IDX) {
- vdev->config_vector = new_vector;
- } else {
- virtio_queue_set_vector(vdev, queue_no, new_vector);
- }
- /* If the new vector changed need to set it up. */
- if (kvm_irqfd && new_vector != VIRTIO_NO_VECTOR) {
- kvm_virtio_pci_vector_use_one(proxy, queue_no);
- }
-}
-
int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
uint8_t bar, uint64_t offset, uint64_t length,
uint8_t id)
@@ -1588,8 +1556,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
} else {
val = VIRTIO_NO_VECTOR;
}
- virtio_pci_set_vector(vdev, proxy, VIRTIO_CONFIG_IRQ_IDX,
- vdev->config_vector, val);
+ vdev->config_vector = val;
break;
case VIRTIO_PCI_COMMON_STATUS:
if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
@@ -1629,7 +1596,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
} else {
val = VIRTIO_NO_VECTOR;
}
- virtio_pci_set_vector(vdev, proxy, vdev->queue_sel, vector, val);
+ virtio_queue_set_vector(vdev, vdev->queue_sel, val);
break;
case VIRTIO_PCI_COMMON_Q_ENABLE:
if (val == 1) {

View File

@@ -0,0 +1,35 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 29 Apr 2024 15:41:11 +0200
Subject: [PATCH] block/copy-before-write: use uint64_t for timeout in
nanoseconds
rather than the uint32_t for which the maximum is slightly more than 4
seconds and larger values would overflow. The QAPI interface allows
specifying the number of seconds, so only values 0 to 4 are safe right
now, other values lead to a much lower timeout than a user expects.
The block_copy() call where this is used already takes a uint64_t for
the timeout, so no change required there.
Fixes: 6db7fd1ca9 ("block/copy-before-write: implement cbw-timeout option")
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
---
block/copy-before-write.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b866e42271..3ee95c0e7a 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -43,7 +43,7 @@ typedef struct BDRVCopyBeforeWriteState {
BlockCopyState *bcs;
BdrvChild *target;
OnCbwError on_cbw_error;
- uint32_t cbw_timeout_ns;
+ uint64_t cbw_timeout_ns;
/*
* @lock: protects access to @access_bitmap, @done_bitmap and

View File

@@ -0,0 +1,77 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Cindy Lu <lulu@redhat.com>
Date: Tue, 6 Aug 2024 17:37:12 +0800
Subject: [PATCH] virtio-pci: Fix the use of an uninitialized irqfd
The crash was reported in MAC OS and NixOS, here is the link for this bug
https://gitlab.com/qemu-project/qemu/-/issues/2334
https://gitlab.com/qemu-project/qemu/-/issues/2321
In this bug, they are using the virtio_input device. The guest notifier was
not supported for this device, The function virtio_pci_set_guest_notifiers()
was not called, and the vector_irqfd was not initialized.
So the fix is adding the check for vector_irqfd in virtio_pci_get_notifier()
The function virtio_pci_get_notifier() can be used in various devices.
It could also be called when VIRTIO_CONFIG_S_DRIVER_OK is not set. In this situation,
the vector_irqfd being NULL is acceptable. We can allow the device continue to boot
If the vector_irqfd still hasn't been initialized after VIRTIO_CONFIG_S_DRIVER_OK
is set, it means that the function set_guest_notifiers was not called before the
driver started. This indicates that the device is not using the notifier.
At this point, we will let the check fail.
This fix is verified in vyatta,MacOS,NixOS,fedora system.
The bt tree for this bug is:
Thread 6 "CPU 0/KVM" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7c817be006c0 (LWP 1269146)]
kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
817 if (irqfd->users == 0) {
(gdb) thread apply all bt
...
Thread 6 (Thread 0x7c817be006c0 (LWP 1269146) "CPU 0/KVM"):
0 kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
1 kvm_virtio_pci_vector_use_one () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:893
2 0x00005983657045e2 in memory_region_write_accessor () at ../qemu-9.0.0/system/memory.c:497
3 0x0000598365704ba6 in access_with_adjusted_size () at ../qemu-9.0.0/system/memory.c:573
4 0x0000598365705059 in memory_region_dispatch_write () at ../qemu-9.0.0/system/memory.c:1528
5 0x00005983659b8e1f in flatview_write_continue_step.isra.0 () at ../qemu-9.0.0/system/physmem.c:2713
6 0x000059836570ba7d in flatview_write_continue () at ../qemu-9.0.0/system/physmem.c:2743
7 flatview_write () at ../qemu-9.0.0/system/physmem.c:2774
8 0x000059836570bb76 in address_space_write () at ../qemu-9.0.0/system/physmem.c:2894
9 0x0000598365763afe in address_space_rw () at ../qemu-9.0.0/system/physmem.c:2904
10 kvm_cpu_exec () at ../qemu-9.0.0/accel/kvm/kvm-all.c:2917
11 0x000059836576656e in kvm_vcpu_thread_fn () at ../qemu-9.0.0/accel/kvm/kvm-accel-ops.c:50
12 0x0000598365926ca8 in qemu_thread_start () at ../qemu-9.0.0/util/qemu-thread-posix.c:541
13 0x00007c8185bcd1cf in ??? () at /usr/lib/libc.so.6
14 0x00007c8185c4e504 in clone () at /usr/lib/libc.so.6
Fixes: 2ce6cff94d ("virtio-pci: fix use of a released vector")
Cc: qemu-stable@nongnu.org
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20240806093715.65105-1-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit a8e63ff289d137197ad7a701a587cc432872d798)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/virtio/virtio-pci.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index e04218a9fb..389bab003f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -860,6 +860,9 @@ static int virtio_pci_get_notifier(VirtIOPCIProxy *proxy, int queue_no,
VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
VirtQueue *vq;
+ if (!proxy->vector_irqfd && vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)
+ return -1;
+
if (queue_no == VIRTIO_CONFIG_IRQ_IDX) {
*n = virtio_config_get_guest_notifier(vdev);
*vector = vdev->config_vector;

View File

@@ -0,0 +1,35 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Akihiko Odaki <akihiko.odaki@daynix.com>
Date: Mon, 1 Jul 2024 20:58:04 +0900
Subject: [PATCH] virtio-net: Ensure queue index fits with RSS
Ensure the queue index points to a valid queue when software RSS
enabled. The new calculation matches with the behavior of Linux's TAP
device with the RSS eBPF program.
Fixes: 4474e37a5b3a ("virtio-net: implement RX RSS processing")
Reported-by: Zhibin Hu <huzhibin5@huawei.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit f1595ceb9aad36a6c1da95bcb77ab9509b38822d)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/net/virtio-net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3644bfd91b..f48588638d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1949,7 +1949,8 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
int index = virtio_net_process_rss(nc, buf, size);
if (index >= 0) {
- NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
+ NetClientState *nc2 =
+ qemu_get_subqueue(n->nic, index % n->curr_queue_pairs);
return virtio_net_receive_rcu(nc2, buf, size, true);
}
}

View File

@@ -0,0 +1,338 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: thomas <east.moutain.yang@gmail.com>
Date: Fri, 12 Jul 2024 11:10:53 +0800
Subject: [PATCH] virtio-net: Fix network stall at the host side waiting for
kick
Patch 06b12970174 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.
The patch enables notification and checks whether the guest has
added some buffers since last check of available buffers when
the available buffers are insufficient. If no buffer is added,
return false, else recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.
Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
to int, it returns an opaque that represents the shadow_avail_idx
of the virtqueue on success, else -1 on error.
2. Add a new API: virtio_queue_enable_notification_and_check(),
it takes an opaque as input arg which is returned from
virtqueue_get_avail_bytes(). It enables notification firstly,
then checks whether the guest has added some buffers since
last check of available buffers or not by virtio_queue_poll(),
return ture if yes.
The patch also reverts patch "06b12970174".
The case below can reproduce the stall.
Guest 0
+--------+
| iperf |
---------------> | server |
Host | +--------+
+--------+ | ...
| iperf |----
| client |---- Guest n
+--------+ | +--------+
| | iperf |
---------------> | server |
+--------+
Boot many guests from qemu with virtio network:
qemu ... -netdev tap,id=net_x \
-device virtio-net-pci-non-transitional,\
iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
Each guest acts as iperf server with commands below:
iperf3 -s -D -i 10 -p 8001
iperf3 -s -D -i 10 -p 8002
The host as iperf client:
iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from the host.
It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
| need kick the host?
| NAPI continues
v
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
v
... ...
On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.
Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.
The host may stall forever at the sequences below:
Host Guest
------------ -----------
fetch buf, send packet receive packet ---
... ... |
fetch buf, send packet add buf |
... add buf virtnet_poll
buf not enough avail idx-> add buf |
read avail idx add buf |
add buf ---
receive packet ---
write event idx ... |
wait for kick add buf virtnet_poll
... |
---
no more packet, exit NAPI
In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
step 1: The buf is not enough, for example, a big packet
needs 5 buf, but the available buf count is 3.
The host read current avail idx.
step 2: The guest adds some buf, then checks whether the
host is waiting for kick signal, not at this time.
The used ring is not empty, the guest continues
the second loop of NAPI.
step 3: The host writes the avail idx read from avail
ring to used ring as event idx via
virtio_queue_set_notification(q->rx_vq, 1).
step 4: At the end of the second loop of NAPI, recheck
whether kick is needed, as the event idx in the
used ring written by the host is beyound the
range of kick condition, the guest will not
send kick signal to the host.
Fixes: 06b12970174 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit f937309fbdbb48c354220a3e7110c202ae4aa7fa)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/net/virtio-net.c | 28 ++++++++++-------
hw/virtio/virtio.c | 64 +++++++++++++++++++++++++++++++++++---
include/hw/virtio/virtio.h | 21 +++++++++++--
3 files changed, 94 insertions(+), 19 deletions(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f48588638d..d4b979d343 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1680,24 +1680,28 @@ static bool virtio_net_can_receive(NetClientState *nc)
static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
{
+ int opaque;
+ unsigned int in_bytes;
VirtIONet *n = q->n;
- if (virtio_queue_empty(q->rx_vq) ||
- (n->mergeable_rx_bufs &&
- !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
- virtio_queue_set_notification(q->rx_vq, 1);
-
- /* To avoid a race condition where the guest has made some buffers
- * available after the above check but before notification was
- * enabled, check for available buffers again.
- */
- if (virtio_queue_empty(q->rx_vq) ||
- (n->mergeable_rx_bufs &&
- !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
+
+ while (virtio_queue_empty(q->rx_vq) || n->mergeable_rx_bufs) {
+ opaque = virtqueue_get_avail_bytes(q->rx_vq, &in_bytes, NULL,
+ bufsize, 0);
+ /* Buffer is enough, disable notifiaction */
+ if (bufsize <= in_bytes) {
+ break;
+ }
+
+ if (virtio_queue_enable_notification_and_check(q->rx_vq, opaque)) {
+ /* Guest has added some buffers, try again */
+ continue;
+ } else {
return 0;
}
}
virtio_queue_set_notification(q->rx_vq, 0);
+
return 1;
}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fd2dfe3a6b..08fba6b2d8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -743,6 +743,60 @@ int virtio_queue_empty(VirtQueue *vq)
}
}
+static bool virtio_queue_split_poll(VirtQueue *vq, unsigned shadow_idx)
+{
+ if (unlikely(!vq->vring.avail)) {
+ return false;
+ }
+
+ return (uint16_t)shadow_idx != vring_avail_idx(vq);
+}
+
+static bool virtio_queue_packed_poll(VirtQueue *vq, unsigned shadow_idx)
+{
+ VRingPackedDesc desc;
+ VRingMemoryRegionCaches *caches;
+
+ if (unlikely(!vq->vring.desc)) {
+ return false;
+ }
+
+ caches = vring_get_region_caches(vq);
+ if (!caches) {
+ return false;
+ }
+
+ vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
+ shadow_idx, true);
+
+ return is_desc_avail(desc.flags, vq->shadow_avail_wrap_counter);
+}
+
+static bool virtio_queue_poll(VirtQueue *vq, unsigned shadow_idx)
+{
+ if (virtio_device_disabled(vq->vdev)) {
+ return false;
+ }
+
+ if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+ return virtio_queue_packed_poll(vq, shadow_idx);
+ } else {
+ return virtio_queue_split_poll(vq, shadow_idx);
+ }
+}
+
+bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
+ int opaque)
+{
+ virtio_queue_set_notification(vq, 1);
+
+ if (opaque >= 0) {
+ return virtio_queue_poll(vq, (unsigned)opaque);
+ } else {
+ return false;
+ }
+}
+
static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
unsigned int len)
{
@@ -1330,9 +1384,9 @@ err:
goto done;
}
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
- unsigned int *out_bytes,
- unsigned max_in_bytes, unsigned max_out_bytes)
+int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
+ unsigned int *out_bytes, unsigned max_in_bytes,
+ unsigned max_out_bytes)
{
uint16_t desc_size;
VRingMemoryRegionCaches *caches;
@@ -1365,7 +1419,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
caches);
}
- return;
+ return (int)vq->shadow_avail_idx;
err:
if (in_bytes) {
*in_bytes = 0;
@@ -1373,6 +1427,8 @@ err:
if (out_bytes) {
*out_bytes = 0;
}
+
+ return -1;
}
int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 2eafad17b8..8b4da92889 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -271,9 +271,13 @@ void qemu_put_virtqueue_element(VirtIODevice *vdev, QEMUFile *f,
VirtQueueElement *elem);
int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
unsigned int out_bytes);
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
- unsigned int *out_bytes,
- unsigned max_in_bytes, unsigned max_out_bytes);
+/**
+ * Return <0 on error or an opaque >=0 to pass to
+ * virtio_queue_enable_notification_and_check on success.
+ */
+int virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
+ unsigned int *out_bytes, unsigned max_in_bytes,
+ unsigned max_out_bytes);
void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
@@ -307,6 +311,17 @@ int virtio_queue_ready(VirtQueue *vq);
int virtio_queue_empty(VirtQueue *vq);
+/**
+ * Enable notification and check whether guest has added some
+ * buffers since last call to virtqueue_get_avail_bytes.
+ *
+ * @opaque: value returned from virtqueue_get_avail_bytes
+ */
+bool virtio_queue_enable_notification_and_check(VirtQueue *vq,
+ int opaque);
+
+void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t idx);
+
/* Host binding interface. */
uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);

View File

@@ -0,0 +1,70 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Tue, 9 Jul 2024 13:34:44 +0100
Subject: [PATCH] net: Reinstate '-net nic, model=help' output as documented in
man page
While refactoring the NIC initialization code, I broke '-net nic,model=help'
which no longer outputs a list of available NIC models.
Fixes: 2cdeca04adab ("net: report list of available models according to platform")
Cc: qemu-stable@nongnu.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 64f75f57f9d2c8c12ac6d9355fa5d3a2af5879ca)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
net/net.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/net/net.c b/net/net.c
index a2f0c828bb..e6ca2529bb 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1150,6 +1150,21 @@ NICInfo *qemu_find_nic_info(const char *typename, bool match_default,
return NULL;
}
+static bool is_nic_model_help_option(const char *model)
+{
+ if (model && is_help_option(model)) {
+ /*
+ * Trigger the help output by instantiating the hash table which
+ * will gather tha available models as they get registered.
+ */
+ if (!nic_model_help) {
+ nic_model_help = g_hash_table_new_full(g_str_hash, g_str_equal,
+ g_free, NULL);
+ }
+ return true;
+ }
+ return false;
+}
/* "I have created a device. Please configure it if you can" */
bool qemu_configure_nic_device(DeviceState *dev, bool match_default,
@@ -1733,6 +1748,12 @@ void net_check_clients(void)
static int net_init_client(void *dummy, QemuOpts *opts, Error **errp)
{
+ const char *model = qemu_opt_get_del(opts, "model");
+
+ if (is_nic_model_help_option(model)) {
+ return 0;
+ }
+
return net_client_init(opts, false, errp);
}
@@ -1789,9 +1810,7 @@ static int net_param_nic(void *dummy, QemuOpts *opts, Error **errp)
memset(ni, 0, sizeof(*ni));
ni->model = qemu_opt_get_del(opts, "model");
- if (!nic_model_help && !g_strcmp0(ni->model, "help")) {
- nic_model_help = g_hash_table_new_full(g_str_hash, g_str_equal,
- g_free, NULL);
+ if (is_nic_model_help_option(ni->model)) {
return 0;
}

View File

@@ -0,0 +1,32 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Tue, 6 Aug 2024 18:21:37 +0100
Subject: [PATCH] net: Fix '-net nic,model=' for non-help arguments
Oops, don't *delete* the model option when checking for 'help'.
Fixes: 64f75f57f9d2 ("net: Reinstate '-net nic, model=help' output as documented in man page")
Reported-by: Hans <sungdgdhtryrt@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit fa62cb989a9146c82f8f172715042852f5d36200)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
net/net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/net.c b/net/net.c
index e6ca2529bb..897bb936cf 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1748,7 +1748,7 @@ void net_check_clients(void)
static int net_init_client(void *dummy, QemuOpts *opts, Error **errp)
{
- const char *model = qemu_opt_get_del(opts, "model");
+ const char *model = qemu_opt_get(opts, "model");
if (is_nic_model_help_option(model)) {
return 0;

View File

@@ -0,0 +1,57 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Peter Maydell <peter.maydell@linaro.org>
Date: Mon, 22 Jul 2024 18:29:54 +0100
Subject: [PATCH] target/arm: Don't assert for 128-bit tile accesses when SVL
is 128
For an instruction which accesses a 128-bit element tile when
the SVL is also 128 (for example MOV z0.Q, p0/M, ZA0H.Q[w0,0]),
we will assert in get_tile_rowcol():
qemu-system-aarch64: ../../tcg/tcg-op.c:926: tcg_gen_deposit_z_i32: Assertion `len > 0' failed.
This happens because we calculate
len = ctz32(streaming_vec_reg_size(s)) - esz;$
but if the SVL and the element size are the same len is 0, and
the deposit operation asserts.
In this case the ZA storage contains exactly one 128 bit
element ZA tile, and the horizontal or vertical slice is just
that tile. This means that regardless of the index value in
the Ws register, we always access that tile. (In pseudocode terms,
we calculate (index + offset) MOD 1, which is 0.)
Special case the len == 0 case to avoid hitting the assertion
in tcg_gen_deposit_z_i32().
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240722172957.1041231-2-peter.maydell@linaro.org
(cherry picked from commit 56f1c0db928aae0b83fd91c89ddb226b137e2b21)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/arm/tcg/translate-sme.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 185a8a917b..a50a419af2 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -49,7 +49,15 @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs,
/* Prepare a power-of-two modulo via extraction of @len bits. */
len = ctz32(streaming_vec_reg_size(s)) - esz;
- if (vertical) {
+ if (!len) {
+ /*
+ * SVL is 128 and the element size is 128. There is exactly
+ * one 128x128 tile in the ZA storage, and so we calculate
+ * (Rs + imm) MOD 1, which is always 0. We need to special case
+ * this because TCG doesn't allow deposit ops with len 0.
+ */
+ tcg_gen_movi_i32(tmp, 0);
+ } else if (vertical) {
/*
* Compute the byte offset of the index within the tile:
* (index % (svl / size)) * size

View File

@@ -0,0 +1,59 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Peter Maydell <peter.maydell@linaro.org>
Date: Mon, 22 Jul 2024 18:29:55 +0100
Subject: [PATCH] target/arm: Fix UMOPA/UMOPS of 16-bit values
The UMOPA/UMOPS instructions are supposed to multiply unsigned 8 or
16 bit elements and accumulate the products into a 64-bit element.
In the Arm ARM pseudocode, this is done with the usual
infinite-precision signed arithmetic. However our implementation
doesn't quite get it right, because in the DEF_IMOP_64() macro we do:
sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);
where NTYPE and MTYPE are uint16_t or int16_t. In the uint16_t case,
the C usual arithmetic conversions mean the values are converted to
"int" type and the multiply is done as a 32-bit multiply. This means
that if the inputs are, for example, 0xffff and 0xffff then the
result is 0xFFFE0001 as an int, which is then promoted to uint64_t
for the accumulation into sum; this promotion incorrectly sign
extends the multiply.
Avoid the incorrect sign extension by casting to int64_t before
the multiply, so we do the multiply as 64-bit signed arithmetic,
which is a type large enough that the multiply can never
overflow into the sign bit.
(The equivalent 8-bit operations in DEF_IMOP_32() are fine, because
the 8-bit multiplies can never overflow into the sign bit of a
32-bit integer.)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2372
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240722172957.1041231-3-peter.maydell@linaro.org
(cherry picked from commit ea3f5a90f036734522e9af3bffd77e69e9f47355)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/arm/tcg/sme_helper.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index 5a6dd76489..f9001f5213 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1146,10 +1146,10 @@ static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
uint64_t sum = 0; \
/* Apply P to N as a mask, making the inactive elements 0. */ \
n &= expand_pred_h(p); \
- sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); \
- sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16); \
- sum += (NTYPE)(n >> 32) * (MTYPE)(m >> 32); \
- sum += (NTYPE)(n >> 48) * (MTYPE)(m >> 48); \
+ sum += (int64_t)(NTYPE)(n >> 0) * (MTYPE)(m >> 0); \
+ sum += (int64_t)(NTYPE)(n >> 16) * (MTYPE)(m >> 16); \
+ sum += (int64_t)(NTYPE)(n >> 32) * (MTYPE)(m >> 32); \
+ sum += (int64_t)(NTYPE)(n >> 48) * (MTYPE)(m >> 48); \
return neg ? a - sum : a + sum; \
}

View File

@@ -0,0 +1,62 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Peter Maydell <peter.maydell@linaro.org>
Date: Mon, 22 Jul 2024 18:29:56 +0100
Subject: [PATCH] target/arm: Avoid shifts by -1 in tszimm_shr() and
tszimm_shl()
The function tszimm_esz() returns a shift amount, or possibly -1 in
certain cases that correspond to unallocated encodings in the
instruction set. We catch these later in the trans_ functions
(generally with an "a-esz < 0" check), but before we do the
decodetree-generated code will also call tszimm_shr() or tszimm_sl(),
which will use the tszimm_esz() return value as a shift count without
checking that it is not negative, which is undefined behaviour.
Avoid the UB by checking the return value in tszimm_shr() and
tszimm_shl().
Cc: qemu-stable@nongnu.org
Resolves: Coverity CID 1547617, 1547694
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240722172957.1041231-4-peter.maydell@linaro.org
(cherry picked from commit 76916dfa89e8900639c1055c07a295c06628a0bc)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/arm/tcg/translate-sve.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index ada05aa530..466a19c25a 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -50,13 +50,27 @@ static int tszimm_esz(DisasContext *s, int x)
static int tszimm_shr(DisasContext *s, int x)
{
- return (16 << tszimm_esz(s, x)) - x;
+ /*
+ * We won't use the tszimm_shr() value if tszimm_esz() returns -1 (the
+ * trans function will check for esz < 0), so we can return any
+ * value we like from here in that case as long as we avoid UB.
+ */
+ int esz = tszimm_esz(s, x);
+ if (esz < 0) {
+ return esz;
+ }
+ return (16 << esz) - x;
}
/* See e.g. LSL (immediate, predicated). */
static int tszimm_shl(DisasContext *s, int x)
{
- return x - (8 << tszimm_esz(s, x));
+ /* As with tszimm_shr(), value will be unused if esz < 0 */
+ int esz = tszimm_esz(s, x);
+ if (esz < 0) {
+ return esz;
+ }
+ return x - (8 << esz);
}
/* The SH bit is in bit 8. Extract the low 8 and shift. */

View File

@@ -0,0 +1,41 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Peter Maydell <peter.maydell@linaro.org>
Date: Mon, 22 Jul 2024 18:29:57 +0100
Subject: [PATCH] target/arm: Ignore SMCR_EL2.LEN and SVCR_EL2.LEN if EL2 is
not enabled
When determining the current vector length, the SMCR_EL2.LEN and
SVCR_EL2.LEN settings should only be considered if EL2 is enabled
(compare the pseudocode CurrentSVL and CurrentNSVL which call
EL2Enabled()).
We were checking against ARM_FEATURE_EL2 rather than calling
arm_is_el2_enabled(), which meant that we would look at
SMCR_EL2/SVCR_EL2 when in Secure EL1 or Secure EL0 even if Secure EL2
was not enabled.
Use the correct check in sve_vqm1_for_el_sm().
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240722172957.1041231-5-peter.maydell@linaro.org
(cherry picked from commit f573ac059ed060234fcef4299fae9e500d357c33)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/arm/helper.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index a620481d7c..42044ae14b 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7191,7 +7191,7 @@ uint32_t sve_vqm1_for_el_sm(CPUARMState *env, int el, bool sm)
if (el <= 1 && !el_is_in_host(env, el)) {
len = MIN(len, 0xf & (uint32_t)cr[1]);
}
- if (el <= 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+ if (el <= 2 && arm_is_el2_enabled(env)) {
len = MIN(len, 0xf & (uint32_t)cr[2]);
}
if (arm_feature(env, ARM_FEATURE_EL3)) {

View File

@@ -0,0 +1,164 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Peter Maydell <peter.maydell@linaro.org>
Date: Thu, 1 Aug 2024 10:15:03 +0100
Subject: [PATCH] target/arm: Handle denormals correctly for FMOPA (widening)
The FMOPA (widening) SME instruction takes pairs of half-precision
floating point values, widens them to single-precision, does a
two-way dot product and accumulates the results into a
single-precision destination. We don't quite correctly handle the
FPCR bits FZ and FZ16 which control flushing of denormal inputs and
outputs. This is because at the moment we pass a single float_status
value to the helper function, which then uses that configuration for
all the fp operations it does. However, because the inputs to this
operation are float16 and the outputs are float32 we need to use the
fp_status_f16 for the float16 input widening but the normal fp_status
for everything else. Otherwise we will apply the flushing control
FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and
incorrectly flush a denormal output to zero when we should not (or
vice-versa).
(In commit 207d30b5fdb5b we tried to fix the FZ handling but
didn't get it right, switching from "use FPCR.FZ for everything" to
"use FPCR.FZ16 for everything".)
Pass the CPU env to the sme_fmopa_h helper instead of an fp_status
pointer, and have the helper pass an extra fp_status into the
f16_dotadd() function so that we can use the right status for the
right parts of this operation.
Cc: qemu-stable@nongnu.org
Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 55f9f4ee018c5ccea81d8c8c586756d7711ae46f)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/arm/tcg/helper-sme.h | 2 +-
target/arm/tcg/sme_helper.c | 39 +++++++++++++++++++++++-----------
target/arm/tcg/translate-sme.c | 25 ++++++++++++++++++++--
3 files changed, 51 insertions(+), 15 deletions(-)
diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h
index 27eef49a11..d22bf9d21b 100644
--- a/target/arm/tcg/helper-sme.h
+++ b/target/arm/tcg/helper-sme.h
@@ -121,7 +121,7 @@ DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG,
- void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+ void, ptr, ptr, ptr, ptr, ptr, env, i32)
DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index f9001f5213..3906bb51c0 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -976,12 +976,23 @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg)
}
static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
- float_status *s_std, float_status *s_odd)
+ float_status *s_f16, float_status *s_std,
+ float_status *s_odd)
{
- float64 e1r = float16_to_float64(e1 & 0xffff, true, s_std);
- float64 e1c = float16_to_float64(e1 >> 16, true, s_std);
- float64 e2r = float16_to_float64(e2 & 0xffff, true, s_std);
- float64 e2c = float16_to_float64(e2 >> 16, true, s_std);
+ /*
+ * We need three different float_status for different parts of this
+ * operation:
+ * - the input conversion of the float16 values must use the
+ * f16-specific float_status, so that the FPCR.FZ16 control is applied
+ * - operations on float32 including the final accumulation must use
+ * the normal float_status, so that FPCR.FZ is applied
+ * - we have pre-set-up copy of s_std which is set to round-to-odd,
+ * for the multiply (see below)
+ */
+ float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16);
+ float64 e1c = float16_to_float64(e1 >> 16, true, s_f16);
+ float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16);
+ float64 e2c = float16_to_float64(e2 >> 16, true, s_f16);
float64 t64;
float32 t32;
@@ -1003,20 +1014,23 @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
}
void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
- void *vpm, void *vst, uint32_t desc)
+ void *vpm, CPUARMState *env, uint32_t desc)
{
intptr_t row, col, oprsz = simd_maxsz(desc);
uint32_t neg = simd_data(desc) * 0x80008000u;
uint16_t *pn = vpn, *pm = vpm;
- float_status fpst_odd, fpst_std;
+ float_status fpst_odd, fpst_std, fpst_f16;
/*
- * Make a copy of float_status because this operation does not
- * update the cumulative fp exception status. It also produces
- * default nans. Make a second copy with round-to-odd -- see above.
+ * Make copies of fp_status and fp_status_f16, because this operation
+ * does not update the cumulative fp exception status. It also
+ * produces default NaNs. We also need a second copy of fp_status with
+ * round-to-odd -- see above.
*/
- fpst_std = *(float_status *)vst;
+ fpst_f16 = env->vfp.fp_status_f16;
+ fpst_std = env->vfp.fp_status;
set_default_nan_mode(true, &fpst_std);
+ set_default_nan_mode(true, &fpst_f16);
fpst_odd = fpst_std;
set_float_rounding_mode(float_round_to_odd, &fpst_odd);
@@ -1036,7 +1050,8 @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
uint32_t m = *(uint32_t *)(vzm + H1_4(col));
m = f16mop_adj_pair(m, pcol, 0);
- *a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd);
+ *a = f16_dotadd(*a, n, m,
+ &fpst_f16, &fpst_std, &fpst_odd);
}
col += 4;
pcol >>= 4;
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index a50a419af2..ae42ddef7b 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -334,8 +334,29 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
return true;
}
-TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a,
- MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h)
+static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz,
+ gen_helper_gvec_5_ptr *fn)
+{
+ int svl = streaming_vec_reg_size(s);
+ uint32_t desc = simd_desc(svl, svl, a->sub);
+ TCGv_ptr za, zn, zm, pn, pm;
+
+ if (!sme_smza_enabled_check(s)) {
+ return true;
+ }
+
+ za = get_tile(s, esz, a->zad);
+ zn = vec_full_reg_ptr(s, a->zn);
+ zm = vec_full_reg_ptr(s, a->zm);
+ pn = pred_full_reg_ptr(s, a->pn);
+ pm = pred_full_reg_ptr(s, a->pm);
+
+ fn(za, zn, zm, pn, pm, tcg_env, tcg_constant_i32(desc));
+ return true;
+}
+
+TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a,
+ MO_32, gen_helper_sme_fmopa_h)
TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a,
MO_32, FPST_FPCR, gen_helper_sme_fmopa_s)
TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,

View File

@@ -0,0 +1,39 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Mathieu--Drif?=
<clement.mathieu--drif@eviden.com>
Date: Tue, 9 Jul 2024 14:26:08 +0000
Subject: [PATCH] intel_iommu: fix FRCD construction macro
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The constant must be unsigned, otherwise the two's complement
overrides the other fields when a PASID is present.
Fixes: 1b2b12376c8a ("intel-iommu: PASID support")
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Message-Id: <20240709142557.317271-2-clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit a3c8d7e38550c3d5a46e6fa94ffadfa625a4861d)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/i386/intel_iommu_internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f8cf99bddf..cbc4030031 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -267,7 +267,7 @@
/* For the low 64-bit of 128-bit */
#define VTD_FRCD_FI(val) ((val) & ~0xfffULL)
#define VTD_FRCD_PV(val) (((val) & 0xffffULL) << 40)
-#define VTD_FRCD_PP(val) (((val) & 0x1) << 31)
+#define VTD_FRCD_PP(val) (((val) & 0x1ULL) << 31)
#define VTD_FRCD_IR_IDX(val) (((val) & 0xffffULL) << 48)
/* DMA Remapping Fault Conditions */

View File

@@ -0,0 +1,33 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Richard Henderson <richard.henderson@linaro.org>
Date: Mon, 12 Aug 2024 12:58:42 +1000
Subject: [PATCH] target/i386: Do not apply REX to MMX operands
Cc: qemu-stable@nongnu.org
Fixes: b3e22b2318a ("target/i386: add core of new i386 decoder")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2495
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240812025844.58956-2-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 416f2b16c02c618c0f233372ebfe343f9ee667d4)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/i386/tcg/decode-new.c.inc | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 4209d59ca8..09b8d2314a 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1271,7 +1271,10 @@ static bool decode_op(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
op->unit = X86_OP_SSE;
}
get_reg:
- op->n = ((get_modrm(s, env) >> 3) & 7) | REX_R(s);
+ op->n = ((get_modrm(s, env) >> 3) & 7);
+ if (op->unit != X86_OP_MMX) {
+ op->n |= REX_R(s);
+ }
break;
case X86_TYPE_E: /* ALU modrm operand */

View File

@@ -0,0 +1,42 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Date: Fri, 9 Aug 2024 14:13:40 +0200
Subject: [PATCH] module: Prevent crash by resetting local_err in
module_load_qom_all()
Set local_err to NULL after it has been freed in error_report_err(). This
avoids triggering assert(*errp == NULL) failure in error_setv() when
local_err is reused in the loop.
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Link: https://lore.kernel.org/r/20240809121340.992049-2-alexander.ivanov@virtuozzo.com
[Do the same by moving the declaration instead. - Paolo]
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 940d802b24e63650e0eacad3714e2ce171cba17c)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
util/module.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/util/module.c b/util/module.c
index 32e263163c..3eb0f06df1 100644
--- a/util/module.c
+++ b/util/module.c
@@ -354,13 +354,13 @@ int module_load_qom(const char *type, Error **errp)
void module_load_qom_all(void)
{
const QemuModinfo *modinfo;
- Error *local_err = NULL;
if (module_loaded_qom_all) {
return;
}
for (modinfo = module_info; modinfo->name != NULL; modinfo++) {
+ Error *local_err = NULL;
if (!modinfo->objs) {
continue;
}

View File

@@ -0,0 +1,164 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Wed, 7 Aug 2024 08:50:01 -0500
Subject: [PATCH] nbd/server: Plumb in new args to nbd_client_add()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Upcoming patches to fix a CVE need to track an opaque pointer passed
in by the owner of a client object, as well as request for a time
limit on how fast negotiation must complete. Prepare for that by
changing the signature of nbd_client_new() and adding an accessor to
get at the opaque pointer, although for now the two servers
(qemu-nbd.c and blockdev-nbd.c) do not change behavior even though
they pass in a new default timeout value.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-11-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: s/LIMIT/MAX_SECS/ as suggested by Dan]
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit fb1c2aaa981e0a2fa6362c9985f1296b74f055ac)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
blockdev-nbd.c | 6 ++++--
include/block/nbd.h | 11 ++++++++++-
nbd/server.c | 20 +++++++++++++++++---
qemu-nbd.c | 4 +++-
4 files changed, 34 insertions(+), 7 deletions(-)
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 213012435f..267a1de903 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -64,8 +64,10 @@ static void nbd_accept(QIONetListener *listener, QIOChannelSocket *cioc,
nbd_update_server_watch(nbd_server);
qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server");
- nbd_client_new(cioc, nbd_server->tlscreds, nbd_server->tlsauthz,
- nbd_blockdev_client_closed);
+ /* TODO - expose handshake timeout as QMP option */
+ nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS,
+ nbd_server->tlscreds, nbd_server->tlsauthz,
+ nbd_blockdev_client_closed, NULL);
}
static void nbd_update_server_watch(NBDServerData *s)
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4e7bd6342f..1d4d65922d 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -33,6 +33,12 @@ typedef struct NBDMetaContexts NBDMetaContexts;
extern const BlockExportDriver blk_exp_nbd;
+/*
+ * NBD_DEFAULT_HANDSHAKE_MAX_SECS: Number of seconds in which client must
+ * succeed at NBD_OPT_GO before being forcefully dropped as too slow.
+ */
+#define NBD_DEFAULT_HANDSHAKE_MAX_SECS 10
+
/* Handshake phase structs - this struct is passed on the wire */
typedef struct NBDOption {
@@ -403,9 +409,12 @@ AioContext *nbd_export_aio_context(NBDExport *exp);
NBDExport *nbd_export_find(const char *name);
void nbd_client_new(QIOChannelSocket *sioc,
+ uint32_t handshake_max_secs,
QCryptoTLSCreds *tlscreds,
const char *tlsauthz,
- void (*close_fn)(NBDClient *, bool));
+ void (*close_fn)(NBDClient *, bool),
+ void *owner);
+void *nbd_client_owner(NBDClient *client);
void nbd_client_get(NBDClient *client);
void nbd_client_put(NBDClient *client);
diff --git a/nbd/server.c b/nbd/server.c
index 892797bb11..e50012499f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -124,12 +124,14 @@ struct NBDMetaContexts {
struct NBDClient {
int refcount; /* atomic */
void (*close_fn)(NBDClient *client, bool negotiated);
+ void *owner;
QemuMutex lock;
NBDExport *exp;
QCryptoTLSCreds *tlscreds;
char *tlsauthz;
+ uint32_t handshake_max_secs;
QIOChannelSocket *sioc; /* The underlying data channel */
QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -3191,6 +3193,7 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
qemu_co_mutex_init(&client->send_lock);
+ /* TODO - utilize client->handshake_max_secs */
if (nbd_negotiate(client, &local_err)) {
if (local_err) {
error_report_err(local_err);
@@ -3205,14 +3208,17 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
}
/*
- * Create a new client listener using the given channel @sioc.
+ * Create a new client listener using the given channel @sioc and @owner.
* Begin servicing it in a coroutine. When the connection closes, call
- * @close_fn with an indication of whether the client completed negotiation.
+ * @close_fn with an indication of whether the client completed negotiation
+ * within @handshake_max_secs seconds (0 for unbounded).
*/
void nbd_client_new(QIOChannelSocket *sioc,
+ uint32_t handshake_max_secs,
QCryptoTLSCreds *tlscreds,
const char *tlsauthz,
- void (*close_fn)(NBDClient *, bool))
+ void (*close_fn)(NBDClient *, bool),
+ void *owner)
{
NBDClient *client;
Coroutine *co;
@@ -3225,13 +3231,21 @@ void nbd_client_new(QIOChannelSocket *sioc,
object_ref(OBJECT(client->tlscreds));
}
client->tlsauthz = g_strdup(tlsauthz);
+ client->handshake_max_secs = handshake_max_secs;
client->sioc = sioc;
qio_channel_set_delay(QIO_CHANNEL(sioc), false);
object_ref(OBJECT(client->sioc));
client->ioc = QIO_CHANNEL(sioc);
object_ref(OBJECT(client->ioc));
client->close_fn = close_fn;
+ client->owner = owner;
co = qemu_coroutine_create(nbd_co_client_start, client);
qemu_coroutine_enter(co);
}
+
+void *
+nbd_client_owner(NBDClient *client)
+{
+ return client->owner;
+}
diff --git a/qemu-nbd.c b/qemu-nbd.c
index d7b3ccab21..48e2fa5858 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -390,7 +390,9 @@ static void nbd_accept(QIONetListener *listener, QIOChannelSocket *cioc,
nb_fds++;
nbd_update_server_watch();
- nbd_client_new(cioc, tlscreds, tlsauthz, nbd_client_closed);
+ /* TODO - expose handshake timeout as command line option */
+ nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS,
+ tlscreds, tlsauthz, nbd_client_closed, NULL);
}
static void nbd_update_server_watch(void)

View File

@@ -0,0 +1,172 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Tue, 6 Aug 2024 13:53:00 -0500
Subject: [PATCH] nbd/server: CVE-2024-7409: Cap default max-connections to 100
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Allowing an unlimited number of clients to any web service is a recipe
for a rudimentary denial of service attack: the client merely needs to
open lots of sockets without closing them, until qemu no longer has
any more fds available to allocate.
For qemu-nbd, we default to allowing only 1 connection unless more are
explicitly asked for (-e or --shared); this was historically picked as
a nice default (without an explicit -t, a non-persistent qemu-nbd goes
away after a client disconnects, without needing any additional
follow-up commands), and we are not going to change that interface now
(besides, someday we want to point people towards qemu-storage-daemon
instead of qemu-nbd).
But for qemu proper, and the newer qemu-storage-daemon, the QMP
nbd-server-start command has historically had a default of unlimited
number of connections, in part because unlike qemu-nbd it is
inherently persistent until nbd-server-stop. Allowing multiple client
sockets is particularly useful for clients that can take advantage of
MULTI_CONN (creating parallel sockets to increase throughput),
although known clients that do so (such as libnbd's nbdcopy) typically
use only 8 or 16 connections (the benefits of scaling diminish once
more sockets are competing for kernel attention). Picking a number
large enough for typical use cases, but not unlimited, makes it
slightly harder for a malicious client to perform a denial of service
merely by opening lots of connections withot progressing through the
handshake.
This change does not eliminate CVE-2024-7409 on its own, but reduces
the chance for fd exhaustion or unlimited memory usage as an attack
surface. On the other hand, by itself, it makes it more obvious that
with a finite limit, we have the problem of an unauthenticated client
holding 100 fds opened as a way to block out a legitimate client from
being able to connect; thus, later patches will further add timeouts
to reject clients that are not making progress.
This is an INTENTIONAL change in behavior, and will break any client
of nbd-server-start that was not passing an explicit max-connections
parameter, yet expects more than 100 simultaneous connections. We are
not aware of any such client (as stated above, most clients aware of
MULTI_CONN get by just fine on 8 or 16 connections, and probably cope
with later connections failing by relying on the earlier connections;
libvirt has not yet been passing max-connections, but generally
creates NBD servers with the intent for a single client for the sake
of live storage migration; meanwhile, the KubeSAN project anticipates
a large cluster sharing multiple clients [up to 8 per node, and up to
100 nodes in a cluster], but it currently uses qemu-nbd with an
explicit --shared=0 rather than qemu-storage-daemon with
nbd-server-start).
We considered using a deprecation period (declare that omitting
max-parameters is deprecated, and make it mandatory in 3 releases -
then we don't need to pick an arbitrary default); that has zero risk
of breaking any apps that accidentally depended on more than 100
connections, and where such breakage might not be noticed under unit
testing but only under the larger loads of production usage. But it
does not close the denial-of-service hole until far into the future,
and requires all apps to change to add the parameter even if 100 was
good enough. It also has a drawback that any app (like libvirt) that
is accidentally relying on an unlimited default should seriously
consider their own CVE now, at which point they are going to change to
pass explicit max-connections sooner than waiting for 3 qemu releases.
Finally, if our changed default breaks an app, that app can always
pass in an explicit max-parameters with a larger value.
It is also intentional that the HMP interface to nbd-server-start is
not changed to expose max-connections (any client needing to fine-tune
things should be using QMP).
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-12-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[ericb: Expand commit message to summarize Dan's argument for why we
break corner-case back-compat behavior without a deprecation period]
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit c8a76dbd90c2f48df89b75bef74917f90a59b623)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 3 ++-
blockdev-nbd.c | 8 ++++++++
include/block/nbd.h | 7 +++++++
qapi/block-export.json | 4 ++--
4 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index d954bec6f1..bdf2eb50b6 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -402,7 +402,8 @@ void hmp_nbd_server_start(Monitor *mon, const QDict *qdict)
goto exit;
}
- nbd_server_start(addr, NULL, NULL, 0, &local_err);
+ nbd_server_start(addr, NULL, NULL, NBD_DEFAULT_MAX_CONNECTIONS,
+ &local_err);
qapi_free_SocketAddress(addr);
if (local_err != NULL) {
goto exit;
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 267a1de903..24ba5382db 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -170,6 +170,10 @@ void nbd_server_start(SocketAddress *addr, const char *tls_creds,
void nbd_server_start_options(NbdServerOptions *arg, Error **errp)
{
+ if (!arg->has_max_connections) {
+ arg->max_connections = NBD_DEFAULT_MAX_CONNECTIONS;
+ }
+
nbd_server_start(arg->addr, arg->tls_creds, arg->tls_authz,
arg->max_connections, errp);
}
@@ -182,6 +186,10 @@ void qmp_nbd_server_start(SocketAddressLegacy *addr,
{
SocketAddress *addr_flat = socket_address_flatten(addr);
+ if (!has_max_connections) {
+ max_connections = NBD_DEFAULT_MAX_CONNECTIONS;
+ }
+
nbd_server_start(addr_flat, tls_creds, tls_authz, max_connections, errp);
qapi_free_SocketAddress(addr_flat);
}
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 1d4d65922d..d4f8b21aec 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -39,6 +39,13 @@ extern const BlockExportDriver blk_exp_nbd;
*/
#define NBD_DEFAULT_HANDSHAKE_MAX_SECS 10
+/*
+ * NBD_DEFAULT_MAX_CONNECTIONS: Number of client sockets to allow at
+ * once; must be large enough to allow a MULTI_CONN-aware client like
+ * nbdcopy to create its typical number of 8-16 sockets.
+ */
+#define NBD_DEFAULT_MAX_CONNECTIONS 100
+
/* Handshake phase structs - this struct is passed on the wire */
typedef struct NBDOption {
diff --git a/qapi/block-export.json b/qapi/block-export.json
index 3919a2d5b9..f45e4fd481 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -28,7 +28,7 @@
# @max-connections: The maximum number of connections to allow at the
# same time, 0 for unlimited. Setting this to 1 also stops the
# server from advertising multiple client support (since 5.2;
-# default: 0)
+# default: 100)
#
# Since: 4.2
##
@@ -63,7 +63,7 @@
# @max-connections: The maximum number of connections to allow at the
# same time, 0 for unlimited. Setting this to 1 also stops the
# server from advertising multiple client support (since 5.2;
-# default: 0).
+# default: 100).
#
# Errors:
# - if the server is already running

View File

@@ -0,0 +1,123 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Thu, 8 Aug 2024 16:05:08 -0500
Subject: [PATCH] nbd/server: CVE-2024-7409: Drop non-negotiating clients
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
A client that opens a socket but does not negotiate is merely hogging
qemu's resources (an open fd and a small amount of memory); and a
malicious client that can access the port where NBD is listening can
attempt a denial of service attack by intentionally opening and
abandoning lots of unfinished connections. The previous patch put a
default bound on the number of such ongoing connections, but once that
limit is hit, no more clients can connect (including legitimate ones).
The solution is to insist that clients complete handshake within a
reasonable time limit, defaulting to 10 seconds. A client that has
not successfully completed NBD_OPT_GO by then (including the case of
where the client didn't know TLS credentials to even reach the point
of NBD_OPT_GO) is wasting our time and does not deserve to stay
connected. Later patches will allow fine-tuning the limit away from
the default value (including disabling it for doing integration
testing of the handshake process itself).
Note that this patch in isolation actually makes it more likely to see
qemu SEGV after nbd-server-stop, as any client socket still connected
when the server shuts down will now be closed after 10 seconds rather
than at the client's whims. That will be addressed in the next patch.
For a demo of this patch in action:
$ qemu-nbd -f raw -r -t -e 10 file &
$ nbdsh --opt-mode -c '
H = list()
for i in range(20):
print(i)
H.insert(i, nbd.NBD())
H[i].set_opt_mode(True)
H[i].connect_uri("nbd://localhost")
'
$ kill $!
where later connections get to start progressing once earlier ones are
forcefully dropped for taking too long, rather than hanging.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-13-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: rebase to changes earlier in series, reduce scope of timer]
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit b9b72cb3ce15b693148bd09cef7e50110566d8a0)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
nbd/server.c | 28 +++++++++++++++++++++++++++-
nbd/trace-events | 1 +
2 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/nbd/server.c b/nbd/server.c
index e50012499f..39285cc971 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -3186,22 +3186,48 @@ static void nbd_client_receive_next_request(NBDClient *client)
}
}
+static void nbd_handshake_timer_cb(void *opaque)
+{
+ QIOChannel *ioc = opaque;
+
+ trace_nbd_handshake_timer_cb();
+ qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
static coroutine_fn void nbd_co_client_start(void *opaque)
{
NBDClient *client = opaque;
Error *local_err = NULL;
+ QEMUTimer *handshake_timer = NULL;
qemu_co_mutex_init(&client->send_lock);
- /* TODO - utilize client->handshake_max_secs */
+ /*
+ * Create a timer to bound the time spent in negotiation. If the
+ * timer expires, it is likely nbd_negotiate will fail because the
+ * socket was shutdown.
+ */
+ if (client->handshake_max_secs > 0) {
+ handshake_timer = aio_timer_new(qemu_get_aio_context(),
+ QEMU_CLOCK_REALTIME,
+ SCALE_NS,
+ nbd_handshake_timer_cb,
+ client->sioc);
+ timer_mod(handshake_timer,
+ qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+ client->handshake_max_secs * NANOSECONDS_PER_SECOND);
+ }
+
if (nbd_negotiate(client, &local_err)) {
if (local_err) {
error_report_err(local_err);
}
+ timer_free(handshake_timer);
client_close(client, false);
return;
}
+ timer_free(handshake_timer);
WITH_QEMU_LOCK_GUARD(&client->lock) {
nbd_client_receive_next_request(client);
}
diff --git a/nbd/trace-events b/nbd/trace-events
index 00ae3216a1..cbd0a4ab7e 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -76,6 +76,7 @@ nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload
nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
nbd_trip(void) "Reading request"
+nbd_handshake_timer_cb(void) "client took too long to negotiate"
# client-connection.c
nbd_connect_thread_sleep(uint64_t timeout) "timeout %" PRIu64

View File

@@ -0,0 +1,161 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Wed, 7 Aug 2024 12:23:13 -0500
Subject: [PATCH] nbd/server: CVE-2024-7409: Close stray clients at server-stop
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
A malicious client can attempt to connect to an NBD server, and then
intentionally delay progress in the handshake, including if it does
not know the TLS secrets. Although the previous two patches reduce
this behavior by capping the default max-connections parameter and
killing slow clients, they did not eliminate the possibility of a
client waiting to close the socket until after the QMP nbd-server-stop
command is executed, at which point qemu would SEGV when trying to
dereference the NULL nbd_server global which is no longer present.
This amounts to a denial of service attack. Worse, if another NBD
server is started before the malicious client disconnects, I cannot
rule out additional adverse effects when the old client interferes
with the connection count of the new server (although the most likely
is a crash due to an assertion failure when checking
nbd_server->connections > 0).
For environments without this patch, the CVE can be mitigated by
ensuring (such as via a firewall) that only trusted clients can
connect to an NBD server. Note that using frameworks like libvirt
that ensure that TLS is used and that nbd-server-stop is not executed
while any trusted clients are still connected will only help if there
is also no possibility for an untrusted client to open a connection
but then stall on the NBD handshake.
Given the previous patches, it would be possible to guarantee that no
clients remain connected by having nbd-server-stop sleep for longer
than the default handshake deadline before finally freeing the global
nbd_server object, but that could make QMP non-responsive for a long
time. So intead, this patch fixes the problem by tracking all client
sockets opened while the server is running, and forcefully closing any
such sockets remaining without a completed handshake at the time of
nbd-server-stop, then waiting until the coroutines servicing those
sockets notice the state change. nbd-server-stop now has a second
AIO_WAIT_WHILE_UNLOCKED (the first is indirectly through the
blk_exp_close_all_type() that disconnects all clients that completed
handshakes), but forced socket shutdown is enough to progress the
coroutines and quickly tear down all clients before the server is
freed, thus finally fixing the CVE.
This patch relies heavily on the fact that nbd/server.c guarantees
that it only calls nbd_blockdev_client_closed() from the main loop
(see the assertion in nbd_client_put() and the hoops used in
nbd_client_put_nonzero() to achieve that); if we did not have that
guarantee, we would also need a mutex protecting our accesses of the
list of connections to survive re-entrancy from independent iothreads.
Although I did not actually try to test old builds, it looks like this
problem has existed since at least commit 862172f45c (v2.12.0, 2017) -
even back when that patch started using a QIONetListener to handle
listening on multiple sockets, nbd_server_free() was already unaware
that the nbd_blockdev_client_closed callback can be reached later by a
client thread that has not completed handshakes (and therefore the
client's socket never got added to the list closed in
nbd_export_close_all), despite that patch intentionally tearing down
the QIONetListener to prevent new clients.
Reported-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Fixes: CVE-2024-7409
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-14-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 3e7ef738c8462c45043a1d39f702a0990406a3b3)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
blockdev-nbd.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 24ba5382db..f73409ae49 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -21,12 +21,18 @@
#include "io/channel-socket.h"
#include "io/net-listener.h"
+typedef struct NBDConn {
+ QIOChannelSocket *cioc;
+ QLIST_ENTRY(NBDConn) next;
+} NBDConn;
+
typedef struct NBDServerData {
QIONetListener *listener;
QCryptoTLSCreds *tlscreds;
char *tlsauthz;
uint32_t max_connections;
uint32_t connections;
+ QLIST_HEAD(, NBDConn) conns;
} NBDServerData;
static NBDServerData *nbd_server;
@@ -51,6 +57,14 @@ int nbd_server_max_connections(void)
static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
{
+ NBDConn *conn = nbd_client_owner(client);
+
+ assert(qemu_in_main_thread() && nbd_server);
+
+ object_unref(OBJECT(conn->cioc));
+ QLIST_REMOVE(conn, next);
+ g_free(conn);
+
nbd_client_put(client);
assert(nbd_server->connections > 0);
nbd_server->connections--;
@@ -60,14 +74,20 @@ static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
static void nbd_accept(QIONetListener *listener, QIOChannelSocket *cioc,
gpointer opaque)
{
+ NBDConn *conn = g_new0(NBDConn, 1);
+
+ assert(qemu_in_main_thread() && nbd_server);
nbd_server->connections++;
+ object_ref(OBJECT(cioc));
+ conn->cioc = cioc;
+ QLIST_INSERT_HEAD(&nbd_server->conns, conn, next);
nbd_update_server_watch(nbd_server);
qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server");
/* TODO - expose handshake timeout as QMP option */
nbd_client_new(cioc, NBD_DEFAULT_HANDSHAKE_MAX_SECS,
nbd_server->tlscreds, nbd_server->tlsauthz,
- nbd_blockdev_client_closed, NULL);
+ nbd_blockdev_client_closed, conn);
}
static void nbd_update_server_watch(NBDServerData *s)
@@ -81,12 +101,25 @@ static void nbd_update_server_watch(NBDServerData *s)
static void nbd_server_free(NBDServerData *server)
{
+ NBDConn *conn, *tmp;
+
if (!server) {
return;
}
+ /*
+ * Forcefully close the listener socket, and any clients that have
+ * not yet disconnected on their own.
+ */
qio_net_listener_disconnect(server->listener);
object_unref(OBJECT(server->listener));
+ QLIST_FOREACH_SAFE(conn, &server->conns, next, tmp) {
+ qio_channel_shutdown(QIO_CHANNEL(conn->cioc), QIO_CHANNEL_SHUTDOWN_BOTH,
+ NULL);
+ }
+
+ AIO_WAIT_WHILE_UNLOCKED(NULL, server->connections > 0);
+
if (server->tlscreds) {
object_unref(OBJECT(server->tlscreds));
}

View File

@@ -0,0 +1,47 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marc-Andr=C3=A9=20Lureau?= <marcandre.lureau@redhat.com>
Date: Tue, 20 Aug 2024 17:11:12 +0400
Subject: [PATCH] vnc: fix crash when no console attached
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit e99441a3793b5 ("ui/curses: Do not use console_select()")
qemu_text_console_put_keysym() no longer checks for NULL console
argument, which leads to a later crash:
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x00005555559ee186 in qemu_text_console_handle_keysym (s=0x0, keysym=31) at ../ui/console-vc.c:332
332 } else if (s->echo && (keysym == '\r' || keysym == '\n')) {
(gdb) bt
#0 0x00005555559ee186 in qemu_text_console_handle_keysym (s=0x0, keysym=31) at ../ui/console-vc.c:332
#1 0x00005555559e18e5 in qemu_text_console_put_keysym (s=<optimized out>, keysym=<optimized out>) at ../ui/console.c:303
#2 0x00005555559f2e88 in do_key_event (vs=vs@entry=0x5555579045c0, down=down@entry=1, keycode=keycode@entry=60, sym=sym@entry=65471) at ../ui/vnc.c:2034
#3 0x00005555559f845c in ext_key_event (vs=0x5555579045c0, down=1, sym=65471, keycode=<optimized out>) at ../ui/vnc.c:2070
#4 protocol_client_msg (vs=0x5555579045c0, data=<optimized out>, len=<optimized out>) at ../ui/vnc.c:2514
#5 0x00005555559f515c in vnc_client_read (vs=0x5555579045c0) at ../ui/vnc.c:1607
Fixes: e99441a3793b5 ("ui/curses: Do not use console_select()")
Fixes: https://issues.redhat.com/browse/RHEL-50529
Cc: qemu-stable@nongnu.org
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
(picked from https://lore.kernel.org/qemu-devel/20240820131112.1267954-1-marcandre.lureau@redhat.com/)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
ui/vnc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ui/vnc.c b/ui/vnc.c
index b3fd78022b..953ea38318 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1935,7 +1935,7 @@ static void do_key_event(VncState *vs, int down, int keycode, int sym)
}
qkbd_state_key_event(vs->vd->kbd, qcode, down);
- if (!qemu_console_is_graphic(vs->vd->dcl.con)) {
+ if (QEMU_IS_TEXT_CONSOLE(vs->vd->dcl.con)) {
QemuTextConsole *con = QEMU_TEXT_CONSOLE(vs->vd->dcl.con);
bool numlock = qkbd_state_modifier_get(vs->vd->kbd, QKBD_MOD_NUMLOCK);
bool control = qkbd_state_modifier_get(vs->vd->kbd, QKBD_MOD_CTRL);

View File

@@ -0,0 +1,89 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Thu, 22 Aug 2024 09:35:29 -0500
Subject: [PATCH] nbd/server: CVE-2024-7409: Avoid use-after-free when closing
server
Commit 3e7ef738 plugged the use-after-free of the global nbd_server
object, but overlooked a use-after-free of nbd_server->listener.
Although this race is harder to hit, notice that our shutdown path
first drops the reference count of nbd_server->listener, then triggers
actions that can result in a pending client reaching the
nbd_blockdev_client_closed() callback, which in turn calls
qio_net_listener_set_client_func on a potentially stale object.
If we know we don't want any more clients to connect, and have already
told the listener socket to shut down, then we should not be trying to
update the listener socket's associated function.
Reproducer:
> #!/usr/bin/python3
>
> import os
> from threading import Thread
>
> def start_stop():
> while 1:
> os.system('virsh qemu-monitor-command VM \'{"execute": "nbd-server-start",
+"arguments":{"addr":{"type":"unix","data":{"path":"/tmp/nbd-sock"}}}}\'')
> os.system('virsh qemu-monitor-command VM \'{"execute": "nbd-server-stop"}\'')
>
> def nbd_list():
> while 1:
> os.system('/path/to/build/qemu-nbd -L -k /tmp/nbd-sock')
>
> def test():
> sst = Thread(target=start_stop)
> sst.start()
> nlt = Thread(target=nbd_list)
> nlt.start()
>
> sst.join()
> nlt.join()
>
> test()
Fixes: CVE-2024-7409
Fixes: 3e7ef738c8 ("nbd/server: CVE-2024-7409: Close stray clients at server-stop")
CC: qemu-stable@nongnu.org
Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240822143617.800419-2-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 3874f5f73c441c52f1c699c848d463b0eda01e4c)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
blockdev-nbd.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index f73409ae49..b36f41b7c5 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -92,10 +92,13 @@ static void nbd_accept(QIONetListener *listener, QIOChannelSocket *cioc,
static void nbd_update_server_watch(NBDServerData *s)
{
- if (!s->max_connections || s->connections < s->max_connections) {
- qio_net_listener_set_client_func(s->listener, nbd_accept, NULL, NULL);
- } else {
- qio_net_listener_set_client_func(s->listener, NULL, NULL, NULL);
+ if (s->listener) {
+ if (!s->max_connections || s->connections < s->max_connections) {
+ qio_net_listener_set_client_func(s->listener, nbd_accept, NULL,
+ NULL);
+ } else {
+ qio_net_listener_set_client_func(s->listener, NULL, NULL, NULL);
+ }
}
}
@@ -113,6 +116,7 @@ static void nbd_server_free(NBDServerData *server)
*/
qio_net_listener_disconnect(server->listener);
object_unref(OBJECT(server->listener));
+ server->listener = NULL;
QLIST_FOREACH_SAFE(conn, &server->conns, next, tmp) {
qio_channel_shutdown(QIO_CHANNEL(conn->cioc), QIO_CHANNEL_SHUTDOWN_BOTH,
NULL);

View File

@@ -0,0 +1,134 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 28 Aug 2024 11:07:43 +0200
Subject: [PATCH] softmmu/physmem: fix memory leak in dirty_memory_extend()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
As reported by Peter, we might be leaking memory when removing the
highest RAMBlock (in the weird ram_addr_t space), and adding a new one.
We will fail to realize that we already allocated bitmaps for more
dirty memory blocks, and effectively discard the pointers to them.
Fix it by getting rid of last_ram_page() and by remembering the number
of dirty memory blocks that have been allocated already.
While at it, let's use "unsigned int" for the number of blocks, which
should be sufficient until we reach ~32 exabytes.
Looks like this leak was introduced as we switched from using a single
bitmap_zero_extend() to allocating multiple bitmaps:
bitmap_zero_extend() relies on g_renew() which should have taken care of
this.
Resolves: https://lkml.kernel.org/r/CAFEAcA-k7a+VObGAfCFNygQNfCKL=AfX6A4kScq=VSSK0peqPg@mail.gmail.com
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
(picked from https://lore.kernel.org/qemu-devel/20240828090743.128647-1-david@redhat.com/)
[FE: backport - remove not-yet-existing variable in context of hunk touching ram_block_add()]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
include/exec/ramlist.h | 1 +
system/physmem.c | 35 +++++++++--------------------------
2 files changed, 10 insertions(+), 26 deletions(-)
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 2ad2a81acc..d9cfe530be 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -50,6 +50,7 @@ typedef struct RAMList {
/* RCU-enabled, writes protected by the ramlist lock. */
QLIST_HEAD(, RAMBlock) blocks;
DirtyMemoryBlocks *dirty_memory[DIRTY_MEMORY_NUM];
+ unsigned int num_dirty_blocks;
uint32_t version;
QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
} RAMList;
diff --git a/system/physmem.c b/system/physmem.c
index a4fe3d2bf8..78f7db1121 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1497,18 +1497,6 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
return offset;
}
-static unsigned long last_ram_page(void)
-{
- RAMBlock *block;
- ram_addr_t last = 0;
-
- RCU_READ_LOCK_GUARD();
- RAMBLOCK_FOREACH(block) {
- last = MAX(last, block->offset + block->max_length);
- }
- return last >> TARGET_PAGE_BITS;
-}
-
static void qemu_ram_setup_dump(void *addr, ram_addr_t size)
{
int ret;
@@ -1762,13 +1750,11 @@ void qemu_ram_msync(RAMBlock *block, ram_addr_t start, ram_addr_t length)
}
/* Called with ram_list.mutex held */
-static void dirty_memory_extend(ram_addr_t old_ram_size,
- ram_addr_t new_ram_size)
+static void dirty_memory_extend(ram_addr_t new_ram_size)
{
- ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
- DIRTY_MEMORY_BLOCK_SIZE);
- ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
- DIRTY_MEMORY_BLOCK_SIZE);
+ unsigned int old_num_blocks = ram_list.num_dirty_blocks;
+ unsigned int new_num_blocks = DIV_ROUND_UP(new_ram_size,
+ DIRTY_MEMORY_BLOCK_SIZE);
int i;
/* Only need to extend if block count increased */
@@ -1800,6 +1786,8 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
g_free_rcu(old_blocks, rcu);
}
}
+
+ ram_list.num_dirty_blocks = new_num_blocks;
}
static void ram_block_add(RAMBlock *new_block, Error **errp)
@@ -1808,11 +1796,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
const bool shared = qemu_ram_is_shared(new_block);
RAMBlock *block;
RAMBlock *last_block = NULL;
- ram_addr_t old_ram_size, new_ram_size;
+ ram_addr_t ram_size;
Error *err = NULL;
- old_ram_size = last_ram_page();
-
qemu_mutex_lock_ramlist();
new_block->offset = find_ram_offset(new_block->max_length);
@@ -1840,11 +1826,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
}
}
- new_ram_size = MAX(old_ram_size,
- (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
- if (new_ram_size > old_ram_size) {
- dirty_memory_extend(old_ram_size, new_ram_size);
- }
+ ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
+ dirty_memory_extend(ram_size);
/* Keep the list sorted from biggest to smallest block. Unlike QTAILQ,
* QLIST (which has an RCU-friendly variant) does not have insertion at
* tail, so save the last element in last_block.

File diff suppressed because it is too large Load Diff

View File

@@ -28,7 +28,9 @@ Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
adapt to removal of QEMUFileOps
improve condition for entering final stage
adapt to QAPI and other changes for 8.2
make sure to not call vm_start() from coroutine]
make sure to not call vm_start() from coroutine
stop CPU throttling after finishing
force raw format when loading state as suggested by Friedrich Weber]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hmp-commands-info.hx | 13 +
@@ -36,13 +38,13 @@ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
include/migration/snapshot.h | 2 +
include/monitor/hmp.h | 3 +
migration/meson.build | 1 +
migration/savevm-async.c | 538 +++++++++++++++++++++++++++++++++++
migration/savevm-async.c | 549 +++++++++++++++++++++++++++++++++++
monitor/hmp-cmds.c | 38 +++
qapi/migration.json | 34 +++
qapi/misc.json | 18 ++
qemu-options.hx | 12 +
system/vl.c | 10 +
11 files changed, 686 insertions(+)
11 files changed, 697 insertions(+)
create mode 100644 migration/savevm-async.c
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
@@ -140,10 +142,10 @@ index 95d1cf2250..800f12a60d 100644
'threadinfo.c',
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
new file mode 100644
index 0000000000..72cf6588c2
index 0000000000..ae7ed68651
--- /dev/null
+++ b/migration/savevm-async.c
@@ -0,0 +1,538 @@
@@ -0,0 +1,549 @@
+#include "qemu/osdep.h"
+#include "migration/channel-savevm-async.h"
+#include "migration/migration.h"
@@ -154,6 +156,7 @@ index 0000000000..72cf6588c2
+#include "migration/global_state.h"
+#include "migration/ram.h"
+#include "migration/qemu-file.h"
+#include "sysemu/cpu-throttle.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
+#include "block/block.h"
@@ -342,6 +345,12 @@ index 0000000000..72cf6588c2
+ ret || aborted ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED);
+ ms->to_dst_file = NULL;
+
+ /*
+ * Same as in migration_iteration_finish(): saving RAM might've turned on CPU throttling for
+ * auto-converge, make sure to disable it.
+ */
+ cpu_throttle_stop();
+
+ qemu_savevm_state_cleanup();
+
+ ret = save_snapshot_cleanup();
@@ -633,12 +642,16 @@ index 0000000000..72cf6588c2
+ BlockBackend *be;
+ Error *local_err = NULL;
+ Error *blocker = NULL;
+ QDict *options;
+
+ QEMUFile *f;
+ size_t bs_pos = 0;
+ int ret = -EINVAL;
+
+ be = blk_new_open(filename, NULL, NULL, 0, &local_err);
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+
+ be = blk_new_open(filename, NULL, options, 0, &local_err);
+
+ if (!be) {
+ error_setg(errp, "Could not open VM state file");

View File

@@ -193,10 +193,10 @@ index 32fd4a34fd..36a0cd8cc8 100644
/*
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 72cf6588c2..fb4e8ea689 100644
index ae7ed68651..cdd5ea6288 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -379,7 +379,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
@@ -386,7 +386,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
&snap_state.bs_pos));
@@ -205,7 +205,7 @@ index 72cf6588c2..fb4e8ea689 100644
if (!snap_state.file) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
@@ -503,7 +503,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
@@ -514,7 +514,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
blk_op_block_all(be, blocker);
/* restore the VM state */

View File

@@ -10,16 +10,17 @@ skip reads.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: improvements during create
allow partial restore]
allow partial restore
allow specifying disk formats for create operation]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 2 +
meson.build | 5 +
vma-reader.c | 870 ++++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 817 +++++++++++++++++++++++++++++++++++++++++
vma.c | 901 ++++++++++++++++++++++++++++++++++++++++++++++
vma-reader.c | 870 ++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 817 ++++++++++++++++++++++++++++++++++++++++
vma.c | 941 ++++++++++++++++++++++++++++++++++++++++++++++
vma.h | 150 ++++++++
6 files changed, 2745 insertions(+)
6 files changed, 2785 insertions(+)
create mode 100644 vma-reader.c
create mode 100644 vma-writer.c
create mode 100644 vma.c
@@ -1762,10 +1763,10 @@ index 0000000000..a466652a5d
+}
diff --git a/vma.c b/vma.c
new file mode 100644
index 0000000000..bb715e9061
index 0000000000..8d4b4be414
--- /dev/null
+++ b/vma.c
@@ -0,0 +1,901 @@
@@ -0,0 +1,941 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@@ -1797,8 +1798,8 @@ index 0000000000..bb715e9061
+ "usage: vma command [command options]\n"
+ "\n"
+ "vma list <filename>\n"
+ "vma config <filename> [-c config]\n"
+ "vma create <filename> [-c config] pathname ...\n"
+ "vma config <filename> [-c <config>]\n"
+ "vma create <filename> [-c <config>] [-d format=<format>:<device name>=<path> [-d ...]]\n"
+ "vma extract <filename> [-d <drive-list>] [-r <fifo>] <targetdir>\n"
+ "vma verify <filename> [-v]\n"
+ ;
@@ -2389,12 +2390,14 @@ index 0000000000..bb715e9061
+{
+ int c;
+ int verbose = 0;
+ bool expect_format = true;
+ const char *archivename;
+ GList *backup_coroutines = NULL;
+ GList *config_files = NULL;
+ GList *disk_infos = NULL;
+
+ for (;;) {
+ c = getopt(argc, argv, "hvc:");
+ c = getopt(argc, argv, "hvc:d:");
+ if (c == -1) {
+ break;
+ }
@@ -2406,6 +2409,9 @@ index 0000000000..bb715e9061
+ case 'c':
+ config_files = g_list_append(config_files, optarg);
+ break;
+ case 'd':
+ disk_infos = g_list_append(disk_infos, optarg);
+ break;
+ case 'v':
+ verbose = 1;
+ break;
@@ -2451,16 +2457,48 @@ index 0000000000..bb715e9061
+ l = g_list_next(l);
+ }
+
+ int devcount = 0;
+ /*
+ * Don't allow mixing new and old way to specifiy disks.
+ * TODO PVE 9 drop old way and always require format.
+ */
+ if (optind < argc && g_list_first(disk_infos)) {
+ unlink(archivename);
+ g_error("Unexpected extra argument - specify all devices via '-d'");
+ }
+
+ while (optind < argc) {
+ const char *path = argv[optind++];
+ expect_format = false;
+ disk_infos = g_list_append(disk_infos, argv[optind++]);
+ }
+
+ int devcount = 0;
+ GList *disk_l = disk_infos;
+ while (disk_l && disk_l->data) {
+ char *disk_info = disk_l->data;
+ const char *path = NULL;
+ char *devname = NULL;
+ path = extract_devname(path, &devname, devcount++);
+ char *format = NULL;
+ QDict *options = qdict_new();
+
+ if (try_parse_option(&disk_info, "format", &format, disk_info)) {
+ qdict_put_str(options, "driver", format);
+ } else {
+ if (expect_format) {
+ unlink(archivename);
+ g_error("No format specified for device: '%s'", disk_info);
+ } else {
+ g_warning("Specifying a device without a format is deprecated"
+ " - use '-d format=<format>:%s'",
+ disk_info);
+ }
+ }
+
+ path = extract_devname(disk_info, &devname, devcount++);
+
+ Error *errp = NULL;
+ BlockBackend *target;
+
+ target = blk_new_open(path, NULL, NULL, 0, &errp);
+ target = blk_new_open(path, NULL, options, 0, &errp);
+ if (!target) {
+ unlink(archivename);
+ g_error("bdrv_open '%s' failed - %s", path, error_get_pretty(errp));
@@ -2482,6 +2520,8 @@ index 0000000000..bb715e9061
+ // Don't enter coroutine yet, because it might write the header before
+ // all streams can be registered.
+ backup_coroutines = g_list_append(backup_coroutines, co);
+
+ disk_l = g_list_next(disk_l);
+ }
+
+ VmaStatus vmastat;
@@ -2563,6 +2603,7 @@ index 0000000000..bb715e9061
+
+ g_list_free(backup_coroutines);
+ g_list_free(config_files);
+ g_list_free(disk_infos);
+ vma_writer_destroy(vmaw);
+ return 0;
+}

View File

@@ -120,10 +120,10 @@ index e99914eaa4..6bba803f94 100644
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(files('block-ram-registrar.c'))
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index d954bec6f1..5000c084c5 100644
index bdf2eb50b6..439a7a14c8 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1008,3 +1008,42 @@ void hmp_change_medium(Monitor *mon, const char *device, const char *target,
@@ -1009,3 +1009,42 @@ void hmp_change_medium(Monitor *mon, const char *device, const char *target,
qmp_blockdev_change_medium(device, NULL, target, arg, true, force,
!!read_only, read_only_mode, errp);
}

View File

@@ -68,10 +68,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
3 files changed, 142 insertions(+), 4 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 5000c084c5..70b3de4c7e 100644
index 439a7a14c8..d0e7771dcc 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1043,6 +1043,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
@@ -1044,6 +1044,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
NULL, NULL,
devlist, qdict_haskey(qdict, "speed"), speed,
false, 0, // BackupPerf max-workers

44
debian/patches/series vendored
View File

@@ -3,18 +3,37 @@ extra/0002-scsi-megasas-Internal-cdbs-have-16-byte-length.patch
extra/0003-ide-avoid-potential-deadlock-when-draining-during-tr.patch
extra/0004-Revert-x86-acpi-workaround-Windows-not-handling-name.patch
extra/0005-block-copy-before-write-use-uint64_t-for-timeout-in-.patch
extra/0006-Revert-virtio-pci-fix-use-of-a-released-vector.patch
extra/0007-block-copy-before-write-fix-permission.patch
extra/0008-block-copy-before-write-support-unligned-snapshot-di.patch
extra/0009-block-copy-before-write-create-block_copy-bitmap-in-.patch
extra/0010-qapi-blockdev-backup-add-discard-source-parameter.patch
extra/0011-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch
extra/0012-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch
extra/0013-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch
extra/0014-scsi-fix-regression-and-honor-bootindex-again-for-le.patch
extra/0015-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch
extra/0016-block-copy-Fix-missing-graph-lock.patch
extra/0017-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch
extra/0006-block-copy-before-write-fix-permission.patch
extra/0007-block-copy-before-write-support-unligned-snapshot-di.patch
extra/0008-block-copy-before-write-create-block_copy-bitmap-in-.patch
extra/0009-qapi-blockdev-backup-add-discard-source-parameter.patch
extra/0010-hw-virtio-Fix-the-de-initialization-of-vhost-user-de.patch
extra/0011-target-arm-Use-float_status-copy-in-sme_fmopa_s.patch
extra/0012-target-arm-Use-FPST_F16-for-SME-FMOPA-widening.patch
extra/0013-scsi-fix-regression-and-honor-bootindex-again-for-le.patch
extra/0014-hw-scsi-lsi53c895a-bump-instruction-limit-in-scripts.patch
extra/0015-block-copy-Fix-missing-graph-lock.patch
extra/0016-Revert-qemu-char-do-not-operate-on-sources-from-fina.patch
extra/0017-virtio-pci-Fix-the-use-of-an-uninitialized-irqfd.patch
extra/0018-virtio-net-Ensure-queue-index-fits-with-RSS.patch
extra/0019-virtio-net-Fix-network-stall-at-the-host-side-waitin.patch
extra/0020-net-Reinstate-net-nic-model-help-output-as-documente.patch
extra/0021-net-Fix-net-nic-model-for-non-help-arguments.patch
extra/0022-target-arm-Don-t-assert-for-128-bit-tile-accesses-wh.patch
extra/0023-target-arm-Fix-UMOPA-UMOPS-of-16-bit-values.patch
extra/0024-target-arm-Avoid-shifts-by-1-in-tszimm_shr-and-tszim.patch
extra/0025-target-arm-Ignore-SMCR_EL2.LEN-and-SVCR_EL2.LEN-if-E.patch
extra/0026-target-arm-Handle-denormals-correctly-for-FMOPA-wide.patch
extra/0027-intel_iommu-fix-FRCD-construction-macro.patch
extra/0028-target-i386-Do-not-apply-REX-to-MMX-operands.patch
extra/0029-module-Prevent-crash-by-resetting-local_err-in-modul.patch
extra/0030-nbd-server-Plumb-in-new-args-to-nbd_client_add.patch
extra/0031-nbd-server-CVE-2024-7409-Cap-default-max-connections.patch
extra/0032-nbd-server-CVE-2024-7409-Drop-non-negotiating-client.patch
extra/0033-nbd-server-CVE-2024-7409-Close-stray-clients-at-serv.patch
extra/0034-vnc-fix-crash-when-no-console-attached.patch
extra/0035-nbd-server-CVE-2024-7409-Avoid-use-after-free-when-c.patch
extra/0036-softmmu-physmem-fix-memory-leak-in-dirty_memory_exte.patch
bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
@@ -68,4 +87,3 @@ pve/0044-copy-before-write-allow-specifying-minimum-cluster-s.patch
pve/0045-backup-add-minimum-cluster-size-to-performance-optio.patch
pve/0046-PVE-backup-add-fleecing-option.patch
pve/0047-PVE-backup-improve-error-when-copy-before-write-fail.patch
pve-qemu-9.0-vitastor.patch

2
qemu

Submodule qemu updated: c25df57ae8...5ebde3b5c0