[Documentation](../../README.md#documentation) → Usage → QEMU and qemu-img

-----

[Читать на русском](qemu.ru.md)

# QEMU and qemu-img

## QEMU

You need a patched QEMU version to use the Vitastor driver. Pre-built [packages](../installation/packages.en.md) are available.

To start a VM with a Vitastor disk using the plain QEMU command line, use the following commands.

Old syntax (-drive):

```
qemu-system-x86_64 -enable-kvm -m 1024 \
    -drive 'file=vitastor:image=debian9',format=raw,if=none,id=drive-virtio-disk0,cache=none \
    -device 'virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=off' \
    -vnc 0.0.0.0:0
```

The etcd address may be specified explicitly by adding `:etcd_host=192.168.7.2\:2379/v3` to `file=`.
The configuration file path may be overridden by adding `:config_path=/etc/vitastor/vitastor.conf`.
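
As a sketch combining both options (the etcd address and config path here are examples, not defaults), the full `-drive` option could look like this:

```
-drive 'file=vitastor:image=debian9:etcd_host=192.168.7.2\:2379/v3:config_path=/etc/vitastor/vitastor.conf',format=raw,if=none,id=drive-virtio-disk0,cache=none \
```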

New syntax (-blockdev):

```
qemu-system-x86_64 -enable-kvm -m 1024 \
    -blockdev '{"node-name":"drive-virtio-disk0","driver":"vitastor","image":"debian9","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
    -device 'virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=off' \
    -vnc 0.0.0.0:0
```

With a separate I/O thread:

```
qemu-system-x86_64 -enable-kvm -m 1024 \
    -object iothread,id=vitastor1 \
    -blockdev '{"node-name":"drive-virtio-disk0","driver":"vitastor","image":"debian9","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
    -device 'virtio-blk-pci,iothread=vitastor1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=off' \
    -vnc 0.0.0.0:0
```

You can also specify the inode ID, pool and size manually instead of the `:image=<IMAGE>` option: `:pool=<POOL>:inode=<INODE>:size=<SIZE>`.
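
For illustration, the resulting `file=` string for a hypothetical pool 1, inode 5 and a 10 GiB size can be assembled like this (the values are made up; real pool and inode numbers come from your cluster):

```
# Build a Vitastor "file=" string from pool/inode/size instead of an image name.
# Pool and inode values here are hypothetical; size is in bytes.
POOL=1
INODE=5
SIZE=$((10 * 1024 * 1024 * 1024))   # 10 GiB
echo "vitastor:pool=${POOL}:inode=${INODE}:size=${SIZE}"
```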

## qemu-img

For qemu-img, use `vitastor:image=<IMAGE>[:etcd_host=<HOST>]` as the filename.

For example, to upload a VM image into Vitastor, run:

```
qemu-img convert -f qcow2 debian10.qcow2 -p -O raw 'vitastor:image=debian10'
```
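
The same filename syntax works in the other direction too; a sketch of exporting an image back out of Vitastor (the output filename is arbitrary):

```
qemu-img convert -f raw 'vitastor:image=debian10' -p -O qcow2 debian10-backup.qcow2
```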

You can also specify `:pool=<POOL>:inode=<INODE>:size=<SIZE>` instead of `:image=<IMAGE>` if you don't want to use inode metadata.

### Exporting snapshots

Starting with 0.8.4, you can also export individual layers (snapshot diffs) using `qemu-img`.

Suppose you have an image `testimg` and a snapshot `testimg@0`, created with `vitastor-cli snap-create testimg@0`.

Then you can export the `testimg@0` snapshot and the data written to `testimg` after creating
the snapshot separately, using the following commands (the key points are the `skip-parents=1`
and `-B backing_file` options):

```
qemu-img convert -f raw 'vitastor:image=testimg@0' \
    -O qcow2 testimg_0.qcow2

qemu-img convert -f raw 'vitastor:image=testimg:skip-parents=1' \
    -O qcow2 -o 'cluster_size=4k' -B testimg_0.qcow2 testimg.qcow2
```

In fact, with `cluster_size=4k` any QCOW2 file can be used instead of `-B testimg_0.qcow2`, even an empty one.

The QCOW2 `cluster_size=4k` option is required if you want `testimg.qcow2` to contain only the data
overwritten **exactly** in the child layer. With the default 64 KB QCOW2 cluster size you'll
get a bit of extra data from parent layers: a 4 KB overwrite will result in `testimg.qcow2`
containing 64 KB of data. This extra data is taken by `qemu-img` from the file passed in the
`-B` option, so you really need a 4 KB cluster size if you use an empty image in `-B`.
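
The allocation arithmetic can be sketched without qemu at all: an overwrite is rounded up to a whole number of clusters, so a 4 KiB write allocates a full 64 KiB at the default cluster size but exactly 4 KiB at `cluster_size=4k`:

```
# Bytes allocated in the child QCOW2 for a single 4 KiB overwrite,
# at the default (64 KiB) and reduced (4 KiB) cluster sizes.
overwrite=4096
for cluster in 65536 4096; do
    clusters=$(( (overwrite + cluster - 1) / cluster ))  # round up to whole clusters
    echo "cluster_size=${cluster}: $((clusters * cluster)) bytes allocated"
done
```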

After this procedure you'll get two chained QCOW2 images. To detach `testimg.qcow2` from its parent, run:

```
qemu-img rebase -u -b '' testimg.qcow2
```

This can be used for backups. Just note that exporting an image that is currently being written to
is of course unsafe and doesn't produce a consistent result, so if you do this on a live VM,
only export snapshots.
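
Putting the pieces together, an incremental backup cycle might look like the sketch below (image and file names are hypothetical, and the commands require a running cluster):

```
# Full backup from the first snapshot:
vitastor-cli snap-create testimg@backup1
qemu-img convert -f raw 'vitastor:image=testimg@backup1' -O qcow2 backup1.qcow2

# ...later, take the next snapshot and export only its layer (the delta):
vitastor-cli snap-create testimg@backup2
qemu-img convert -f raw 'vitastor:image=testimg@backup2:skip-parents=1' \
    -O qcow2 -o 'cluster_size=4k' -B backup1.qcow2 backup2.qcow2
```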

## vhost-user-blk


QEMU, starting with 6.0, supports attaching disks via a separate userspace worker process
called `vhost-user-blk`. It usually has slightly (20-30 us) lower latency.

Example commands to use it with Vitastor:

```
qemu-storage-daemon \
    --daemonize \
    --blockdev '{"node-name":"drive-virtio-disk1","driver":"vitastor","image":"testosd1","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
    --export type=vhost-user-blk,id=vitastor1,node-name=drive-virtio-disk1,addr.type=unix,addr.path=/run/vitastor1-user-blk.sock,writable=on,num-queues=1

qemu-system-x86_64 -enable-kvm -m 2048 -M accel=kvm,memory-backend=mem \
    -object memory-backend-memfd,id=mem,size=2G,share=on \
    -chardev socket,id=vitastor1,reconnect=1,path=/run/vitastor1-user-blk.sock \
    -device vhost-user-blk-pci,chardev=vitastor1,num-queues=1,config-wce=off \
    -vnc 0.0.0.0:0
```

The memfd memory backend is crucial: vhost-user-blk does not work without it.

## VDUSE

The Linux kernel, starting with version 5.15, supports a new interface for attaching virtual disks
to the host: VDUSE (vDPA Device in Userspace). QEMU, starting with 7.2, supports exporting QEMU
block devices over this protocol using qemu-storage-daemon.

VDUSE is currently the best interface for attaching Vitastor disks as kernel devices, because:
- It avoids data copies and thus achieves much better performance than [NBD](nbd.en.md)
- It doesn't have the NBD timeout problem: the device doesn't die if an operation takes too long
- It doesn't have the hung device problem: if the userspace process dies, it can be restarted (!)
  and the block device will continue operating
- It doesn't seem to have a device number limit

Example performance comparison:

| Test                 | direct fio  | NBD         | VDUSE       |
|----------------------|-------------|-------------|-------------|
| linear write         | 3.85 GB/s   | 1.12 GB/s   | 3.85 GB/s   |
| 4k random write Q128 | 240000 iops | 120000 iops | 178000 iops |
| 4k random write Q1   | 9500 iops   | 7620 iops   | 7640 iops   |
| linear read          | 4.3 GB/s    | 1.8 GB/s    | 2.85 GB/s   |
| 4k random read Q128  | 287000 iops | 140000 iops | 189000 iops |
| 4k random read Q1    | 9600 iops   | 7640 iops   | 7780 iops   |

To try VDUSE you need at least Linux 5.15, built with VDUSE support
(CONFIG_VDPA=m, CONFIG_VDPA_USER=m, CONFIG_VIRTIO_VDPA=m).

Debian kernels currently have these options disabled, so if you want to try VDUSE on Debian,
use a kernel from Ubuntu [kernel-ppa/mainline](https://kernel.ubuntu.com/~kernel-ppa/mainline/), from Proxmox,
or build the modules for the Debian kernel manually:

```
mkdir build
cd build
apt-get install linux-headers-`uname -r`
apt-get build-dep linux-image-`uname -r`-unsigned
apt-get source linux-image-`uname -r`-unsigned
cd linux*/drivers/vdpa
make -C /lib/modules/`uname -r`/build M=$PWD CONFIG_VDPA=m CONFIG_VDPA_USER=m CONFIG_VIRTIO_VDPA=m -j8 modules modules_install
cat Module.symvers >> /lib/modules/`uname -r`/build/Module.symvers
cd ../virtio
make -C /lib/modules/`uname -r`/build M=$PWD CONFIG_VDPA=m CONFIG_VDPA_USER=m CONFIG_VIRTIO_VDPA=m -j8 modules modules_install
depmod -a
```

You also need the `vdpa` tool from the `iproute2` package.
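
As a quick sanity check (a sketch; the `vduse` management device only appears once the module from the commands below is loaded):

```
vdpa mgmtdev show
```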

Commands to attach a Vitastor image as a VDUSE device:

```
modprobe vduse
modprobe virtio-vdpa
qemu-storage-daemon --daemonize --blockdev '{"node-name":"test1","driver":"vitastor","etcd-host":"192.168.7.2:2379/v3","image":"testosd1","cache":{"direct":true,"no-flush":false},"discard":"unmap"}' \
    --export vduse-blk,id=test1,node-name=test1,name=test1,num-queues=16,queue-size=128,writable=true
vdpa dev add name test1 mgmtdev vduse
```

After running these commands, a `/dev/vda` device will appear in the system and you'll be able to
use it as a normal disk.
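
Once the device exists, it behaves like any other block device; e.g. a hypothetical first use (this erases whatever is on `/dev/vda`):

```
mkfs.ext4 /dev/vda
mount /dev/vda /mnt
```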

To remove the device:

```
vdpa dev del test1
kill <qemu-storage-daemon_process_PID>
```