Compare commits

3 Commits

Author SHA1 Message Date
Vitaliy Filippov 2f5959e3fa Add pve-qemu 9.1 patch 2024-12-19 14:05:12 +03:00
Vitaliy Filippov a4a286ed95 Document NFS-RDMA 2024-12-19 14:05:12 +03:00
Vitaliy Filippov b8009bad5e Add librdmacm-dev to build dockerfile 2024-12-19 14:05:12 +03:00
8 changed files with 230 additions and 16 deletions

@@ -21,10 +21,10 @@ RUN set -e -x; \
echo 'APT::Install-Recommends false;' >> /etc/apt/apt.conf; \
echo 'APT::Install-Suggests false;' >> /etc/apt/apt.conf
-RUN apt-get update
-RUN apt-get -y install fio liburing-dev libgoogle-perftools-dev devscripts libjerasure-dev cmake libibverbs-dev libisal-dev libnl-3-dev libnl-genl-3-dev curl
-RUN apt-get -y build-dep fio
-RUN apt-get --download-only source fio
+RUN apt-get update && \
+    apt-get -y install fio liburing-dev libgoogle-perftools-dev devscripts libjerasure-dev cmake libibverbs-dev librdmacm-dev libisal-dev libnl-3-dev libnl-genl-3-dev curl && \
+    apt-get -y build-dep fio && \
+    apt-get --download-only source fio
ADD . /root/vitastor
RUN set -e -x; \

@@ -36,6 +36,7 @@
- [Clustered file system](../usage/nfs.en.md#vitastorfs)
- [Experimental internal etcd replacement - antietcd](../config/monitor.en.md#use_antietcd)
- [Built-in Prometheus metric exporter](../config/monitor.en.md#enable_prometheus)
- [NFS RDMA support](../usage/nfs.en.md#rdma) (probably also usable for GPUDirect)
## Plugins and tools

@@ -38,6 +38,7 @@
- [Clustered file system](../usage/nfs.ru.md#vitastorfs)
- [Experimental built-in etcd replacement - antietcd](../config/monitor.ru.md#use_antietcd)
- [Built-in Prometheus metrics exporter](../config/monitor.ru.md#enable_prometheus)
- [NFS RDMA support](../usage/nfs.ru.md#rdma) (probably also suitable for GPUDirect)
## Drivers and tools

@@ -111,6 +111,21 @@ settings, because Vitastor NFS proxy doesn't keep uncommitted data in memory
with these settings. But it may even work without `immediate_commit=all` because
the Linux NFS client repeats all uncommitted writes if it loses the connection.
## RDMA
vitastor-nfs supports NFS over RDMA, which, in theory, should also make it possible to use
VitastorFS with GPUDirect.
You can test NFS-RDMA even if you don't have an RDMA NIC by using SoftROCE:
1. First, add a SoftROCE device on both servers: `rdma link add rxe0 type rxe netdev eth0`.
Here, the `rdma` utility is part of the iproute2 package, and `eth0` should be replaced with
the name of your Ethernet NIC.
2. Start vitastor-nfs with RDMA: `vitastor-nfs start (--fs <NAME> | --block) --pool <POOL> --port 20049 --nfs_rdma 20049 --portmap 0`
3. Mount the FS: `mount 192.168.0.10:/mnt/test/ /mnt/vita/ -o port=20049,mountport=20049,nfsvers=3,soft,nolock,rdma`
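The three steps above can be collected into one small script. This is only a sketch: by default it prints the commands instead of running them (set `DRY_RUN=0` and run as root to execute), and the FS name, pool name and the helper names `run` and `nfs_rdma_mount_opts` are illustrative assumptions, not part of Vitastor. Step 1 applies to both hosts, step 2 to the server and step 3 to the client.

```shell
#!/bin/sh
# Dry-run sketch of the NFS-RDMA test setup described above.
# All values below are placeholders (assumptions); adjust to your environment.
NETDEV="${NETDEV:-eth0}"          # Ethernet NIC backing the SoftROCE device
RDMA_PORT="${RDMA_PORT:-20049}"   # port used for both TCP and RDMA-CM here
SERVER="${SERVER:-192.168.0.10}"  # address of the host running vitastor-nfs
FS_NAME="${FS_NAME:-testfs}"      # hypothetical VitastorFS name
POOL="${POOL:-testpool}"          # hypothetical pool name

# Mount options from step 3, parameterized by port.
nfs_rdma_mount_opts() {
    printf 'port=%s,mountport=%s,nfsvers=3,soft,nolock,rdma\n' "$1" "$1"
}

# Print a command; execute it too only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Step 1 (both hosts): create the SoftROCE device.
run rdma link add rxe0 type rxe netdev "$NETDEV"
# Step 2 (server): start vitastor-nfs with TCP and RDMA on the same port.
run vitastor-nfs start --fs "$FS_NAME" --pool "$POOL" \
    --port "$RDMA_PORT" --nfs_rdma "$RDMA_PORT" --portmap 0
# Step 3 (client): mount over RDMA.
run mount "$SERVER:/mnt/test/" /mnt/vita/ -o "$(nfs_rdma_mount_opts "$RDMA_PORT")"
```

Keeping the mount options in one function makes it harder to change `port` without also changing `mountport`, which the documented command sets to the same value.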
## Commands
### mount
@@ -131,11 +146,16 @@ The server will be automatically stopped when the FS is unmounted.
Start network NFS server. Options:
-| <!-- --> | <!-- --> |
-|-----------------|------------------------------------------------------------|
-| `--bind <IP>` | bind service to \<IP> address (default 0.0.0.0) |
-| `--port <PORT>` | use port \<PORT> for NFS services (default is 2049) |
-| `--portmap 0` | do not listen on port 111 (portmap/rpcbind, requires root) |
+| <!-- --> | <!-- --> |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------|
+| `--bind <IP>` | bind service to \<IP> address (default 0.0.0.0) |
+| `--port <PORT>` | use port \<PORT> for NFS services (default is 2049). Specify "auto" to auto-select a port and print it |
+| `--portmap 0` | do not listen on port 111 (portmap/rpcbind, requires root) |
+| `--nfs_rdma <PORT>` | enable NFS-RDMA at RDMA-CM port \<PORT> (you can try 20049). If RDMA is enabled and --port is set to 0, TCP will be disabled |
+| `--nfs_rdma_credit 16` | maximum operation credit for RDMA clients (max iodepth) |
+| `--nfs_rdma_send 1024` | maximum RDMA send operation count (should be larger than iodepth) |
+| `--nfs_rdma_alloc 1M` | RDMA memory allocation rounding |
+| `--nfs_rdma_gc 64M` | maximum unused RDMA buffers |
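The table notes that `--nfs_rdma_send` should be larger than the iodepth set by `--nfs_rdma_credit`. A minimal sketch of checking that before launching follows. The function name `check_rdma_limits` is hypothetical, the numbers are the ones shown in the table, and the final command is only printed, with `<NAME>` and `<POOL>` left as placeholders.

```shell
#!/bin/sh
# Sketch: sanity-check RDMA tuning values before starting vitastor-nfs.
# The only rule taken from the table is "nfs_rdma_send should be larger
# than the iodepth (nfs_rdma_credit)"; everything else is illustrative.

check_rdma_limits() {
    credit="$1"
    send="$2"
    if [ "$send" -gt "$credit" ]; then
        echo "ok: nfs_rdma_send=$send > nfs_rdma_credit=$credit"
    else
        echo "error: nfs_rdma_send=$send must be larger than nfs_rdma_credit=$credit" >&2
        return 1
    fi
}

CREDIT=16   # value shown in the table
SEND=1024   # value shown in the table

if check_rdma_limits "$CREDIT" "$SEND"; then
    # Print (not run) an RDMA-only invocation: --port 0 disables TCP.
    # <NAME> and <POOL> are left as placeholders, as in the documentation.
    echo "vitastor-nfs start --fs <NAME> --pool <POOL> --port 0 --nfs_rdma 20049 --portmap 0" \
         "--nfs_rdma_credit $CREDIT --nfs_rdma_send $SEND"
fi
```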
### upgrade

@@ -116,6 +116,21 @@ JSON format :-). To inspect the DB contents
even without `immediate_commit=all`, because the Linux kernel NFS client repeats all
uncommitted requests when the connection is lost.
## RDMA
vitastor-nfs supports NFS over RDMA. In theory, this should also make it possible to use
VitastorFS with GPUDirect.
You can test NFS-RDMA even if you don't have an RDMA NIC by using SoftROCE:
1. First, create SoftROCE devices on both test servers: `rdma link add rxe0 type rxe netdev eth0`.
The `rdma` utility is part of the iproute2 package, and `eth0` should be replaced with the name
of your network card.
2. Start vitastor-nfs with RDMA: `vitastor-nfs start (--fs <NAME> | --block) --pool <POOL> --port 20049 --nfs_rdma 20049 --portmap 0`
3. Mount the FS: `mount 192.168.0.10:/mnt/test/ /mnt/vita/ -o port=20049,mountport=20049,nfsvers=3,soft,nolock,rdma`
## Commands
### mount
@@ -136,11 +151,16 @@ JSON format :-). To inspect the DB contents
Start a network NFS server. Options:
-| <!-- --> | <!-- --> |
-|-----------------|-----------------------------------------------------------------------|
-| `--bind <IP>` | accept connections on address \<IP> (default 0.0.0.0 - all addresses) |
-| `--port <PORT>` | use port \<PORT> for NFS services (default 2049) |
-| `--portmap 0` | disable the portmap/rpcbind service on port 111 (enabled by default, requires root privileges) |
+| <!-- --> | <!-- --> |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------|
+| `--bind <IP>` | accept connections on address \<IP> (default 0.0.0.0 - all addresses) |
+| `--port <PORT>` | use port \<PORT> for NFS services (default 2049). Specify "auto" to select and print a random port |
+| `--portmap 0` | disable the portmap/rpcbind service on port 111 (enabled by default, requires root privileges) |
+| `--nfs_rdma <PORT>` | enable NFS-RDMA on RDMA-CM port \<PORT> (try 20049). If RDMA is enabled and `--port 0` is specified, TCP will be disabled |
+| `--nfs_rdma_credit 16` | maximum "credit", i.e. queue depth, for NFS clients |
+| `--nfs_rdma_send 1024` | maximum number of RDMA send operations (should be larger than nfs_rdma_credit) |
+| `--nfs_rdma_alloc 1M` | memory allocation rounding for RDMA clients |
+| `--nfs_rdma_gc 64M` | maximum amount of unused memory held by an RDMA client before it is freed |
### upgrade

@ -0,0 +1,172 @@
Index: pve-qemu-kvm-9.1.2/block/meson.build
===================================================================
--- pve-qemu-kvm-9.1.2.orig/block/meson.build
+++ pve-qemu-kvm-9.1.2/block/meson.build
@@ -126,6 +126,7 @@ foreach m : [
[libnfs, 'nfs', files('nfs.c')],
[libssh, 'ssh', files('ssh.c')],
[rbd, 'rbd', files('rbd.c')],
+ [vitastor, 'vitastor', files('vitastor.c')],
]
if m[0].found()
module_ss = ss.source_set()
Index: pve-qemu-kvm-9.1.2/meson.build
===================================================================
--- pve-qemu-kvm-9.1.2.orig/meson.build
+++ pve-qemu-kvm-9.1.2/meson.build
@@ -1516,6 +1516,26 @@ if not get_option('rbd').auto() or have_
endif
endif
+vitastor = not_found
+if not get_option('vitastor').auto() or have_block
+ libvitastor_client = cc.find_library('vitastor_client', has_headers: ['vitastor_c.h'],
+ required: get_option('vitastor'))
+ if libvitastor_client.found()
+ if cc.links('''
+ #include <vitastor_c.h>
+ int main(void) {
+ vitastor_c_create_qemu(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
+ return 0;
+ }''', dependencies: libvitastor_client)
+ vitastor = declare_dependency(dependencies: libvitastor_client)
+ elif get_option('vitastor').enabled()
+ error('could not link libvitastor_client')
+ else
+ warning('could not link libvitastor_client, disabling')
+ endif
+ endif
+endif
+
glusterfs = not_found
glusterfs_ftruncate_has_stat = false
glusterfs_iocb_has_stat = false
@@ -2367,6 +2387,7 @@ endif
config_host_data.set('CONFIG_OPENGL', opengl.found())
config_host_data.set('CONFIG_PLUGIN', get_option('plugins'))
config_host_data.set('CONFIG_RBD', rbd.found())
+config_host_data.set('CONFIG_VITASTOR', vitastor.found())
config_host_data.set('CONFIG_RDMA', rdma.found())
config_host_data.set('CONFIG_RELOCATABLE', get_option('relocatable'))
config_host_data.set('CONFIG_SAFESTACK', get_option('safe_stack'))
@@ -4534,6 +4555,7 @@ summary_info += {'fdt support': fd
summary_info += {'libcap-ng support': libcap_ng}
summary_info += {'bpf support': libbpf}
summary_info += {'rbd support': rbd}
+summary_info += {'vitastor support': vitastor}
summary_info += {'smartcard support': cacard}
summary_info += {'U2F support': u2f}
summary_info += {'libusb': libusb}
Index: pve-qemu-kvm-9.1.2/meson_options.txt
===================================================================
--- pve-qemu-kvm-9.1.2.orig/meson_options.txt
+++ pve-qemu-kvm-9.1.2/meson_options.txt
@@ -194,6 +194,8 @@ option('lzo', type : 'feature', value :
description: 'lzo compression support')
option('rbd', type : 'feature', value : 'auto',
description: 'Ceph block device driver')
+option('vitastor', type : 'feature', value : 'auto',
+ description: 'Vitastor block device driver')
option('opengl', type : 'feature', value : 'auto',
description: 'OpenGL support')
option('rdma', type : 'feature', value : 'auto',
Index: pve-qemu-kvm-9.1.2/qapi/block-core.json
===================================================================
--- pve-qemu-kvm-9.1.2.orig/qapi/block-core.json
+++ pve-qemu-kvm-9.1.2/qapi/block-core.json
@@ -3477,7 +3477,7 @@
'raw', 'rbd',
{ 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
'pbs',
- 'ssh', 'throttle', 'vdi', 'vhdx',
+ 'ssh', 'throttle', 'vdi', 'vhdx', 'vitastor',
{ 'name': 'virtio-blk-vfio-pci', 'if': 'CONFIG_BLKIO' },
{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
{ 'name': 'virtio-blk-vhost-vdpa', 'if': 'CONFIG_BLKIO' },
@@ -4588,6 +4588,28 @@
'*server': ['InetSocketAddressBase'] } }
##
+# @BlockdevOptionsVitastor:
+#
+# Driver specific block device options for vitastor
+#
+# @image: Image name
+# @inode: Inode number
+# @pool: Pool ID
+# @size: Desired image size in bytes
+# @config-path: Path to Vitastor configuration
+# @etcd-host: etcd connection address(es)
+# @etcd-prefix: etcd key/value prefix
+##
+{ 'struct': 'BlockdevOptionsVitastor',
+ 'data': { '*inode': 'uint64',
+ '*pool': 'uint64',
+ '*size': 'uint64',
+ '*image': 'str',
+ '*config-path': 'str',
+ '*etcd-host': 'str',
+ '*etcd-prefix': 'str' } }
+
+##
# @ReplicationMode:
#
# An enumeration of replication modes.
@@ -5050,6 +5072,7 @@
'throttle': 'BlockdevOptionsThrottle',
'vdi': 'BlockdevOptionsGenericFormat',
'vhdx': 'BlockdevOptionsGenericFormat',
+ 'vitastor': 'BlockdevOptionsVitastor',
'virtio-blk-vfio-pci':
{ 'type': 'BlockdevOptionsVirtioBlkVfioPci',
'if': 'CONFIG_BLKIO' },
@@ -5497,6 +5520,20 @@
'*encrypt' : 'RbdEncryptionCreateOptions' } }
##
+# @BlockdevCreateOptionsVitastor:
+#
+# Driver specific image creation options for Vitastor.
+#
+# @location: Where to store the new image file. This location cannot
+# point to a snapshot.
+#
+# @size: Size of the virtual disk in bytes
+##
+{ 'struct': 'BlockdevCreateOptionsVitastor',
+ 'data': { 'location': 'BlockdevOptionsVitastor',
+ 'size': 'size' } }
+
+##
# @BlockdevVmdkSubformat:
#
# Subformat options for VMDK images
@@ -5718,6 +5755,7 @@
'ssh': 'BlockdevCreateOptionsSsh',
'vdi': 'BlockdevCreateOptionsVdi',
'vhdx': 'BlockdevCreateOptionsVhdx',
+ 'vitastor': 'BlockdevCreateOptionsVitastor',
'vmdk': 'BlockdevCreateOptionsVmdk',
'vpc': 'BlockdevCreateOptionsVpc'
} }
Index: pve-qemu-kvm-9.1.2/scripts/meson-buildoptions.sh
===================================================================
--- pve-qemu-kvm-9.1.2.orig/scripts/meson-buildoptions.sh
+++ pve-qemu-kvm-9.1.2/scripts/meson-buildoptions.sh
@@ -168,6 +168,7 @@ meson_options_help() {
printf "%s\n" ' qga-vss build QGA VSS support (broken with MinGW)'
printf "%s\n" ' qpl Query Processing Library support'
printf "%s\n" ' rbd Ceph block device driver'
+ printf "%s\n" ' vitastor Vitastor block device driver'
printf "%s\n" ' rdma Enable RDMA-based migration'
printf "%s\n" ' replication replication support'
printf "%s\n" ' rutabaga-gfx rutabaga_gfx support'
@@ -444,6 +445,8 @@ _meson_option_parse() {
--disable-qpl) printf "%s" -Dqpl=disabled ;;
--enable-rbd) printf "%s" -Drbd=enabled ;;
--disable-rbd) printf "%s" -Drbd=disabled ;;
+ --enable-vitastor) printf "%s" -Dvitastor=enabled ;;
+ --disable-vitastor) printf "%s" -Dvitastor=disabled ;;
--enable-rdma) printf "%s" -Drdma=enabled ;;
--disable-rdma) printf "%s" -Drdma=disabled ;;
--enable-relocatable) printf "%s" -Drelocatable=true ;;
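For orientation, the QAPI part of the patch adds `vitastor` to the block driver list and maps it to `BlockdevOptionsVitastor`, so the driver should be selectable through QMP `blockdev-add`. The sketch below only prints such a command; it does not talk to a QMP socket. The field names `image` and `config-path` come from the struct in the patch, while the function name `vitastor_blockdev_add`, the node name, the image name and the config path are hypothetical values chosen for illustration.

```shell
#!/bin/sh
# Sketch: build a QMP blockdev-add command for the vitastor driver.
# Field names come from BlockdevOptionsVitastor in the patch; the node name,
# image name and config path passed below are hypothetical values.

vitastor_blockdev_add() {
    node_name="$1"
    image="$2"
    config_path="$3"
    cat <<EOF
{ "execute": "blockdev-add",
  "arguments": {
    "driver": "vitastor",
    "node-name": "$node_name",
    "image": "$image",
    "config-path": "$config_path" } }
EOF
}

vitastor_blockdev_add drive0 testimg /etc/vitastor/vitastor.conf
```

All fields of `BlockdevOptionsVitastor` are marked optional in the struct, so which combination the driver actually requires (for example `image` versus `inode` plus `pool`) is not stated in this patch and is left open here.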

@@ -15,7 +15,7 @@ BuildRequires: rh-nodejs12-npm
BuildRequires: jerasure-devel
BuildRequires: libisa-l-devel
BuildRequires: gf-complete-devel
-BuildRequires: libibverbs-devel
+BuildRequires: rdma-core-devel
BuildRequires: cmake3
BuildRequires: libnl3-devel
Requires: vitastor-osd = %{version}-%{release}

@@ -14,7 +14,7 @@ BuildRequires: nodejs >= 10
BuildRequires: jerasure-devel
BuildRequires: libisa-l-devel
BuildRequires: gf-complete-devel
-BuildRequires: libibverbs-devel
+BuildRequires: rdma-core-devel
BuildRequires: cmake
BuildRequires: libnl3-devel
Requires: vitastor-osd = %{version}-%{release}