Compare commits

..

1 Commits

Author SHA1 Message Date
Vitaliy Filippov 1c78df4fba Add bindiff for tests
Test / test_switch_primary (push) Successful in 34s Details
Test / test_write (push) Successful in 32s Details
Test / test_write_no_same (push) Successful in 9s Details
Test / test_write_xor (push) Successful in 35s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 1m45s Details
Test / test_heal_pg_size_2 (push) Successful in 2m15s Details
Test / test_heal_ec (push) Successful in 2m17s Details
Test / test_heal_antietcd (push) Successful in 2m18s Details
Test / test_etcd_fail (push) Failing after 10m5s Details
Test / test_heal_csum_32k_dmj (push) Failing after 2m29s Details
Test / test_heal_csum_32k_dj (push) Successful in 2m19s Details
Test / test_heal_csum_32k (push) Successful in 2m16s Details
Test / test_resize (push) Successful in 12s Details
Test / test_resize_auto (push) Successful in 8s Details
Test / test_snapshot_pool2 (push) Successful in 13s Details
Test / test_osd_tags (push) Successful in 7s Details
Test / test_enospc (push) Successful in 11s Details
Test / test_enospc_xor (push) Successful in 12s Details
Test / test_enospc_imm (push) Successful in 10s Details
Test / test_enospc_imm_xor (push) Successful in 13s Details
Test / test_scrub (push) Successful in 13s Details
Test / test_heal_csum_4k_dmj (push) Successful in 2m17s Details
Test / test_heal_csum_4k_dj (push) Successful in 2m17s Details
Test / test_scrub_zero_osd_2 (push) Successful in 13s Details
Test / test_heal_csum_4k (push) Successful in 2m16s Details
Test / test_scrub_xor (push) Successful in 14s Details
Test / test_scrub_pg_size_3 (push) Successful in 17s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 16s Details
Test / test_scrub_ec (push) Successful in 14s Details
Test / test_nfs (push) Successful in 13s Details
2024-11-16 19:35:43 +03:00
12 changed files with 232 additions and 95 deletions

View File

@ -68,15 +68,11 @@ but they are not connected to the cluster.
- Type: string
RDMA device name to use for Vitastor OSD communications (for example,
"rocep5s0f0"). If not specified, Vitastor will try to find an RoCE
device matching [osd_network](osd.en.md#osd_network), preferring RoCEv2,
or choose the first available RDMA device if no RoCE devices are
found or if `osd_network` is not specified.
"rocep5s0f0"). Now Vitastor supports all adapters, even ones without
ODP support, like Mellanox ConnectX-3 and non-Mellanox cards.
Vitastor supports all adapters, even ones without ODP support, like
Mellanox ConnectX-3 and non-Mellanox cards. Versions up to Vitastor
1.2.0 required ODP which is only present in Mellanox ConnectX >= 4.
See also [rdma_odp](#rdma_odp).
Versions up to Vitastor 1.2.0 required ODP which is only present in
Mellanox ConnectX >= 4. See also [rdma_odp](#rdma_odp).
Run `ibv_devinfo -v` as root to list available RDMA devices and their
features.
@ -99,16 +95,15 @@ your device has.
## rdma_gid_index
- Type: integer
- Default: 0
Global address identifier index of the RDMA device to use. Different GID
indexes may correspond to different protocols like RoCEv1, RoCEv2 and iWARP.
Search for "GID" in `ibv_devinfo -v` output to determine which GID index
you need.
If not specified, Vitastor will try to auto-select a RoCEv2 IPv4 GID, then
RoCEv2 IPv6 GID, then RoCEv1 IPv4 GID, then RoCEv1 IPv6 GID, then IB GID.
A correct rdma_gid_index for RoCEv2 is usually 1 (IPv6) or 3 (IPv4).
**IMPORTANT:** If you want to use RoCEv2 (as recommended) then the correct
rdma_gid_index is usually 1 (IPv6) or 3 (IPv4).
## rdma_mtu

View File

@ -71,16 +71,12 @@ RDMA может быть нужно только если у клиентов е
- Тип: строка
Название RDMA-устройства для связи с Vitastor OSD (например, "rocep5s0f0").
Если не указано, Vitastor попробует найти RoCE-устройство, соответствующее
[osd_network](osd.en.md#osd_network), предпочитая RoCEv2, или выбрать первое
попавшееся RDMA-устройство, если RoCE-устройств нет или если сеть `osd_network`
не задана.
Vitastor поддерживает все модели адаптеров, включая те, у которых
Сейчас Vitastor поддерживает все модели адаптеров, включая те, у которых
нет поддержки ODP, то есть вы можете использовать RDMA с ConnectX-3 и
картами производства не Mellanox. Версии Vitastor до 1.2.0 включительно
требовали ODP, который есть только на Mellanox ConnectX 4 и более новых.
См. также [rdma_odp](#rdma_odp).
картами производства не Mellanox.
Версии Vitastor до 1.2.0 включительно требовали ODP, который есть только
на Mellanox ConnectX 4 и более новых. См. также [rdma_odp](#rdma_odp).
Запустите `ibv_devinfo -v` от имени суперпользователя, чтобы посмотреть
список доступных RDMA-устройств, их параметры и возможности.
@ -105,17 +101,15 @@ Control) и ECN (Explicit Congestion Notification).
## rdma_gid_index
- Тип: целое число
- Значение по умолчанию: 0
Номер глобального идентификатора адреса RDMA-устройства, который следует
использовать. Разным gid_index могут соответствовать разные протоколы связи:
RoCEv1, RoCEv2, iWARP. Чтобы понять, какой нужен вам - смотрите строчки со
словом "GID" в выводе команды `ibv_devinfo -v`.
Если не указан, Vitastor попробует автоматически выбрать сначала GID,
соответствующий RoCEv2 IPv4, потом RoCEv2 IPv6, потом RoCEv1 IPv4, потом
RoCEv1 IPv6, потом IB.
Правильный rdma_gid_index для RoCEv2, как правило, 1 (IPv6) или 3 (IPv4).
**ВАЖНО:** Если вы хотите использовать RoCEv2 (как мы и рекомендуем), то
правильный rdma_gid_index, как правило, 1 (IPv6) или 3 (IPv4).
## rdma_mtu

View File

@ -48,15 +48,11 @@
type: string
info: |
RDMA device name to use for Vitastor OSD communications (for example,
"rocep5s0f0"). If not specified, Vitastor will try to find an RoCE
device matching [osd_network](osd.en.md#osd_network), preferring RoCEv2,
or choose the first available RDMA device if no RoCE devices are
found or if `osd_network` is not specified.
"rocep5s0f0"). Now Vitastor supports all adapters, even ones without
ODP support, like Mellanox ConnectX-3 and non-Mellanox cards.
Vitastor supports all adapters, even ones without ODP support, like
Mellanox ConnectX-3 and non-Mellanox cards. Versions up to Vitastor
1.2.0 required ODP which is only present in Mellanox ConnectX >= 4.
See also [rdma_odp](#rdma_odp).
Versions up to Vitastor 1.2.0 required ODP which is only present in
Mellanox ConnectX >= 4. See also [rdma_odp](#rdma_odp).
Run `ibv_devinfo -v` as root to list available RDMA devices and their
features.
@ -68,16 +64,12 @@
PFC (Priority Flow Control) and ECN (Explicit Congestion Notification).
info_ru: |
Название RDMA-устройства для связи с Vitastor OSD (например, "rocep5s0f0").
Если не указано, Vitastor попробует найти RoCE-устройство, соответствующее
[osd_network](osd.en.md#osd_network), предпочитая RoCEv2, или выбрать первое
попавшееся RDMA-устройство, если RoCE-устройств нет или если сеть `osd_network`
не задана.
Vitastor поддерживает все модели адаптеров, включая те, у которых
Сейчас Vitastor поддерживает все модели адаптеров, включая те, у которых
нет поддержки ODP, то есть вы можете использовать RDMA с ConnectX-3 и
картами производства не Mellanox. Версии Vitastor до 1.2.0 включительно
требовали ODP, который есть только на Mellanox ConnectX 4 и более новых.
См. также [rdma_odp](#rdma_odp).
картами производства не Mellanox.
Версии Vitastor до 1.2.0 включительно требовали ODP, который есть только
на Mellanox ConnectX 4 и более новых. См. также [rdma_odp](#rdma_odp).
Запустите `ibv_devinfo -v` от имени суперпользователя, чтобы посмотреть
список доступных RDMA-устройств, их параметры и возможности.
@ -102,27 +94,23 @@
`ibv_devinfo -v`.
- name: rdma_gid_index
type: int
default: 0
info: |
Global address identifier index of the RDMA device to use. Different GID
indexes may correspond to different protocols like RoCEv1, RoCEv2 and iWARP.
Search for "GID" in `ibv_devinfo -v` output to determine which GID index
you need.
If not specified, Vitastor will try to auto-select a RoCEv2 IPv4 GID, then
RoCEv2 IPv6 GID, then RoCEv1 IPv4 GID, then RoCEv1 IPv6 GID, then IB GID.
A correct rdma_gid_index for RoCEv2 is usually 1 (IPv6) or 3 (IPv4).
**IMPORTANT:** If you want to use RoCEv2 (as recommended) then the correct
rdma_gid_index is usually 1 (IPv6) or 3 (IPv4).
info_ru: |
Номер глобального идентификатора адреса RDMA-устройства, который следует
использовать. Разным gid_index могут соответствовать разные протоколы связи:
RoCEv1, RoCEv2, iWARP. Чтобы понять, какой нужен вам - смотрите строчки со
словом "GID" в выводе команды `ibv_devinfo -v`.
Если не указан, Vitastor попробует автоматически выбрать сначала GID,
соответствующий RoCEv2 IPv4, потом RoCEv2 IPv6, потом RoCEv1 IPv4, потом
RoCEv1 IPv6, потом IB.
Правильный rdma_gid_index для RoCEv2, как правило, 1 (IPv6) или 3 (IPv4).
**ВАЖНО:** Если вы хотите использовать RoCEv2 (как мы и рекомендуем), то
правильный rdma_gid_index, как правило, 1 (IPv6) или 3 (IPv4).
- name: rdma_mtu
type: int
default: 4096

View File

@ -232,7 +232,6 @@ class EtcdAdapter
async become_master()
{
const state = { ...this.mon.get_mon_state(), id: ''+this.mon.etcd_lease_id };
console.log('Waiting to become master');
// eslint-disable-next-line no-constant-condition
while (1)
{
@ -244,6 +243,7 @@ class EtcdAdapter
{
break;
}
console.log('Waiting to become master');
await new Promise(ok => setTimeout(ok, this.mon.config.etcd_start_timeout));
}
console.log('Became master');

View File

@ -131,7 +131,6 @@ static matched_dev match_device(ibv_device **dev_list, addr_mask_t *networks, in
ibv_port_attr portinfo;
ibv_gid_entry best_gidx;
int res;
bool have_non_roce = false, have_roce = false;
for (int i = 0; dev_list[i]; ++i)
{
auto dev = dev_list[i];
@ -162,11 +161,6 @@ static matched_dev match_device(ibv_device **dev_list, addr_mask_t *networks, in
else
break;
}
if (gidx.gid_type != IBV_GID_TYPE_ROCE_V1 &&
gidx.gid_type != IBV_GID_TYPE_ROCE_V2)
have_non_roce = true;
else
have_roce = true;
if (match_gid(&gidx, networks, nnet))
{
// Prefer RoCEv2
@ -192,10 +186,6 @@ cleanup:
{
log_rdma_dev_port_gid(dev_list[best.dev], best.port, best.gid, best_gidx);
}
if (best.dev < 0 && have_non_roce && !have_roce)
{
best.dev = -2;
}
return best;
}
@ -247,17 +237,11 @@ msgr_rdma_context_t *msgr_rdma_context_t::create(std::vector<std::string> osd_ne
nets.push_back(cidr_parse(netstr));
}
auto best = match_device(dev_list, nets.data(), nets.size(), log_level);
if (best.dev == -2)
if (best.dev < 0)
{
if (log_level > 0)
fprintf(stderr, "RDMA device matching osd_network is not found, using first available device\n");
best.dev = 0;
if (log_level > 0)
fprintf(stderr, "No RoCE devices found, using first available RDMA device %s\n", ibv_get_device_name(*dev_list));
}
else if (best.dev < 0)
{
if (log_level > 0)
fprintf(stderr, "RDMA device matching osd_network is not found, disabling RDMA\n");
goto cleanup;
}
else
{
@ -317,11 +301,10 @@ msgr_rdma_context_t *msgr_rdma_context_t::create(std::vector<std::string> osd_ne
{
continue;
}
// Prefer IPv4 RoCEv2 -> IPv6 RoCEv2 -> IPv4 RoCEv1 -> IPv6 RoCEv1 -> IB
// Prefer IPv4 RoCEv2 GID by default
if (gid_index == -1 ||
gidx.gid_type == IBV_GID_TYPE_ROCE_V2 && ctx->my_gid.gid_type != IBV_GID_TYPE_ROCE_V2 ||
gidx.gid_type == IBV_GID_TYPE_ROCE_V1 && ctx->my_gid.gid_type == IBV_GID_TYPE_IB ||
gidx.gid_type == ctx->my_gid.gid_type && is_ipv4_gid(&gidx))
gidx.gid_type == IBV_GID_TYPE_ROCE_V2 &&
(ctx->my_gid.gid_type != IBV_GID_TYPE_ROCE_V2 || is_ipv4_gid(&gidx)))
{
gid_index = k;
ctx->my_gid = gidx;

View File

@ -261,7 +261,6 @@ struct dd_out_info_t
else
{
// ok
out_size = owatch->cfg.size;
return true;
}
// Wait for sub-command
@ -883,8 +882,6 @@ resume_2:
oinfo.end_fsync = oinfo.end_fsync && oinfo.out_seekable;
read_offset = 0;
read_end = iinfo.in_seekable ? iinfo.in_size-iseek : 0;
if (oinfo.out_size && (!read_end || read_end > oinfo.out_size-oseek))
read_end = oinfo.out_size-oseek;
if (bytelimit && (!read_end || read_end > bytelimit))
read_end = bytelimit;
clock_gettime(CLOCK_REALTIME, &tv_begin);

View File

@ -142,16 +142,13 @@ resume_2:
auto osd_free = value["free"].uint64_value();
total_raw += osd_size;
free_raw += osd_free;
if (osd_size)
if (!osd_free)
{
if (!osd_free)
{
osds_full++;
}
else if (osd_free < (uint64_t)(osd_size*(1-osd_nearfull_ratio)))
{
osds_nearfull++;
}
osds_full++;
}
else if (osd_free < (uint64_t)(osd_size*(1-osd_nearfull_ratio)))
{
osds_nearfull++;
}
auto peer_it = parent->cli->st_cli.peer_states.find(stat_osd_num);
if (peer_it != parent->cli->st_cli.peer_states.end())

View File

@ -12,6 +12,11 @@ target_link_libraries(stub_bench tcmalloc_minimal)
add_executable(osd_test osd_test.cpp ../util/rw_blocking.cpp ../util/addr_util.cpp)
target_link_libraries(osd_test tcmalloc_minimal)
# bindiff
add_executable(bindiff
bindiff.c
)
# stub_uring_osd
add_executable(stub_uring_osd
stub_uring_osd.cpp

177
src/test/bindiff.c Normal file
View File

@ -0,0 +1,177 @@
// Copyright (c) Vitaliy Filippov, 2004+
// License: VNPL-1.1 (see README.md for details)
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif
#include <string.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#define BUFSIZE 0x100000
uint64_t filelength(int fd)
{
struct stat st;
if (fstat(fd, &st) < 0)
{
fprintf(stderr, "fstat failed: %s\n", strerror(errno));
return 0;
}
if (st.st_size < 0)
{
return 0;
}
return (uint64_t)st.st_size;
}
size_t read_blocking(int fd, void *read_buf, size_t remaining)
{
size_t done = 0;
while (done < remaining)
{
ssize_t r = read(fd, read_buf, remaining-done);
if (r <= 0)
{
if (!errno)
{
// EOF
return done;
}
else if (errno != EINTR && errno != EAGAIN && errno != EPIPE)
{
perror("read");
exit(1);
}
continue;
}
done += (size_t)r;
read_buf = (uint8_t*)read_buf + r;
}
return done;
}
size_t write_blocking(int fd, void *write_buf, size_t remaining)
{
size_t done = 0;
while (done < remaining)
{
ssize_t r = write(fd, write_buf, remaining-done);
if (r < 0)
{
if (errno != EINTR && errno != EAGAIN && errno != EPIPE)
{
perror("write");
exit(1);
}
continue;
}
done += (size_t)r;
write_buf = (uint8_t*)write_buf + r;
}
return done;
}
int main(int narg, char *args[])
{
int fd1 = -1, fd2 = -1;
uint8_t *buf1 = NULL, *buf2 = NULL;
uint64_t addr = 0, l1 = 0, l2 = 0, l = 0, diffl = 0;
size_t buf1_len = 0, buf2_len = 0, i = 0, j = 0, dl = 0;
int argoff = 0;
int nosource = 0;
fprintf(stderr, "VMX HexDiff v2.1\nLicense: GPLv3.0+, (c) 2005+, Vitaliy Filippov\n");
argoff = 1;
if (narg > argoff && strcmp(args[argoff], "-n") == 0)
{
nosource = 1;
argoff++;
}
if (narg < argoff+2)
{
fprintf(stderr, "USAGE: bindiff [-n] <file1> <file2>\n"
"This will create hex patch file1->file2 and write it to stdout.\n"
"[-n] = do not write file1 data in patch, only file2.\n");
return -1;
}
fd1 = open(args[argoff], O_RDONLY);
if (fd1 < 0)
{
fprintf(stderr, "Couldn't open %s: %s\n", args[argoff], strerror(errno));
return -1;
}
fd2 = open(args[argoff+1], O_RDONLY);
if (fd2 < 0)
{
fprintf(stderr, "Couldn't open %s: %s\n", args[argoff+1], strerror(errno));
close(fd1);
return -1;
}
l1 = filelength(fd1);
l2 = filelength(fd2);
if (l1 < l2)
l = l1;
else
l = l2;
addr = diffl = 0;
buf1 = malloc(BUFSIZE+1);
buf2 = malloc(BUFSIZE+1);
while ((buf1_len = read_blocking(fd1, buf1, BUFSIZE)) > 0 && (buf2_len = read_blocking(fd2, buf2, BUFSIZE)) > 0)
{
buf1[buf1_len] = buf2[buf2_len] = 0;
for (dl = 0, i = 0; i <= buf1_len && i <= buf2_len; i++, addr++)
{
if (buf1[i] != buf2[i])
{
dl++;
}
else if (dl)
{
printf("%08jX: ", addr-dl);
if (!nosource)
{
for (j = i-dl; j < i; j++)
printf("%02X", buf1[j]);
printf(" ");
}
for (j = i-dl; j < i; j++)
printf("%02X", buf2[j]);
printf("\n");
diffl += dl;
dl = 0;
}
}
addr--;
}
if (l1 < l2)
{
printf("%08zX: ", i);
while ((buf2_len = read_blocking(fd2, buf2, BUFSIZE)) > 0)
{
for (j = 0; j < buf2_len; j++, i++)
printf("%02X", buf2[j]);
}
printf("\n");
}
else if (l1 > l2)
{
printf("SIZE %08zX\n", l2);
}
if (diffl != 0 || l1 != l2)
{
fprintf(stderr, "Difference in %zu of %zu common bytes\n", diffl, l);
if (l1 != l2)
fprintf(stderr, "Length difference!\nFile \"%s\": %zu\nFile \"%s\": %zu\n", args [1], l1, args [2], l2);
}
else
{
fprintf(stderr, "Files are equal\n");
}
return 0;
}

View File

@ -10,7 +10,7 @@
#include "rw_blocking.h"
int read_blocking(int fd, void *read_buf, size_t remaining)
size_t read_blocking(int fd, void *read_buf, size_t remaining)
{
size_t done = 0;
while (done < remaining)
@ -30,13 +30,13 @@ int read_blocking(int fd, void *read_buf, size_t remaining)
}
continue;
}
done += r;
done += (size_t)r;
read_buf = (uint8_t*)read_buf + r;
}
return done;
}
int write_blocking(int fd, void *write_buf, size_t remaining)
size_t write_blocking(int fd, void *write_buf, size_t remaining)
{
size_t done = 0;
while (done < remaining)
@ -51,7 +51,7 @@ int write_blocking(int fd, void *write_buf, size_t remaining)
}
continue;
}
done += r;
done += (size_t)r;
write_buf = (uint8_t*)write_buf + r;
}
return done;

View File

@ -6,8 +6,8 @@
#include <unistd.h>
#include <sys/uio.h>
int read_blocking(int fd, void *read_buf, size_t remaining);
int write_blocking(int fd, void *write_buf, size_t remaining);
size_t read_blocking(int fd, void *read_buf, size_t remaining);
size_t write_blocking(int fd, void *write_buf, size_t remaining);
int readv_blocking(int fd, iovec *iov, int iovcnt);
int writev_blocking(int fd, iovec *iov, int iovcnt);
int sendv_blocking(int fd, iovec *iov, int iovcnt, int flags);

View File

@ -53,13 +53,14 @@ kill_osds &
LD_PRELOAD="build/src/client/libfio_vitastor.so" \
fio -thread -name=test -ioengine=build/src/client/libfio_vitastor.so -bsrange=4k-128k -blockalign=4k -direct=1 -iodepth=32 -fsync=256 -rw=randrw \
-serialize_overlap=1 -randrepeat=0 -refill_buffers=1 -mirror_file=./testdata/bin/mirror.bin -etcd=$ETCD_URL -image=testimg -loops=10 -runtime=120
-randrepeat=0 -refill_buffers=1 -mirror_file=./testdata/bin/mirror.bin -etcd=$ETCD_URL -image=testimg -loops=10 -runtime=120
qemu-img convert -S 4096 -p \
-f raw "vitastor:etcd_host=127.0.0.1\:$ETCD_PORT/v3:image=testimg" \
-O raw ./testdata/bin/read.bin
if ! diff -q ./testdata/bin/read.bin ./testdata/bin/mirror.bin; then
build/src/test/bindiff ./testdata/bin/read.bin ./testdata/bin/mirror.bin
format_error Data lost during self-heal
fi