Compare commits


15 Commits

SHA1 Message Date
d258a6e76b Add --restart_interval to tests 2023-05-10 01:51:02 +03:00
77155ab7bd Add self-restart support to monitor (mainly for tests) 2023-05-10 01:51:02 +03:00
a409598b16 Wait for free space again, but count on big_write flushes instead of just flusher activity 2023-05-10 01:51:02 +03:00
f4c6765522 Ignore ENOENT in epoll_ctl 2023-05-08 20:39:20 +03:00
ad2916068a Fix test_add_osd rebalance timeout check 2023-05-08 20:39:20 +03:00
321cb435a6 Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number 2023-05-08 20:39:20 +03:00
cfcf4f4355 Support checking /dev/nbdX nodes in Docker 2023-05-08 20:39:20 +03:00
e0fb17bfee Make etcd more stable in tests (add ionice and raise timeout) 2023-05-08 20:36:00 +03:00
5b9031fecc Fix monitor possibly applying incorrect PG history under heavy load
The monitor could deceive itself by immediately saving, in memory, PG configuration
changes that had not yet been applied to etcd, and then apply incorrect PG history
changes on the next run if the first update failed.

This usually only happened under heavy load and was caught in CI. :-)
2023-05-07 23:23:00 +03:00
5da1d8e1b5 Fix EC just-bitmap reads (len=0) (fixes SCHEME=ec test_snapshot.sh) 2023-05-07 14:00:08 +03:00
44f86f1999 Add a basic EC 2+2 recovery test (not really required, but let it be there) 2023-05-07 11:26:27 +03:00
2d9a80c6f6 Implement missing bitmap recovery with ISA-L \(°□°)/ 2023-05-07 11:25:51 +03:00
5e295e346e Do not make vitastor-mon part of vitastor.target 2023-04-29 00:17:47 +03:00
d9c0898b7c Notes about config and vitastor-disk cache status 2023-04-29 00:08:24 +03:00
04cfb48361 Add a note about PVE 7.4 2023-04-28 11:37:11 +03:00
28 changed files with 323 additions and 106 deletions

View File

@@ -17,14 +17,16 @@ Configuration parameters can be set in 3 places:
 - Configuration file (`/etc/vitastor/vitastor.conf` or other path)
 - etcd key `/vitastor/config/global`. Most variables can be set there, but etcd
   connection parameters should obviously be set in the configuration file.
-- Command line of Vitastor components: OSD, mon, fio and QEMU options,
-  OpenStack/Proxmox/etc configuration. The latter doesn't allow to set all
-  variables directly, but it allows to override the configuration file and
-  set everything you need inside it.
+- Command line of Vitastor components: OSD (when you run it without vitastor-disk),
+  mon, fio and QEMU options, OpenStack/Proxmox/etc configuration. The latter
+  doesn't allow to set all variables directly, but it allows to override the
+  configuration file and set everything you need inside it.
+- OSD superblocks created by [vitastor-disk](../usage/disk.en.md) contain
+  primarily disk layout parameters of specific OSDs. In fact, these parameters
+  are automatically passed into the command line of vitastor-osd process, so
+  they have the same "status" as command-line parameters.
 
 In the future, additional configuration methods may be added:
-- OSD superblock which will, by design, contain parameters related to the disk
-  layout and to one specific OSD.
 - OSD-specific keys in etcd like `/vitastor/config/osd/<number>`.
 
 ## Parameter Reference
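
The superblock point above is easy to see in practice. A minimal sketch, assuming the `vitastor-disk read-sb` subcommand of this release and a placeholder device path:

```
# Sketch: print the superblock that vitastor-disk prepared for an OSD.
# The JSON it shows (osd_num, data_device, journal/meta layout, ...) is what
# gets expanded into ordinary "--parameter value" options of vitastor-osd.
vitastor-disk read-sb /dev/sda1
```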

View File

@@ -19,14 +19,17 @@
 - The etcd key `/vitastor/config/global`. Most of the parameters can be set
   there, except, naturally, the etcd connection parameters themselves, which
   have to be set in the configuration file
-- On the command line of Vitastor components: OSD, monitor, fio and QEMU
-  options, OpenStack/Proxmox/etc settings. The latter usually don't expose the
-  full set of parameters directly, but they allow you to define the path to
-  the configuration file and set any parameters in it.
+- On the command line of Vitastor components: OSD (when started manually without
+  vitastor-disk), monitor, fio and QEMU options, OpenStack/Proxmox/etc settings.
+  The latter usually don't expose the full set of parameters directly, but they
+  allow you to define the path to the configuration file and set any parameters in it.
+- In the OSD superblock written by [vitastor-disk](../usage/disk.ru.md) -
+  parameters related to the disk format and to that specific OSD. In fact, these
+  parameters are automatically passed into the command line of the vitastor-osd
+  process when the OSD starts, so in terms of "status" they are equivalent to
+  OSD command-line parameters.
 
 In the future, other configuration methods may also be added:
-- The OSD superblock, which would store OSD parameters related to the disk
-  format and to that specific OSD.
 - OSD-specific etcd keys like `/vitastor/config/osd/<number>`.
 
 ## Parameter List

View File

@@ -6,10 +6,10 @@
 # Proxmox VE
 
-To enable Vitastor support in Proxmox Virtual Environment (6.4-7.3 are supported):
+To enable Vitastor support in Proxmox Virtual Environment (6.4-7.4 are supported):
 
 - Add the corresponding Vitastor Debian repository into sources.list on Proxmox hosts:
-  buster for 6.4, bullseye for 7.3, pve7.1 for 7.1, pve7.2 for 7.2
+  buster for 6.4, bullseye for 7.4, pve7.1 for 7.1, pve7.2 for 7.2, pve7.3 for 7.3
 - Install vitastor-client, pve-qemu-kvm, pve-storage-vitastor (* or see note) packages from Vitastor repository
 - Define storage in `/etc/pve/storage.cfg` (see below)
 - Block network access from VMs to Vitastor network (to OSDs and etcd),
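
For the repository step, a hedged sketch (the repository URL and the exact suite line are assumptions inferred from the version list above, not taken from this diff; check the Vitastor install docs for the canonical entry):

```
# Hypothetical sources.list entry for PVE 7.4 (suite "bullseye" per the list above):
echo 'deb https://vitastor.io/debian bullseye main' >> /etc/apt/sources.list
apt update
apt install vitastor-client pve-qemu-kvm pve-storage-vitastor
```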

View File

@@ -6,10 +6,10 @@
 # Proxmox
 
-To connect Vitastor to Proxmox Virtual Environment (versions 6.4-7.3 are supported):
+To connect Vitastor to Proxmox Virtual Environment (versions 6.4-7.4 are supported):
 
 - Add the corresponding Vitastor Debian repository into sources.list on the Proxmox hosts:
-  buster for 6.4, bullseye for 7.3, pve7.1 for 7.1, pve7.2 for 7.2
+  buster for 6.4, bullseye for 7.4, pve7.1 for 7.1, pve7.2 for 7.2, pve7.3 for 7.3
 - Install the vitastor-client, pve-qemu-kvm and pve-storage-vitastor (* or see the note) packages from the Vitastor repository
 - Define the storage type in `/etc/pve/storage.cfg` (see below)
 - Be sure to block access from virtual machines to the Vitastor network (OSDs and etcd), because Vitastor does not (yet) support authentication

View File

@@ -45,7 +45,9 @@ On the monitor hosts:
     }
     ```
 - Initialize OSDs:
-  - SSD-only: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`
+  - SSD-only: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`. You can add
+    `--disable_data_fsync off` to leave disk cache enabled if you use desktop
+    SSDs without capacitors.
   - Hybrid, SSD+HDD: `vitastor-disk prepare --hybrid /dev/sdXXX [/dev/sdYYY ...]`.
     Pass all your devices (HDD and SSD) to this script &mdash; it will partition disks and initialize journals on its own.
     This script skips HDDs which are already partitioned so if you want to use non-empty disks for
@@ -53,7 +55,9 @@ On the monitor hosts:
     but some free unpartitioned space must be available because the script creates new partitions for journals.
 - You can change OSD configuration in units or in `vitastor.conf`.
   Check [Configuration Reference](../config.en.md) for parameter descriptions.
-- If all your drives have capacitors, create global configuration in etcd: \
+- If all your drives have capacitors, and even if not, but if you ran `vitastor-disk`
+  without `--disable_data_fsync off` at the first step, then put the following
+  setting into etcd: \
   `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all"}'`
 - Start all OSDs: `systemctl start vitastor.target`
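
Taken together, the two updated bullets describe two setups; a minimal sketch with placeholder device names and etcd endpoint:

```
# Server SSDs with capacitors (power-loss protection): prepare normally,
# then declare that the cluster may skip fsync, as described above.
vitastor-disk prepare /dev/sda /dev/sdb
etcdctl --endpoints=http://10.0.0.1:2379 put /vitastor/config/global \
    '{"immediate_commit":"all"}'

# Desktop SSDs without capacitors: keep the disk write cache enabled instead,
# and do not set immediate_commit=all in this case.
vitastor-disk prepare --disable_data_fsync off /dev/sdc
```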
@@ -75,6 +79,10 @@ etcdctl --endpoints=... put /vitastor/config/pools '{"2":{"name":"ecpool",
 After you do this, one of the monitors will configure PGs and OSDs will start them.
 
+If you use HDDs you should also add `"block_size": 1048576` to pool configuration.
+The other option is to add it into /vitastor/config/global, in this case it will
+apply to all pools by default.
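
As a sketch of the HDD variant (pool id and the other parameters mirror the `ecpool` example from the hunk header above and are placeholders):

```
# HDD pool: the ecpool example plus a 1 MiB block size.
etcdctl --endpoints=... put /vitastor/config/pools '{"2":{"name":"ecpool",
    "scheme":"ec","pg_size":4,"parity_chunks":2,"pg_minsize":2,"pg_count":256,
    "block_size":1048576}}'

# Or set the default for all pools globally. Note that put replaces the whole
# key, so merge this with any existing global settings (e.g. immediate_commit):
etcdctl --endpoints=... put /vitastor/config/global \
    '{"immediate_commit":"all","block_size":1048576}'
```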
 
 ## Check cluster status
 
 `vitastor-cli status`

View File

@@ -45,7 +45,9 @@
     }
     ```
 - Initialize the OSDs:
-  - SSD: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`
+  - SSD: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`. If you use desktop
+    SSDs without capacitors, you can leave the cache enabled by adding the
+    `--disable_data_fsync off` option.
   - Hybrid, SSD+HDD: `vitastor-disk prepare --hybrid /dev/sdXXX [/dev/sdYYY ...]`.
     Pass all your SSDs and HDDs to the script on the command line one after another; it will automatically
     create partitions for journals on the SSDs and for data on the HDDs. The script skips HDDs that already have partitions
@@ -54,8 +56,11 @@
     for the journals, so free unallocated space must be available on the SSDs.
 - You can change OSD parameters in the systemd units or in `vitastor.conf`. See the parameter
   descriptions in the [configuration reference](../config.ru.md).
-- If all your disks are server-grade with capacitors, write this into the global configuration in etcd: \
-  `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all"}'`
+- If all your disks are server-grade with capacitors, and even if they are not, but you
+  did not add the `--disable_data_fsync off` option at the first step and `vitastor-disk`
+  did not complain about being unable to disable the disk cache, write the following
+  setting into the global configuration in etcd: \
+  `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all"}'`.
 
 - Start all OSDs: `systemctl start vitastor.target`
 
 ## Create a pool
@@ -76,6 +81,10 @@ etcdctl --endpoints=... put /vitastor/config/pools '{"2":{"name":"ecpool",
 After this, one of the monitors should configure the PGs, and the OSDs should start them.
 
+If you use HDD disks, add the `"block_size": 1048576` option to the pool configuration.
+This option can also be added to /vitastor/config/global, in which case it will
+apply to all pools by default.
 
 ## Check the cluster state
 
 `vitastor-cli status`

View File

@@ -43,16 +43,16 @@ function finish_pg_history(merged_history)
     merged_history.all_peers = Object.values(merged_history.all_peers);
 }
 
-function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
+function scale_pg_count(prev_pgs, real_prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
 {
-    const old_pg_count = prev_pgs.length;
+    const old_pg_count = real_prev_pgs.length;
     // Add all possibly intersecting PGs to the history of new PGs
     if (!(new_pg_count % old_pg_count))
     {
         // New PG count is a multiple of old PG count
         for (let i = 0; i < new_pg_count; i++)
         {
-            add_pg_history(new_pg_history, i, prev_pgs, prev_pg_history, i % old_pg_count);
+            add_pg_history(new_pg_history, i, real_prev_pgs, prev_pg_history, i % old_pg_count);
             finish_pg_history(new_pg_history[i]);
         }
     }
@@ -64,7 +64,7 @@ function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
     {
         for (let j = 0; j < mul; j++)
         {
-            add_pg_history(new_pg_history, i, prev_pgs, prev_pg_history, i+j*new_pg_count);
+            add_pg_history(new_pg_history, i, real_prev_pgs, prev_pg_history, i+j*new_pg_count);
         }
         finish_pg_history(new_pg_history[i]);
     }
@@ -76,7 +76,7 @@ function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
         let merged_history = {};
         for (let i = 0; i < old_pg_count; i++)
         {
-            add_pg_history(merged_history, 1, prev_pgs, prev_pg_history, i);
+            add_pg_history(merged_history, 1, real_prev_pgs, prev_pg_history, i);
         }
         finish_pg_history(merged_history[1]);
         for (let i = 0; i < new_pg_count; i++)
@@ -90,15 +90,15 @@ function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
         new_pg_history[i] = null;
     }
     // Just for the lp_solve optimizer - pick a "previous" PG for each "new" one
-    if (old_pg_count < new_pg_count)
+    if (prev_pgs.length < new_pg_count)
     {
-        for (let i = old_pg_count; i < new_pg_count; i++)
+        for (let i = prev_pgs.length; i < new_pg_count; i++)
         {
-            prev_pgs[i] = prev_pgs[i % old_pg_count];
+            prev_pgs[i] = prev_pgs[i % prev_pgs.length];
         }
     }
-    else if (old_pg_count > new_pg_count)
+    else if (prev_pgs.length > new_pg_count)
     {
-        prev_pgs.splice(new_pg_count, old_pg_count-new_pg_count);
+        prev_pgs.splice(new_pg_count, prev_pgs.length-new_pg_count);
     }
 }

View File

@@ -13,7 +13,7 @@ for (let i = 2; i < process.argv.length; i++)
     {
         console.error('USAGE: '+process.argv[0]+' '+process.argv[1]+' [--verbose 1]'+
             ' [--etcd_address "http://127.0.0.1:2379,..."] [--config_path /etc/vitastor/vitastor.conf]'+
-            ' [--etcd_prefix "/vitastor"] [--etcd_start_timeout 5]');
+            ' [--etcd_prefix "/vitastor"] [--etcd_start_timeout 5] [--restart_interval 5]');
         process.exit();
     }
     else if (process.argv[i].substr(0, 2) == '--')
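
For reference, this is exactly how the test scripts in this changeset run the monitor with the new flag:

```
# Run the monitor so that fatal errors (lost etcd lease, failed watch socket)
# restart it after 5 seconds instead of killing the process:
node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" \
    --verbose 1 --restart_interval 5 &>./testdata/mon.log &
MON_PID=$!
```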

View File

@@ -561,7 +561,7 @@ class Mon
         }
         if (!this.ws)
         {
-            this.die('Failed to open etcd watch websocket');
+            await this.die('Failed to open etcd watch websocket');
         }
         const cur_addr = this.selected_etcd_url;
         this.ws_alive = true;
@@ -728,7 +728,7 @@ class Mon
             const res = await this.etcd_call('/lease/keepalive', { ID: this.etcd_lease_id }, this.config.etcd_mon_timeout, this.config.etcd_mon_retries);
             if (!res.result.TTL)
             {
-                this.die('Lease expired');
+                await this.die('Lease expired');
             }
         }, this.config.etcd_mon_timeout);
         if (!this.signals_set)
@@ -740,11 +740,34 @@ class Mon
     }
 
     async on_stop(status)
     {
-        clearInterval(this.lease_timer);
-        await this.etcd_call('/lease/revoke', { ID: this.etcd_lease_id }, this.config.etcd_mon_timeout, this.config.etcd_mon_retries);
-        process.exit(status);
+        if (this.ws_keepalive_timer)
+        {
+            clearInterval(this.ws_keepalive_timer);
+            this.ws_keepalive_timer = null;
+        }
+        if (this.lease_timer)
+        {
+            clearInterval(this.lease_timer);
+            this.lease_timer = null;
+        }
+        if (this.etcd_lease_id)
+        {
+            const lease_id = this.etcd_lease_id;
+            this.etcd_lease_id = null;
+            await this.etcd_call('/lease/revoke', { ID: lease_id }, this.config.etcd_mon_timeout, this.config.etcd_mon_retries);
+        }
+        if (!status || !this.initConfig.restart_interval)
+        {
+            process.exit(status);
+        }
+        else
+        {
+            console.log('Restarting after '+this.initConfig.restart_interval+' seconds');
+            await new Promise(ok => setTimeout(ok, this.initConfig.restart_interval*1000));
+            await this.start();
+        }
     }
 
     async become_master()
     {
@@ -956,7 +979,7 @@ class Mon
         return alive_set[this.rng() % alive_set.length];
     }
 
-    save_new_pgs_txn(request, pool_id, up_osds, osd_tree, prev_pgs, new_pgs, pg_history)
+    save_new_pgs_txn(save_to, request, pool_id, up_osds, osd_tree, prev_pgs, new_pgs, pg_history)
     {
         const aff_osds = this.get_affinity_osds(this.state.config.pools[pool_id], up_osds, osd_tree);
         const pg_items = {};
@@ -1009,14 +1032,14 @@ class Mon
                 });
             }
         }
-        this.state.config.pgs.items = this.state.config.pgs.items || {};
+        save_to.items = save_to.items || {};
         if (!new_pgs.length)
         {
-            delete this.state.config.pgs.items[pool_id];
+            delete save_to.items[pool_id];
         }
         else
         {
-            this.state.config.pgs.items[pool_id] = pg_items;
+            save_to.items[pool_id] = pg_items;
         }
     }
@@ -1160,6 +1183,7 @@ class Mon
         if (this.state.config.pgs.hash != tree_hash)
         {
             // Something has changed
+            const new_config_pgs = JSON.parse(JSON.stringify(this.state.config.pgs));
             const etcd_request = { compare: [], success: [] };
             for (const pool_id in (this.state.config.pgs||{}).items||{})
             {
@@ -1180,7 +1204,7 @@ class Mon
                     etcd_request.success.push({ requestDeleteRange: {
                         key: b64(this.etcd_prefix+'/pool/stats/'+pool_id),
                     } });
-                    this.save_new_pgs_txn(etcd_request, pool_id, up_osds, osd_tree, prev_pgs, [], []);
+                    this.save_new_pgs_txn(new_config_pgs, etcd_request, pool_id, up_osds, osd_tree, prev_pgs, [], []);
                 }
             }
             for (const pool_id in this.state.config.pools)
@@ -1234,7 +1258,7 @@ class Mon
                     return;
                 }
                 const new_pg_history = [];
-                PGUtil.scale_pg_count(prev_pgs, pg_history, new_pg_history, pool_cfg.pg_count);
+                PGUtil.scale_pg_count(prev_pgs, real_prev_pgs, pg_history, new_pg_history, pool_cfg.pg_count);
                 pg_history = new_pg_history;
             }
             for (const pg of prev_pgs)
@@ -1287,14 +1311,15 @@ class Mon
                     key: b64(this.etcd_prefix+'/pool/stats/'+pool_id),
                     value: b64(JSON.stringify(this.state.pool.stats[pool_id])),
                 } });
-                this.save_new_pgs_txn(etcd_request, pool_id, up_osds, osd_tree, real_prev_pgs, optimize_result.int_pgs, pg_history);
+                this.save_new_pgs_txn(new_config_pgs, etcd_request, pool_id, up_osds, osd_tree, real_prev_pgs, optimize_result.int_pgs, pg_history);
             }
-            this.state.config.pgs.hash = tree_hash;
-            await this.save_pg_config(etcd_request);
+            new_config_pgs.hash = tree_hash;
+            await this.save_pg_config(new_config_pgs, etcd_request);
         }
         else
         {
             // Nothing changed, but we still want to recheck the distribution of primaries
+            let new_config_pgs;
             let changed = false;
             for (const pool_id in this.state.config.pools)
             {
@@ -1314,31 +1339,35 @@ class Mon
                         const new_primary = this.pick_primary(pool_id, pg_cfg.osd_set, up_osds, aff_osds);
                         if (pg_cfg.primary != new_primary)
                         {
+                            if (!new_config_pgs)
+                            {
+                                new_config_pgs = JSON.parse(JSON.stringify(this.state.config.pgs));
+                            }
                             console.log(
                                 `Moving pool ${pool_id} (${pool_cfg.name || 'unnamed'}) PG ${pg_num}`+
                                 ` primary OSD from ${pg_cfg.primary} to ${new_primary}`
                             );
                             changed = true;
-                            pg_cfg.primary = new_primary;
+                            new_config_pgs.items[pool_id][pg_num].primary = new_primary;
                         }
                     }
                 }
             }
             if (changed)
             {
-                await this.save_pg_config();
+                await this.save_pg_config(new_config_pgs);
             }
         }
     }
 
-    async save_pg_config(etcd_request = { compare: [], success: [] })
+    async save_pg_config(new_config_pgs, etcd_request = { compare: [], success: [] })
     {
         etcd_request.compare.push(
             { key: b64(this.etcd_prefix+'/mon/master'), target: 'LEASE', lease: ''+this.etcd_lease_id },
             { key: b64(this.etcd_prefix+'/config/pgs'), target: 'MOD', mod_revision: ''+this.etcd_watch_revision, result: 'LESS' },
         );
         etcd_request.success.push(
-            { requestPut: { key: b64(this.etcd_prefix+'/config/pgs'), value: b64(JSON.stringify(this.state.config.pgs)) } },
+            { requestPut: { key: b64(this.etcd_prefix+'/config/pgs'), value: b64(JSON.stringify(new_config_pgs)) } },
         );
         const res = await this.etcd_call('/kv/txn', etcd_request, this.config.etcd_mon_timeout, 0);
         if (!res.succeeded)
@@ -1765,14 +1794,13 @@ class Mon
                 return res.json;
             }
         }
-        this.die();
+        await this.die();
     }
 
-    _die(err)
+    async _die(err)
     {
-        // In fact we can just try to rejoin
         console.error(new Error(err || 'Cluster connection failed'));
-        process.exit(1);
+        await this.on_stop(1);
     }
 
     local_ips(all)
@@ -1817,6 +1845,7 @@ function POST(url, body, timeout)
                 clearTimeout(timer_id);
                 let res_body = '';
                 res.setEncoding('utf8');
+                res.on('error', no);
                 res.on('data', chunk => { res_body += chunk; });
                 res.on('end', () =>
                 {
@@ -1836,6 +1865,8 @@ function POST(url, body, timeout)
             }
         });
     });
+    req.on('error', no);
+    req.on('close', () => no(new Error('Connection closed prematurely')));
     req.write(body_text);
     req.end();
 });

View File

@@ -15,4 +15,4 @@ StartLimitInterval=0
 RestartSec=10
 
 [Install]
-WantedBy=vitastor.target
+WantedBy=multi-user.target

View File

@@ -307,6 +307,18 @@ void blockstore_impl_t::check_wait(blockstore_op_t *op)
         }
         PRIV(op)->wait_for = 0;
     }
+    else if (PRIV(op)->wait_for == WAIT_FREE)
+    {
+        if (!data_alloc->get_free_count() && big_to_flush > 0)
+        {
+#ifdef BLOCKSTORE_DEBUG
+            printf("Still waiting for free space on the data device\n");
+#endif
+            return;
+        }
+        flusher->release_trim();
+        PRIV(op)->wait_for = 0;
+    }
     else
     {
         throw std::runtime_error("BUG: op->wait_for value is unexpected");

View File

@@ -160,6 +160,8 @@ struct __attribute__((__packed__)) dirty_entry
 #define WAIT_JOURNAL 3
 // Suspend operation until the next journal sector buffer is free
 #define WAIT_JOURNAL_BUFFER 4
+// Suspend operation until there is some free space on the data device
+#define WAIT_FREE 5
 
 struct fulfill_read_t
 {
@@ -263,6 +265,7 @@ class blockstore_impl_t
     struct journal_t journal;
     journal_flusher_t *flusher;
+    int big_to_flush = 0;
     int write_iodepth = 0;
 
     bool live = false, queue_stall = false;

View File

@@ -201,6 +201,11 @@ void blockstore_impl_t::erase_dirty(blockstore_dirty_db_t::iterator dirty_start,
     }
     while (1)
     {
+        if ((IS_BIG_WRITE(dirty_it->second.state) || IS_DELETE(dirty_it->second.state)) &&
+            IS_STABLE(dirty_it->second.state))
+        {
+            big_to_flush--;
+        }
         if (IS_BIG_WRITE(dirty_it->second.state) && dirty_it->second.location != clean_loc &&
             dirty_it->second.location != UINT64_MAX)
         {

View File

@@ -446,6 +446,7 @@ void blockstore_impl_t::mark_stable(const obj_ver_id & v, bool forget_dirty)
         {
             inode_space_stats[dirty_it->first.oid.inode] += dsk.data_block_size;
         }
+        big_to_flush++;
     }
     else if (IS_DELETE(dirty_it->second.state))
     {
@@ -454,6 +455,7 @@ void blockstore_impl_t::mark_stable(const obj_ver_id & v, bool forget_dirty)
                 sp -= dsk.data_block_size;
             else
                 inode_space_stats.erase(dirty_it->first.oid.inode);
+            big_to_flush++;
         }
     }
     if (forget_dirty && (IS_BIG_WRITE(dirty_it->second.state) ||

View File

@@ -271,6 +271,13 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
         if (loc == UINT64_MAX)
         {
             // no space
+            if (big_to_flush > 0)
+            {
+                // hope that some space will be available after flush
+                flusher->request_trim();
+                PRIV(op)->wait_for = WAIT_FREE;
+                return 0;
+            }
             cancel_all_writes(op, dirty_it, -ENOSPC);
             return 2;
         }

View File

@@ -54,6 +54,13 @@ void epoll_manager_t::set_fd_handler(int fd, bool wr, std::function<void(int, in
     ev.events = (wr ? EPOLLOUT : 0) | EPOLLIN | EPOLLRDHUP | EPOLLET;
     if (epoll_ctl(epoll_fd, exists ? EPOLL_CTL_MOD : EPOLL_CTL_ADD, fd, &ev) < 0)
     {
+        if (errno == ENOENT)
+        {
+            // The FD is probably already closed
+            epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, NULL);
+            epoll_handlers.erase(fd);
+            return;
+        }
         throw std::runtime_error(std::string("epoll_ctl: ") + strerror(errno));
     }
     epoll_handlers[fd] = handler;

View File

@@ -191,7 +191,7 @@ struct __attribute__((__packed__)) osd_op_rw_t
     uint64_t inode;
     // offset
     uint64_t offset;
-    // length
+    // length. 0 means to read all bitmaps of the specified range, but no data.
     uint32_t len;
     // flags (for future)
     uint32_t flags;

View File

@@ -186,11 +186,23 @@ void osd_t::continue_primary_read(osd_op_t *cur_op)
         cur_op->reply.rw.bitmap_len = 0;
         {
             auto & pg = pgs.at({ .pool_id = INODE_POOL(op_data->oid.inode), .pg_num = op_data->pg_num });
+            if (cur_op->req.rw.len == 0)
+            {
+                // len=0 => bitmap read
+                for (int role = 0; role < op_data->pg_data_size; role++)
+                {
+                    op_data->stripes[role].read_start = 0;
+                    op_data->stripes[role].read_end = UINT32_MAX;
+                }
+            }
+            else
+            {
                 for (int role = 0; role < op_data->pg_data_size; role++)
                 {
                     op_data->stripes[role].read_start = op_data->stripes[role].req_start;
                     op_data->stripes[role].read_end = op_data->stripes[role].req_end;
                 }
+            }
             // Determine version
             auto vo_it = pg.ver_override.find(op_data->oid);
             op_data->target_ver = vo_it != pg.ver_override.end() ? vo_it->second : UINT64_MAX;

View File

@@ -151,6 +151,13 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
     {
         int stripe_num = rep ? 0 : role;
         osd_op_t *subop = op_data->subops + i;
+        uint32_t subop_len = wr
+            ? stripes[stripe_num].write_end - stripes[stripe_num].write_start
+            : stripes[stripe_num].read_end - stripes[stripe_num].read_start;
+        if (!wr && stripes[stripe_num].read_end == UINT32_MAX)
+        {
+            subop_len = 0;
+        }
         if (role_osd_num == this->osd_num)
         {
             clock_gettime(CLOCK_REALTIME, &subop->tv_begin);
@@ -169,7 +176,7 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
                 },
                 .version = op_version,
                 .offset = wr ? stripes[stripe_num].write_start : stripes[stripe_num].read_start,
-                .len = wr ? stripes[stripe_num].write_end - stripes[stripe_num].write_start : stripes[stripe_num].read_end - stripes[stripe_num].read_start,
+                .len = subop_len,
                 .buf = wr ? stripes[stripe_num].write_buf : stripes[stripe_num].read_buf,
                 .bitmap = stripes[stripe_num].bmp_buf,
             });
@@ -199,7 +206,7 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
                 },
                 .version = op_version,
                 .offset = wr ? stripes[stripe_num].write_start : stripes[stripe_num].read_start,
-                .len = wr ? stripes[stripe_num].write_end - stripes[stripe_num].write_start : stripes[stripe_num].read_end - stripes[stripe_num].read_start,
+                .len = subop_len,
                 .attr_len = wr ? clean_entry_bitmap_size : 0,
             };
 #ifdef OSD_DEBUG
@@ -218,9 +225,9 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
             }
             else
             {
-                if (stripes[stripe_num].read_end > stripes[stripe_num].read_start)
+                if (subop_len > 0)
                 {
-                    subop->iov.push_back(stripes[stripe_num].read_buf, stripes[stripe_num].read_end - stripes[stripe_num].read_start);
+                    subop->iov.push_back(stripes[stripe_num].read_buf, subop_len);
                 }
             }
             subop->callback = [cur_op, this](osd_op_t *subop)

View File

@@ -28,7 +28,9 @@ static inline void extend_read(uint32_t start, uint32_t end, osd_rmw_stripe_t &
     }
     else
     {
-        if (stripe.read_end < end)
+        if (stripe.read_end < end && end != UINT32_MAX ||
+            // UINT32_MAX means that stripe only needs bitmap, end != 0 => needs also data
+            stripe.read_end == UINT32_MAX && end != 0)
             stripe.read_end = end;
         if (stripe.read_start > start)
             stripe.read_start = start;
@@ -104,6 +106,8 @@ void reconstruct_stripes_xor(osd_rmw_stripe_t *stripes, int pg_size, uint32_t bi
             prev = other;
         }
         else if (prev >= 0)
+        {
+            if (stripes[role].read_end != UINT32_MAX)
             {
                 assert(stripes[role].read_start >= stripes[prev].read_start &&
                     stripes[role].read_start >= stripes[other].read_start);
@@ -112,10 +116,13 @@ void reconstruct_stripes_xor(osd_rmw_stripe_t *stripes, int pg_size, uint32_t bi
                     (uint8_t*)stripes[other].read_buf + (stripes[role].read_start - stripes[other].read_start),
                     stripes[role].read_buf, stripes[role].read_end - stripes[role].read_start
                 );
+            }
             memxor(stripes[prev].bmp_buf, stripes[other].bmp_buf, stripes[role].bmp_buf, bitmap_size);
             prev = -1;
         }
         else
+        {
+            if (stripes[role].read_end != UINT32_MAX)
             {
                 assert(stripes[role].read_start >= stripes[other].read_start);
                 memxor(
@@ -123,6 +130,7 @@ void reconstruct_stripes_xor(osd_rmw_stripe_t *stripes, int pg_size, uint32_t bi
                     (uint8_t*)stripes[other].read_buf + (stripes[role].read_start - stripes[other].read_start),
                     stripes[role].read_buf, stripes[role].read_end - stripes[role].read_start
                 );
+            }
             memxor(stripes[role].bmp_buf, stripes[other].bmp_buf, stripes[role].bmp_buf, bitmap_size);
         }
     }
@@ -355,9 +363,11 @@ void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsi
     int wanted_base = 0, wanted = 0;
     uint64_t read_start = 0, read_end = 0;
     auto recover_seq = [&]()
+    {
+        if (read_end != UINT32_MAX)
         {
             int orig = 0;
-            for (int other = 0; other < pg_size; other++)
+            for (int other = 0; other < pg_size && orig < pg_minsize; other++)
             {
                 if (stripes[other].read_end != 0 && !stripes[other].missing)
                 {
@@ -370,6 +380,7 @@ void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsi
                 read_end-read_start, pg_minsize, wanted, dectable + wanted_base*32*pg_minsize,
                 data_ptrs, data_ptrs + pg_minsize
             );
+        }
         wanted_base += wanted;
         wanted = 0;
     };
@@ -391,6 +402,32 @@ void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsi
     {
         recover_seq();
     }
+    // Recover bitmaps
+    if (bitmap_size > 0)
+    {
+        for (int role = 0; role < pg_minsize; role++)
+        {
+            if (stripes[role].read_end != 0 && stripes[role].missing)
+            {
+                data_ptrs[pg_minsize + (wanted++)] = (uint8_t*)stripes[role].bmp_buf;
+            }
+        }
+        if (wanted > 0)
+        {
+            int orig = 0;
+            for (int other = 0; other < pg_size && orig < pg_minsize; other++)
+            {
+                if (stripes[other].read_end != 0 && !stripes[other].missing)
+                {
+                    data_ptrs[orig++] = (uint8_t*)stripes[other].bmp_buf;
+                }
+            }
+            ec_encode_data(
+                bitmap_size, pg_minsize, wanted, dectable,
+                data_ptrs, data_ptrs + pg_minsize
+            );
+        }
+    }
 }
 #else
 void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsize, uint32_t bitmap_size)
@@ -412,7 +449,8 @@ void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsi
         if (stripes[role].read_end != 0 && stripes[role].missing)
         {
             recovered = true;
-            if (stripes[role].read_end > stripes[role].read_start)
+            if (stripes[role].read_end > stripes[role].read_start &&
+                stripes[role].read_end != UINT32_MAX)
             {
                 for (int other = 0; other < pg_size; other++)
                 {
@@ -531,7 +569,8 @@ void* alloc_read_buffer(osd_rmw_stripe_t *stripes, int read_pg_size, uint64_t ad
     uint64_t buf_size = add_size;
     for (int role = 0; role < read_pg_size; role++)
     {
-        if (stripes[role].read_end != 0)
+        if (stripes[role].read_end != 0 &&
+            stripes[role].read_end != UINT32_MAX)
         {
             buf_size += stripes[role].read_end - stripes[role].read_start;
         }
@@ -541,7 +580,8 @@ void* alloc_read_buffer(osd_rmw_stripe_t *stripes, int read_pg_size, uint64_t ad
     uint64_t buf_pos = add_size;
     for (int role = 0; role < read_pg_size; role++)
     {
-        if (stripes[role].read_end != 0)
+        if (stripes[role].read_end != 0 &&
+            stripes[role].read_end != UINT32_MAX)
         {
             stripes[role].read_buf = (uint8_t*)buf + buf_pos;
             buf_pos += stripes[role].read_end - stripes[role].read_start;

View File

@@ -23,6 +23,7 @@ struct osd_rmw_stripe_t
     void *read_buf, *write_buf;
     void *bmp_buf;
     uint32_t req_start, req_end;
+    // read_end=UINT32_MAX means to only read bitmap, but not data
     uint32_t read_start, read_end;
     uint32_t write_start, write_end;
     bool missing;

View File

@@ -27,6 +27,7 @@ void test13();
 void test14();
 void test15(bool second);
 void test16();
+void test_recover_22_d2();
 
 int main(int narg, char *args[])
 {
@@ -61,6 +62,8 @@ int main(int narg, char *args[])
     test15(true);
     // Test 16
     test16();
+    // Test 17
+    test_recover_22_d2();
     // End
     printf("all ok\n");
     return 0;
@@ -1045,7 +1048,12 @@ void test16()
     assert(stripes[3].read_buf == (uint8_t*)read_buf+2*128*1024);
     set_pattern(stripes[1].read_buf, 128*1024, PATTERN2);
     memcpy(stripes[3].read_buf, rmw_buf, 128*1024);
+    memset(stripes[0].bmp_buf, 0xa8, bmp);
+    memset(stripes[2].bmp_buf, 0xb7, bmp);
+    assert(bitmaps[1] == 0xFFFFFFFF);
+    assert(bitmaps[3] == 0xF1F1F1F1);
     reconstruct_stripes_ec(stripes, 4, 2, bmp);
+    assert(*(uint32_t*)stripes[3].bmp_buf == 0xF1F1F1F1);
     assert(bitmaps[0] == 0xFFFFFFFF);
     check_pattern(stripes[0].read_buf, 128*1024, PATTERN1);
     free(read_buf);
@@ -1054,3 +1062,47 @@ void test16()
     free(write_buf);
     use_ec(4, 2, false);
 }
+
+/***
+
+17. EC 2+2 recover second data block
+
+***/
+
+void test_recover_22_d2()
+{
+    const int bmp = 128*1024 / 4096 / 8;
+    use_ec(4, 2, true);
+    osd_num_t osd_set[4] = { 1, 0, 3, 4 };
+    osd_rmw_stripe_t stripes[4] = {};
+    unsigned bitmaps[4] = { 0 };
+    // Read 0-256K
+    split_stripes(2, 128*1024, 0, 256*1024, stripes);
+    assert(stripes[0].req_start == 0 && stripes[0].req_end == 128*1024);
+    assert(stripes[1].req_start == 0 && stripes[1].req_end == 128*1024);
+    assert(stripes[2].req_start == 0 && stripes[2].req_end == 0);
+    assert(stripes[3].req_start == 0 && stripes[3].req_end == 0);
+    uint8_t *data_buf = (uint8_t*)malloc_or_die(128*1024*4);
+    for (int i = 0; i < 4; i++)
+    {
+        stripes[i].read_start = stripes[i].req_start;
+        stripes[i].read_end = stripes[i].req_end;
+        stripes[i].read_buf = data_buf + i*128*1024;
+        stripes[i].bmp_buf = bitmaps + i;
+    }
+    // Read using parity
+    assert(extend_missing_stripes(stripes, osd_set, 2, 4) == 0);
+    assert(stripes[2].read_start == 0 && stripes[2].read_end == 128*1024);
+    assert(stripes[3].read_start == 0 && stripes[3].read_end == 0);
+    bitmaps[0] = 0xffffffff;
+    bitmaps[2] = 0;
+    set_pattern(stripes[0].read_buf, 128*1024, PATTERN1);
+    set_pattern(stripes[2].read_buf, 128*1024, PATTERN1^PATTERN2);
+    // Reconstruct
+    reconstruct_stripes_ec(stripes, 4, 2, bmp);
+    check_pattern(stripes[1].read_buf, 128*1024, PATTERN2);
+    assert(bitmaps[1] == 0xFFFFFFFF);
+    free(data_buf);
+    // Done
+    use_ec(4, 2, false);
+}

View File

@@ -36,12 +36,12 @@ for i in $(seq 2 $ETCD_COUNT); do
ETCD_URL="$ETCD_URL,http://$ETCD_IP:$((ETCD_PORT+2*i-2))" ETCD_URL="$ETCD_URL,http://$ETCD_IP:$((ETCD_PORT+2*i-2))"
ETCD_CLUSTER="$ETCD_CLUSTER,etcd$i=http://$ETCD_IP:$((ETCD_PORT+2*i-1))" ETCD_CLUSTER="$ETCD_CLUSTER,etcd$i=http://$ETCD_IP:$((ETCD_PORT+2*i-1))"
done done
ETCDCTL="${ETCD}ctl --endpoints=$ETCD_URL" ETCDCTL="${ETCD}ctl --endpoints=$ETCD_URL --dial-timeout=5s --command-timeout=10s"
start_etcd() start_etcd()
{ {
local i=$1 local i=$1
$ETCD -name etcd$i --data-dir ./testdata/etcd$i \ ionice -c2 -n0 $ETCD -name etcd$i --data-dir ./testdata/etcd$i \
--advertise-client-urls http://$ETCD_IP:$((ETCD_PORT+2*i-2)) --listen-client-urls http://$ETCD_IP:$((ETCD_PORT+2*i-2)) \ --advertise-client-urls http://$ETCD_IP:$((ETCD_PORT+2*i-2)) --listen-client-urls http://$ETCD_IP:$((ETCD_PORT+2*i-2)) \
--initial-advertise-peer-urls http://$ETCD_IP:$((ETCD_PORT+2*i-1)) --listen-peer-urls http://$ETCD_IP:$((ETCD_PORT+2*i-1)) \ --initial-advertise-peer-urls http://$ETCD_IP:$((ETCD_PORT+2*i-1)) --listen-peer-urls http://$ETCD_IP:$((ETCD_PORT+2*i-1)) \
--initial-cluster-token vitastor-tests-etcd --initial-cluster-state new \ --initial-cluster-token vitastor-tests-etcd --initial-cluster-state new \
@@ -53,8 +53,11 @@ start_etcd()
 for i in $(seq 1 $ETCD_COUNT); do
     start_etcd $i
 done
-if [ $ETCD_COUNT -gt 1 ]; then
-    sleep 1
+for i in {1..10}; do
+    ${ETCD}ctl --endpoints=$ETCD_URL --dial-timeout=1s --command-timeout=1s member list >/dev/null && break
+done
+if [[ $i = 10 ]]; then
+    format_error "Failed to start etcd"
 fi
 
 echo leak:fio >> testdata/lsan-suppress.txt

View File

@@ -39,7 +39,7 @@ done
 cd mon
 npm install
 cd ..
-node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 &>./testdata/mon.log &
+node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 --restart_interval 5 &>./testdata/mon.log &
 MON_PID=$!
 
 if [ "$SCHEME" = "ec" ]; then
@@ -100,13 +100,13 @@ wait_finish_rebalance()
     sec=$1
     i=0
     while [[ $i -lt $sec ]]; do
-        ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"]) ] | length) == 32') && \
+        ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"] or .state == ["active", "left_on_dead"]) ] | length) == '$PG_COUNT) && \
             break
-        if [ $i -eq 60 ]; then
-            format_error "Rebalance couldn't finish in $sec seconds"
-        fi
         sleep 1
         i=$((i+1))
+        if [ $i -eq $sec ]; then
+            format_error "Rebalance couldn't finish in $sec seconds"
+        fi
     done
 }
@@ -117,3 +117,14 @@ check_qemu()
         sudo ln -s "$(realpath .)/build/src/block-vitastor.so" /usr/lib/x86_64-linux-gnu/qemu/block-vitastor.so
     fi
 }
+
+check_nbd()
+{
+    if [[ -d /sys/module/nbd && ! -e /dev/nbd0 ]]; then
+        max_part=$(cat /sys/module/nbd/parameters/max_part)
+        nbds_max=$(cat /sys/module/nbd/parameters/nbds_max)
+        for i in $(seq 1 $nbds_max); do
+            mknod /dev/nbd$((i-1)) b 43 $(((i-1)*(max_part+1)))
+        done
+    fi
+}
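
A hypothetical sanity check after `check_nbd` runs (GNU coreutils `stat`; `%t:%T` prints the hex major:minor numbers, which should follow the mknod arithmetic above):

```
# Nodes should exist with major 43 and minors spaced by (max_part+1):
ls /dev/nbd* | head -n 3
stat -c '%t:%T %n' /dev/nbd0 /dev/nbd1
```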

View File

@@ -15,10 +15,10 @@ done
 sleep 2
 
-for i in {1..10}; do
+for i in {1..30}; do
     ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
         jq -s -e '([ .[0].items["1"] | map(.osd_set)[][] ] | sort | unique == ["1","2","3","4"])') && \
-        ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"]) ] | length) == '$PG_COUNT'') && \
+        ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"]) ] | length) == '$PG_COUNT) && \
         break
     sleep 1
 done
@@ -28,7 +28,7 @@ if ! ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
format_error "FAILED: OSD NOT ADDED INTO DISTRIBUTION" format_error "FAILED: OSD NOT ADDED INTO DISTRIBUTION"
fi fi
wait_finish_rebalance 10 wait_finish_rebalance 20
sleep 1 sleep 1
kill -9 $OSD4_PID kill -9 $OSD4_PID
@@ -37,7 +37,7 @@ build/src/vitastor-cli --etcd_address $ETCD_URL rm-osd --force 4
 sleep 2
 
-for i in {1..10}; do
+for i in {1..30}; do
     ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
         jq -s -e '([ .[0].items["1"] | map(.osd_set)[][] ] | sort | unique == ["1","2","3"])') && \
         ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"] or .state == ["active", "left_on_dead"]) ] | length) == '$PG_COUNT'') && \
@@ -50,6 +50,6 @@ if ! ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
format_error "FAILED: OSD NOT REMOVED FROM DISTRIBUTION" format_error "FAILED: OSD NOT REMOVED FROM DISTRIBUTION"
fi fi
wait_finish_rebalance 10 wait_finish_rebalance 20
format_green OK format_green OK

View File

@@ -18,7 +18,7 @@ $ETCDCTL put /vitastor/config/pools '{"1":{"name":"testpool","scheme":"replicate
 cd mon
 npm install
 cd ..
-node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" &>./testdata/mon.log &
+node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 --restart_interval 5 &>./testdata/mon.log &
 MON_PID=$!
 
 sleep 2
View File

@@ -4,6 +4,8 @@ OSD_COUNT=7
 PG_COUNT=32
 . `dirname $0`/run_3osds.sh
+
+check_nbd
 
 IMG_SIZE=256
 
 $ETCDCTL put /vitastor/config/inode/1/1 '{"name":"testimg","size":'$((IMG_SIZE*1024*1024))'}'

View File

@@ -14,7 +14,7 @@ for i in $(seq 1 $OSD_COUNT); do
     eval OSD${i}_PID=$!
 done
 
-node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" &>./testdata/mon.log &
+node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 --restart_interval 5 &>./testdata/mon.log &
 MON_PID=$!
 
 sleep 3