Compare commits

15 Commits: v0.8.8...mon-self-r

Commits (SHA1):
- d258a6e76b
- 77155ab7bd
- a409598b16
- f4c6765522
- ad2916068a
- 321cb435a6
- cfcf4f4355
- e0fb17bfee
- 5b9031fecc
- 5da1d8e1b5
- 44f86f1999
- 2d9a80c6f6
- 5e295e346e
- d9c0898b7c
- 04cfb48361
@@ -17,14 +17,16 @@ Configuration parameters can be set in 3 places:
 - Configuration file (`/etc/vitastor/vitastor.conf` or other path)
 - etcd key `/vitastor/config/global`. Most variables can be set there, but etcd
   connection parameters should obviously be set in the configuration file.
-- Command line of Vitastor components: OSD, mon, fio and QEMU options,
-  OpenStack/Proxmox/etc configuration. The latter doesn't allow to set all
-  variables directly, but it allows to override the configuration file and
-  set everything you need inside it.
+- Command line of Vitastor components: OSD (when you run it without vitastor-disk),
+  mon, fio and QEMU options, OpenStack/Proxmox/etc configuration. The latter
+  doesn't allow to set all variables directly, but it allows to override the
+  configuration file and set everything you need inside it.
+- OSD superblocks created by [vitastor-disk](../usage/disk.en.md) contain
+  primarily disk layout parameters of specific OSDs. In fact, these parameters
+  are automatically passed into the command line of vitastor-osd process, so
+  they have the same "status" as command-line parameters.
 
 In the future, additional configuration methods may be added:
-- OSD superblock which will, by design, contain parameters related to the disk
-  layout and to one specific OSD.
 - OSD-specific keys in etcd like `/vitastor/config/osd/<number>`.
 
 ## Parameter Reference
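To make the list above concrete, a minimal sketch of the first two methods (the address and value here are illustrative examples, not defaults): put the etcd connection parameters into the file, e.g. `{"etcd_address":["http://10.0.0.1:2379"]}` in `/etc/vitastor/vitastor.conf`, and set everything else through etcd, e.g. `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all"}'`.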
@@ -19,14 +19,17 @@
 - The etcd key `/vitastor/config/global`. Most parameters can be set there,
   except, naturally, the etcd connection parameters themselves, which must be
   set in the configuration file
-- On the command line of Vitastor components: the OSD, the monitor, fio and QEMU
-  options, OpenStack/Proxmox/etc settings. The latter usually don't expose the
-  full set of parameters directly, but let you specify the path to the
-  configuration file and set any parameters in it.
+- On the command line of Vitastor components: the OSD (when started manually
+  without vitastor-disk), the monitor, fio and QEMU options, OpenStack/Proxmox/etc
+  settings. The latter usually don't expose the full set of parameters directly,
+  but let you specify the path to the configuration file and set any parameters in it.
+- In the OSD superblock written by [vitastor-disk](../usage/disk.ru.md):
+  parameters related to the disk format and to that specific OSD. In fact,
+  these parameters are automatically passed into the command line of the
+  vitastor-osd process at OSD start, so in "status" they are equivalent to
+  OSD command-line parameters.
 
 In the future, other configuration methods may also be added:
-- An OSD superblock that will, by design, store OSD parameters related to the
-  disk format and to that specific OSD.
 - OSD-specific etcd keys like `/vitastor/config/osd/<number>`.
 
 ## Parameter List
@@ -6,10 +6,10 @@
 
 # Proxmox VE
 
-To enable Vitastor support in Proxmox Virtual Environment (6.4-7.3 are supported):
+To enable Vitastor support in Proxmox Virtual Environment (6.4-7.4 are supported):
 
 - Add the corresponding Vitastor Debian repository into sources.list on Proxmox hosts:
-  buster for 6.4, bullseye for 7.3, pve7.1 for 7.1, pve7.2 for 7.2
+  buster for 6.4, bullseye for 7.4, pve7.1 for 7.1, pve7.2 for 7.2, pve7.3 for 7.3
 - Install vitastor-client, pve-qemu-kvm, pve-storage-vitastor (* or see note) packages from Vitastor repository
 - Define storage in `/etc/pve/storage.cfg` (see below)
 - Block network access from VMs to Vitastor network (to OSDs and etcd),
@@ -6,10 +6,10 @@
 
 # Proxmox
 
-To connect Vitastor to Proxmox Virtual Environment (versions 6.4-7.3 are supported):
+To connect Vitastor to Proxmox Virtual Environment (versions 6.4-7.4 are supported):
 
 - Add the corresponding Vitastor Debian repository to sources.list on the Proxmox hosts:
-  buster for 6.4, bullseye for 7.3, pve7.1 for 7.1, pve7.2 for 7.2
+  buster for 6.4, bullseye for 7.4, pve7.1 for 7.1, pve7.2 for 7.2, pve7.3 for 7.3
 - Install the vitastor-client, pve-qemu-kvm and pve-storage-vitastor (* or see the note) packages from the Vitastor repository
 - Define the storage type in `/etc/pve/storage.cfg` (see below)
 - Be sure to block access from virtual machines to the Vitastor network (OSDs and etcd), as Vitastor does not (yet) support authentication
@@ -45,7 +45,9 @@ On the monitor hosts:
   }
   ```
 - Initialize OSDs:
-  - SSD-only: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`
+  - SSD-only: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`. You can add
+    `--disable_data_fsync off` to leave disk cache enabled if you use desktop
+    SSDs without capacitors.
   - Hybrid, SSD+HDD: `vitastor-disk prepare --hybrid /dev/sdXXX [/dev/sdYYY ...]`.
     Pass all your devices (HDD and SSD) to this script — it will partition disks and initialize journals on its own.
     This script skips HDDs which are already partitioned so if you want to use non-empty disks for
@@ -53,7 +55,9 @@ On the monitor hosts:
   but some free unpartitioned space must be available because the script creates new partitions for journals.
 - You can change OSD configuration in units or in `vitastor.conf`.
   Check [Configuration Reference](../config.en.md) for parameter descriptions.
-- If all your drives have capacitors, create global configuration in etcd: \
+- If all your drives have capacitors, and even if not, but if you ran `vitastor-disk`
+  without `--disable_data_fsync off` at the first step, then put the following
+  setting into etcd: \
   `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all"}'`
 - Start all OSDs: `systemctl start vitastor.target`
 
@@ -75,6 +79,10 @@ etcdctl --endpoints=... put /vitastor/config/pools '{"2":{"name":"ecpool",
 
 After you do this, one of the monitors will configure PGs and OSDs will start them.
 
+If you use HDDs you should also add `"block_size": 1048576` to pool configuration.
+The other option is to add it into /vitastor/config/global, in this case it will
+apply to all pools by default.
+
 ## Check cluster status
 
 `vitastor-cli status`
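As a worked example of the last added paragraph: combining the HDD option with the earlier cache setting gives a single global config value, e.g. `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all","block_size":1048576}'` (a sketch composed from the two settings shown above, not a recommendation taken from elsewhere).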
@@ -45,7 +45,9 @@
   }
   ```
 - Initialize the OSDs:
-  - SSD: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`
+  - SSD: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`. If you use desktop
+    SSDs without capacitors, you can leave the cache enabled by adding the
+    `--disable_data_fsync off` option.
   - Hybrid, SSD+HDD: `vitastor-disk prepare --hybrid /dev/sdXXX [/dev/sdYYY ...]`.
     Pass all your SSDs and HDDs to the script on the command line in one go; it will automatically
     create partitions for journals on the SSDs and for data on the HDDs. The script skips HDDs that already have partitions
@@ -54,8 +56,11 @@
   for the journals, so some free unallocated space must be available on the SSDs.
 - You can change OSD parameters in the systemd units or in `vitastor.conf`. See the
   [configuration reference](../config.ru.md) for parameter descriptions.
-- If all your disks are server-grade ones with capacitors, record that in the global configuration in etcd: \
-  `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all"}'`
+- If all your disks are server-grade ones with capacitors, and even if they are not,
+  provided that you did not add the `--disable_data_fsync off` option at the first step
+  and `vitastor-disk` did not complain about being unable to disable the disk cache,
+  put the following setting into the global configuration in etcd: \
+  `etcdctl --endpoints=... put /vitastor/config/global '{"immediate_commit":"all"}'`.
 - Start all OSDs: `systemctl start vitastor.target`
 
 ## Create a pool
@@ -76,6 +81,10 @@ etcdctl --endpoints=... put /vitastor/config/pools '{"2":{"name":"ecpool",
 
 After this, one of the monitors should configure the PGs, and the OSDs should start them.
 
+If you use HDD disks, add the `"block_size": 1048576` option to the pool configuration.
+This option can also be added to /vitastor/config/global, in which case it will apply
+to all pools by default.
+
 ## Check the cluster status
 
 `vitastor-cli status`
@@ -43,16 +43,16 @@ function finish_pg_history(merged_history)
     merged_history.all_peers = Object.values(merged_history.all_peers);
 }
 
-function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
+function scale_pg_count(prev_pgs, real_prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
 {
-    const old_pg_count = prev_pgs.length;
+    const old_pg_count = real_prev_pgs.length;
     // Add all possibly intersecting PGs to the history of new PGs
     if (!(new_pg_count % old_pg_count))
     {
         // New PG count is a multiple of old PG count
         for (let i = 0; i < new_pg_count; i++)
         {
-            add_pg_history(new_pg_history, i, prev_pgs, prev_pg_history, i % old_pg_count);
+            add_pg_history(new_pg_history, i, real_prev_pgs, prev_pg_history, i % old_pg_count);
             finish_pg_history(new_pg_history[i]);
         }
     }
@@ -64,7 +64,7 @@ function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
     {
         for (let j = 0; j < mul; j++)
         {
-            add_pg_history(new_pg_history, i, prev_pgs, prev_pg_history, i+j*new_pg_count);
+            add_pg_history(new_pg_history, i, real_prev_pgs, prev_pg_history, i+j*new_pg_count);
         }
         finish_pg_history(new_pg_history[i]);
     }
@@ -76,7 +76,7 @@ function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
     let merged_history = {};
     for (let i = 0; i < old_pg_count; i++)
     {
-        add_pg_history(merged_history, 1, prev_pgs, prev_pg_history, i);
+        add_pg_history(merged_history, 1, real_prev_pgs, prev_pg_history, i);
     }
     finish_pg_history(merged_history[1]);
     for (let i = 0; i < new_pg_count; i++)
@@ -90,15 +90,15 @@ function scale_pg_count(prev_pgs, prev_pg_history, new_pg_history, new_pg_count)
         new_pg_history[i] = null;
     }
     // Just for the lp_solve optimizer - pick a "previous" PG for each "new" one
-    if (old_pg_count < new_pg_count)
+    if (prev_pgs.length < new_pg_count)
     {
-        for (let i = old_pg_count; i < new_pg_count; i++)
+        for (let i = prev_pgs.length; i < new_pg_count; i++)
         {
-            prev_pgs[i] = prev_pgs[i % old_pg_count];
+            prev_pgs[i] = prev_pgs[i % prev_pgs.length];
         }
     }
-    else if (old_pg_count > new_pg_count)
+    else if (prev_pgs.length > new_pg_count)
    {
-        prev_pgs.splice(new_pg_count, old_pg_count-new_pg_count);
+        prev_pgs.splice(new_pg_count, prev_pgs.length-new_pg_count);
    }
}
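One concrete illustration of the mapping `scale_pg_count()` implements may help (a sketch of the assumed semantics, using a hypothetical helper that is not part of the repository): when the new PG count is a multiple of the old one, each new PG inherits the history of old PG `i % old_pg_count`; when it divides the old count, each new PG merges the histories of all old PGs congruent to it modulo the new count.

```js
// Hypothetical helper, not part of PGUtil.js: list which old PGs feed the
// history of each new PG in the two "clean" scaling cases handled above.
function history_sources(old_pg_count, new_pg_count)
{
    for (let i = 0; i < new_pg_count; i++)
    {
        const src = [];
        if (!(new_pg_count % old_pg_count))
        {
            // new count is a multiple of the old count
            src.push(i % old_pg_count);
        }
        else if (!(old_pg_count % new_pg_count))
        {
            // new count is a divisor of the old count
            for (let j = i; j < old_pg_count; j += new_pg_count)
                src.push(j);
        }
        console.log('new PG '+i+' <- old PG(s) '+src.join(', '));
    }
}
history_sources(2, 4); // new PG 2 inherits the history of old PG 0, etc.
```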
@@ -13,7 +13,7 @@ for (let i = 2; i < process.argv.length; i++)
     {
         console.error('USAGE: '+process.argv[0]+' '+process.argv[1]+' [--verbose 1]'+
             ' [--etcd_address "http://127.0.0.1:2379,..."] [--config_path /etc/vitastor/vitastor.conf]'+
-            ' [--etcd_prefix "/vitastor"] [--etcd_start_timeout 5]');
+            ' [--etcd_prefix "/vitastor"] [--etcd_start_timeout 5] [--restart_interval 5]');
         process.exit();
     }
     else if (process.argv[i].substr(0, 2) == '--')
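Usage note: the test scripts later in this diff exercise the new flag by starting the monitor as `node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 --restart_interval 5`.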
mon/mon.js (75 changed lines)
@@ -561,7 +561,7 @@ class Mon
         }
         if (!this.ws)
         {
-            this.die('Failed to open etcd watch websocket');
+            await this.die('Failed to open etcd watch websocket');
         }
         const cur_addr = this.selected_etcd_url;
         this.ws_alive = true;
@@ -728,7 +728,7 @@ class Mon
             const res = await this.etcd_call('/lease/keepalive', { ID: this.etcd_lease_id }, this.config.etcd_mon_timeout, this.config.etcd_mon_retries);
             if (!res.result.TTL)
             {
-                this.die('Lease expired');
+                await this.die('Lease expired');
             }
         }, this.config.etcd_mon_timeout);
         if (!this.signals_set)
@@ -741,9 +741,32 @@ class Mon
 
     async on_stop(status)
     {
-        clearInterval(this.lease_timer);
-        await this.etcd_call('/lease/revoke', { ID: this.etcd_lease_id }, this.config.etcd_mon_timeout, this.config.etcd_mon_retries);
-        process.exit(status);
+        if (this.ws_keepalive_timer)
+        {
+            clearInterval(this.ws_keepalive_timer);
+            this.ws_keepalive_timer = null;
+        }
+        if (this.lease_timer)
+        {
+            clearInterval(this.lease_timer);
+            this.lease_timer = null;
+        }
+        if (this.etcd_lease_id)
+        {
+            const lease_id = this.etcd_lease_id;
+            this.etcd_lease_id = null;
+            await this.etcd_call('/lease/revoke', { ID: lease_id }, this.config.etcd_mon_timeout, this.config.etcd_mon_retries);
+        }
+        if (!status || !this.initConfig.restart_interval)
+        {
+            process.exit(status);
+        }
+        else
+        {
+            console.log('Restarting after '+this.initConfig.restart_interval+' seconds');
+            await new Promise(ok => setTimeout(ok, this.initConfig.restart_interval*1000));
+            await this.start();
+        }
     }
 
     async become_master()
@@ -956,7 +979,7 @@ class Mon
         return alive_set[this.rng() % alive_set.length];
     }
 
-    save_new_pgs_txn(request, pool_id, up_osds, osd_tree, prev_pgs, new_pgs, pg_history)
+    save_new_pgs_txn(save_to, request, pool_id, up_osds, osd_tree, prev_pgs, new_pgs, pg_history)
     {
         const aff_osds = this.get_affinity_osds(this.state.config.pools[pool_id], up_osds, osd_tree);
         const pg_items = {};
@@ -1009,14 +1032,14 @@ class Mon
                 });
             }
         }
-        this.state.config.pgs.items = this.state.config.pgs.items || {};
+        save_to.items = save_to.items || {};
         if (!new_pgs.length)
        {
-            delete this.state.config.pgs.items[pool_id];
+            delete save_to.items[pool_id];
        }
        else
        {
-            this.state.config.pgs.items[pool_id] = pg_items;
+            save_to.items[pool_id] = pg_items;
        }
    }
 
@@ -1160,6 +1183,7 @@ class Mon
         if (this.state.config.pgs.hash != tree_hash)
         {
             // Something has changed
+            const new_config_pgs = JSON.parse(JSON.stringify(this.state.config.pgs));
             const etcd_request = { compare: [], success: [] };
             for (const pool_id in (this.state.config.pgs||{}).items||{})
             {
@@ -1180,7 +1204,7 @@ class Mon
                     etcd_request.success.push({ requestDeleteRange: {
                         key: b64(this.etcd_prefix+'/pool/stats/'+pool_id),
                     } });
-                    this.save_new_pgs_txn(etcd_request, pool_id, up_osds, osd_tree, prev_pgs, [], []);
+                    this.save_new_pgs_txn(new_config_pgs, etcd_request, pool_id, up_osds, osd_tree, prev_pgs, [], []);
                 }
             }
             for (const pool_id in this.state.config.pools)
@@ -1234,7 +1258,7 @@ class Mon
                     return;
                 }
                 const new_pg_history = [];
-                PGUtil.scale_pg_count(prev_pgs, pg_history, new_pg_history, pool_cfg.pg_count);
+                PGUtil.scale_pg_count(prev_pgs, real_prev_pgs, pg_history, new_pg_history, pool_cfg.pg_count);
                 pg_history = new_pg_history;
             }
             for (const pg of prev_pgs)
@@ -1287,14 +1311,15 @@ class Mon
                     key: b64(this.etcd_prefix+'/pool/stats/'+pool_id),
                     value: b64(JSON.stringify(this.state.pool.stats[pool_id])),
                 } });
-                this.save_new_pgs_txn(etcd_request, pool_id, up_osds, osd_tree, real_prev_pgs, optimize_result.int_pgs, pg_history);
+                this.save_new_pgs_txn(new_config_pgs, etcd_request, pool_id, up_osds, osd_tree, real_prev_pgs, optimize_result.int_pgs, pg_history);
             }
-            this.state.config.pgs.hash = tree_hash;
-            await this.save_pg_config(etcd_request);
+            new_config_pgs.hash = tree_hash;
+            await this.save_pg_config(new_config_pgs, etcd_request);
         }
         else
         {
             // Nothing changed, but we still want to recheck the distribution of primaries
+            let new_config_pgs;
             let changed = false;
             for (const pool_id in this.state.config.pools)
             {
@@ -1314,31 +1339,35 @@ class Mon
                     const new_primary = this.pick_primary(pool_id, pg_cfg.osd_set, up_osds, aff_osds);
                     if (pg_cfg.primary != new_primary)
                     {
+                        if (!new_config_pgs)
+                        {
+                            new_config_pgs = JSON.parse(JSON.stringify(this.state.config.pgs));
+                        }
                         console.log(
                             `Moving pool ${pool_id} (${pool_cfg.name || 'unnamed'}) PG ${pg_num}`+
                             ` primary OSD from ${pg_cfg.primary} to ${new_primary}`
                         );
                         changed = true;
-                        pg_cfg.primary = new_primary;
+                        new_config_pgs.items[pool_id][pg_num].primary = new_primary;
                    }
                }
            }
        }
        if (changed)
        {
-            await this.save_pg_config();
+            await this.save_pg_config(new_config_pgs);
        }
    }
 
-    async save_pg_config(etcd_request = { compare: [], success: [] })
+    async save_pg_config(new_config_pgs, etcd_request = { compare: [], success: [] })
     {
         etcd_request.compare.push(
             { key: b64(this.etcd_prefix+'/mon/master'), target: 'LEASE', lease: ''+this.etcd_lease_id },
             { key: b64(this.etcd_prefix+'/config/pgs'), target: 'MOD', mod_revision: ''+this.etcd_watch_revision, result: 'LESS' },
         );
         etcd_request.success.push(
-            { requestPut: { key: b64(this.etcd_prefix+'/config/pgs'), value: b64(JSON.stringify(this.state.config.pgs)) } },
+            { requestPut: { key: b64(this.etcd_prefix+'/config/pgs'), value: b64(JSON.stringify(new_config_pgs)) } },
         );
         const res = await this.etcd_call('/kv/txn', etcd_request, this.config.etcd_mon_timeout, 0);
         if (!res.succeeded)
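The common thread in the mon.js changes above: PG configuration is now staged in a deep copy (`new_config_pgs`) instead of mutating `this.state.config.pgs` in place, so a failed etcd transaction leaves local state untouched. A minimal sketch of the pattern, with hypothetical names:

```js
// Sketch only (assumed semantics): stage changes on a copy and persist them
// with a compare-and-swap transaction; live state is refreshed from the etcd
// watch once the transaction succeeds.
function stage_new_pgs(current_pgs, tree_hash)
{
    const new_config_pgs = JSON.parse(JSON.stringify(current_pgs || {}));
    new_config_pgs.hash = tree_hash; // modify the copy, never the live object
    return new_config_pgs;           // then: save_pg_config(new_config_pgs, txn)
}
```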
@@ -1765,14 +1794,13 @@ class Mon
                 return res.json;
             }
         }
-        this.die();
+        await this.die();
     }
 
-    _die(err)
+    async _die(err)
     {
         // In fact we can just try to rejoin
         console.error(new Error(err || 'Cluster connection failed'));
-        process.exit(1);
+        await this.on_stop(1);
     }
 
     local_ips(all)
@@ -1817,6 +1845,7 @@ function POST(url, body, timeout)
             clearTimeout(timer_id);
             let res_body = '';
             res.setEncoding('utf8');
+            res.on('error', no);
             res.on('data', chunk => { res_body += chunk; });
             res.on('end', () =>
             {
@@ -1836,6 +1865,8 @@ function POST(url, body, timeout)
             }
         });
     });
+    req.on('error', no);
+    req.on('close', () => no(new Error('Connection closed prematurely')));
     req.write(body_text);
     req.end();
 });
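Design note on the two hunks above: making `die()` async lets a fatal error flow into `on_stop(1)`, which, when `restart_interval` is configured, revokes the lease, sleeps and re-enters `start()` instead of exiting the process; and `POST()` now reports socket errors and premature connection closes through its `no` (reject) callback rather than leaving the promise unsettled.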
@@ -15,4 +15,4 @@ StartLimitInterval=0
 RestartSec=10
 
 [Install]
-WantedBy=vitastor.target
+WantedBy=multi-user.target
@@ -307,6 +307,18 @@ void blockstore_impl_t::check_wait(blockstore_op_t *op)
         }
         PRIV(op)->wait_for = 0;
     }
+    else if (PRIV(op)->wait_for == WAIT_FREE)
+    {
+        if (!data_alloc->get_free_count() && big_to_flush > 0)
+        {
+#ifdef BLOCKSTORE_DEBUG
+            printf("Still waiting for free space on the data device\n");
+#endif
+            return;
+        }
+        flusher->release_trim();
+        PRIV(op)->wait_for = 0;
+    }
     else
     {
         throw std::runtime_error("BUG: op->wait_for value is unexpected");
@@ -160,6 +160,8 @@ struct __attribute__((__packed__)) dirty_entry
 #define WAIT_JOURNAL 3
 // Suspend operation until the next journal sector buffer is free
 #define WAIT_JOURNAL_BUFFER 4
+// Suspend operation until there is some free space on the data device
+#define WAIT_FREE 5
 
 struct fulfill_read_t
 {
@@ -263,6 +265,7 @@ class blockstore_impl_t
 
     struct journal_t journal;
     journal_flusher_t *flusher;
+    int big_to_flush = 0;
     int write_iodepth = 0;
 
     bool live = false, queue_stall = false;
@@ -201,6 +201,11 @@ void blockstore_impl_t::erase_dirty(blockstore_dirty_db_t::iterator dirty_start,
     }
     while (1)
     {
+        if ((IS_BIG_WRITE(dirty_it->second.state) || IS_DELETE(dirty_it->second.state)) &&
+            IS_STABLE(dirty_it->second.state))
+        {
+            big_to_flush--;
+        }
         if (IS_BIG_WRITE(dirty_it->second.state) && dirty_it->second.location != clean_loc &&
             dirty_it->second.location != UINT64_MAX)
         {
@@ -446,6 +446,7 @@ void blockstore_impl_t::mark_stable(const obj_ver_id & v, bool forget_dirty)
             {
                 inode_space_stats[dirty_it->first.oid.inode] += dsk.data_block_size;
             }
+            big_to_flush++;
         }
         else if (IS_DELETE(dirty_it->second.state))
         {
@@ -454,6 +455,7 @@ void blockstore_impl_t::mark_stable(const obj_ver_id & v, bool forget_dirty)
                 sp -= dsk.data_block_size;
             else
                 inode_space_stats.erase(dirty_it->first.oid.inode);
+            big_to_flush++;
         }
     }
     if (forget_dirty && (IS_BIG_WRITE(dirty_it->second.state) ||
@@ -271,6 +271,13 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
         if (loc == UINT64_MAX)
         {
             // no space
+            if (big_to_flush > 0)
+            {
+                // hope that some space will be available after flush
+                flusher->request_trim();
+                PRIV(op)->wait_for = WAIT_FREE;
+                return 0;
+            }
             cancel_all_writes(op, dirty_it, -ENOSPC);
             return 2;
         }
@@ -54,6 +54,13 @@ void epoll_manager_t::set_fd_handler(int fd, bool wr, std::function<void(int, in
     ev.events = (wr ? EPOLLOUT : 0) | EPOLLIN | EPOLLRDHUP | EPOLLET;
     if (epoll_ctl(epoll_fd, exists ? EPOLL_CTL_MOD : EPOLL_CTL_ADD, fd, &ev) < 0)
     {
+        if (errno == ENOENT)
+        {
+            // The FD is probably already closed
+            epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, NULL);
+            epoll_handlers.erase(fd);
+            return;
+        }
         throw std::runtime_error(std::string("epoll_ctl: ") + strerror(errno));
     }
     epoll_handlers[fd] = handler;
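Background for this check: closing a file descriptor removes it from any epoll interest lists automatically, so a later `EPOLL_CTL_MOD` on it fails with ENOENT; treating that as "the fd is already gone" and dropping the handler avoids throwing from the event loop on a harmless race.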
@@ -191,7 +191,7 @@ struct __attribute__((__packed__)) osd_op_rw_t
     uint64_t inode;
     // offset
     uint64_t offset;
-    // length
+    // length. 0 means to read all bitmaps of the specified range, but no data.
     uint32_t len;
     // flags (for future)
     uint32_t flags;
@@ -186,10 +186,22 @@ void osd_t::continue_primary_read(osd_op_t *cur_op)
     cur_op->reply.rw.bitmap_len = 0;
     {
         auto & pg = pgs.at({ .pool_id = INODE_POOL(op_data->oid.inode), .pg_num = op_data->pg_num });
-        for (int role = 0; role < op_data->pg_data_size; role++)
+        if (cur_op->req.rw.len == 0)
         {
-            op_data->stripes[role].read_start = op_data->stripes[role].req_start;
-            op_data->stripes[role].read_end = op_data->stripes[role].req_end;
+            // len=0 => bitmap read
+            for (int role = 0; role < op_data->pg_data_size; role++)
+            {
+                op_data->stripes[role].read_start = 0;
+                op_data->stripes[role].read_end = UINT32_MAX;
+            }
+        }
+        else
+        {
+            for (int role = 0; role < op_data->pg_data_size; role++)
+            {
+                op_data->stripes[role].read_start = op_data->stripes[role].req_start;
+                op_data->stripes[role].read_end = op_data->stripes[role].req_end;
+            }
         }
         // Determine version
         auto vo_it = pg.ver_override.find(op_data->oid);
@@ -151,6 +151,13 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
     {
         int stripe_num = rep ? 0 : role;
         osd_op_t *subop = op_data->subops + i;
+        uint32_t subop_len = wr
+            ? stripes[stripe_num].write_end - stripes[stripe_num].write_start
+            : stripes[stripe_num].read_end - stripes[stripe_num].read_start;
+        if (!wr && stripes[stripe_num].read_end == UINT32_MAX)
+        {
+            subop_len = 0;
+        }
         if (role_osd_num == this->osd_num)
         {
             clock_gettime(CLOCK_REALTIME, &subop->tv_begin);
@@ -169,7 +176,7 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
             },
             .version = op_version,
             .offset = wr ? stripes[stripe_num].write_start : stripes[stripe_num].read_start,
-            .len = wr ? stripes[stripe_num].write_end - stripes[stripe_num].write_start : stripes[stripe_num].read_end - stripes[stripe_num].read_start,
+            .len = subop_len,
             .buf = wr ? stripes[stripe_num].write_buf : stripes[stripe_num].read_buf,
             .bitmap = stripes[stripe_num].bmp_buf,
         });
@@ -199,7 +206,7 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
             },
             .version = op_version,
             .offset = wr ? stripes[stripe_num].write_start : stripes[stripe_num].read_start,
-            .len = wr ? stripes[stripe_num].write_end - stripes[stripe_num].write_start : stripes[stripe_num].read_end - stripes[stripe_num].read_start,
+            .len = subop_len,
             .attr_len = wr ? clean_entry_bitmap_size : 0,
         };
 #ifdef OSD_DEBUG
@@ -218,9 +225,9 @@ int osd_t::submit_primary_subop_batch(int submit_type, inode_t inode, uint64_t o
             }
             else
             {
-                if (stripes[stripe_num].read_end > stripes[stripe_num].read_start)
+                if (subop_len > 0)
                 {
-                    subop->iov.push_back(stripes[stripe_num].read_buf, stripes[stripe_num].read_end - stripes[stripe_num].read_start);
+                    subop->iov.push_back(stripes[stripe_num].read_buf, subop_len);
                 }
             }
             subop->callback = [cur_op, this](osd_op_t *subop)
@@ -28,7 +28,9 @@ static inline void extend_read(uint32_t start, uint32_t end, osd_rmw_stripe_t &
     }
     else
     {
-        if (stripe.read_end < end)
+        if (stripe.read_end < end && end != UINT32_MAX ||
+            // UINT32_MAX means that stripe only needs bitmap, end != 0 => needs also data
+            stripe.read_end == UINT32_MAX && end != 0)
             stripe.read_end = end;
         if (stripe.read_start > start)
             stripe.read_start = start;
@@ -105,24 +107,30 @@ void reconstruct_stripes_xor(osd_rmw_stripe_t *stripes, int pg_size, uint32_t bi
     }
     else if (prev >= 0)
     {
-        assert(stripes[role].read_start >= stripes[prev].read_start &&
-            stripes[role].read_start >= stripes[other].read_start);
-        memxor(
-            (uint8_t*)stripes[prev].read_buf + (stripes[role].read_start - stripes[prev].read_start),
-            (uint8_t*)stripes[other].read_buf + (stripes[role].read_start - stripes[other].read_start),
-            stripes[role].read_buf, stripes[role].read_end - stripes[role].read_start
-        );
+        if (stripes[role].read_end != UINT32_MAX)
+        {
+            assert(stripes[role].read_start >= stripes[prev].read_start &&
+                stripes[role].read_start >= stripes[other].read_start);
+            memxor(
+                (uint8_t*)stripes[prev].read_buf + (stripes[role].read_start - stripes[prev].read_start),
+                (uint8_t*)stripes[other].read_buf + (stripes[role].read_start - stripes[other].read_start),
+                stripes[role].read_buf, stripes[role].read_end - stripes[role].read_start
+            );
+        }
         memxor(stripes[prev].bmp_buf, stripes[other].bmp_buf, stripes[role].bmp_buf, bitmap_size);
         prev = -1;
     }
     else
     {
-        assert(stripes[role].read_start >= stripes[other].read_start);
-        memxor(
-            stripes[role].read_buf,
-            (uint8_t*)stripes[other].read_buf + (stripes[role].read_start - stripes[other].read_start),
-            stripes[role].read_buf, stripes[role].read_end - stripes[role].read_start
-        );
+        if (stripes[role].read_end != UINT32_MAX)
+        {
+            assert(stripes[role].read_start >= stripes[other].read_start);
+            memxor(
+                stripes[role].read_buf,
+                (uint8_t*)stripes[other].read_buf + (stripes[role].read_start - stripes[other].read_start),
+                stripes[role].read_buf, stripes[role].read_end - stripes[role].read_start
+            );
+        }
         memxor(stripes[role].bmp_buf, stripes[other].bmp_buf, stripes[role].bmp_buf, bitmap_size);
     }
 }
@@ -356,20 +364,23 @@ void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsi
     uint64_t read_start = 0, read_end = 0;
     auto recover_seq = [&]()
     {
-        int orig = 0;
-        for (int other = 0; other < pg_size; other++)
+        if (read_end != UINT32_MAX)
         {
-            if (stripes[other].read_end != 0 && !stripes[other].missing)
+            int orig = 0;
+            for (int other = 0; other < pg_size && orig < pg_minsize; other++)
             {
-                assert(stripes[other].read_start <= read_start);
-                assert(stripes[other].read_end >= read_end);
-                data_ptrs[orig++] = (uint8_t*)stripes[other].read_buf + (read_start - stripes[other].read_start);
+                if (stripes[other].read_end != 0 && !stripes[other].missing)
+                {
+                    assert(stripes[other].read_start <= read_start);
+                    assert(stripes[other].read_end >= read_end);
+                    data_ptrs[orig++] = (uint8_t*)stripes[other].read_buf + (read_start - stripes[other].read_start);
+                }
             }
+            ec_encode_data(
+                read_end-read_start, pg_minsize, wanted, dectable + wanted_base*32*pg_minsize,
+                data_ptrs, data_ptrs + pg_minsize
+            );
         }
-        ec_encode_data(
-            read_end-read_start, pg_minsize, wanted, dectable + wanted_base*32*pg_minsize,
-            data_ptrs, data_ptrs + pg_minsize
-        );
         wanted_base += wanted;
         wanted = 0;
     };
@@ -391,6 +402,32 @@ void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsi
     {
         recover_seq();
     }
+    // Recover bitmaps
+    if (bitmap_size > 0)
+    {
+        for (int role = 0; role < pg_minsize; role++)
+        {
+            if (stripes[role].read_end != 0 && stripes[role].missing)
+            {
+                data_ptrs[pg_minsize + (wanted++)] = (uint8_t*)stripes[role].bmp_buf;
+            }
+        }
+        if (wanted > 0)
+        {
+            int orig = 0;
+            for (int other = 0; other < pg_size && orig < pg_minsize; other++)
+            {
+                if (stripes[other].read_end != 0 && !stripes[other].missing)
+                {
+                    data_ptrs[orig++] = (uint8_t*)stripes[other].bmp_buf;
+                }
+            }
+            ec_encode_data(
+                bitmap_size, pg_minsize, wanted, dectable,
+                data_ptrs, data_ptrs + pg_minsize
+            );
+        }
+    }
 }
 #else
 void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsize, uint32_t bitmap_size)
@@ -412,7 +449,8 @@ void reconstruct_stripes_ec(osd_rmw_stripe_t *stripes, int pg_size, int pg_minsi
         if (stripes[role].read_end != 0 && stripes[role].missing)
         {
             recovered = true;
-            if (stripes[role].read_end > stripes[role].read_start)
+            if (stripes[role].read_end > stripes[role].read_start &&
+                stripes[role].read_end != UINT32_MAX)
             {
                 for (int other = 0; other < pg_size; other++)
                 {
@@ -531,7 +569,8 @@ void* alloc_read_buffer(osd_rmw_stripe_t *stripes, int read_pg_size, uint64_t ad
     uint64_t buf_size = add_size;
     for (int role = 0; role < read_pg_size; role++)
     {
-        if (stripes[role].read_end != 0)
+        if (stripes[role].read_end != 0 &&
+            stripes[role].read_end != UINT32_MAX)
         {
             buf_size += stripes[role].read_end - stripes[role].read_start;
         }
@@ -541,7 +580,8 @@ void* alloc_read_buffer(osd_rmw_stripe_t *stripes, int read_pg_size, uint64_t ad
     uint64_t buf_pos = add_size;
     for (int role = 0; role < read_pg_size; role++)
     {
-        if (stripes[role].read_end != 0)
+        if (stripes[role].read_end != 0 &&
+            stripes[role].read_end != UINT32_MAX)
         {
             stripes[role].read_buf = (uint8_t*)buf + buf_pos;
             buf_pos += stripes[role].read_end - stripes[role].read_start;
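A note tying these osd_rmw.cpp hunks together: `read_end == UINT32_MAX` serves as a sentinel for bitmap-only stripes, so `alloc_read_buffer()` reserves no data space for them and the XOR/EC reconstruction paths skip the data-level `memxor`/`ec_encode_data` calls, recovering only `bmp_buf` through the new bitmap-recovery block.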
@@ -23,6 +23,7 @@ struct osd_rmw_stripe_t
     void *read_buf, *write_buf;
     void *bmp_buf;
     uint32_t req_start, req_end;
+    // read_end=UINT32_MAX means to only read bitmap, but not data
     uint32_t read_start, read_end;
     uint32_t write_start, write_end;
     bool missing;
@@ -27,6 +27,7 @@ void test13();
 void test14();
 void test15(bool second);
 void test16();
+void test_recover_22_d2();
 
 int main(int narg, char *args[])
 {
@@ -61,6 +62,8 @@ int main(int narg, char *args[])
     test15(true);
     // Test 16
     test16();
+    // Test 17
+    test_recover_22_d2();
     // End
     printf("all ok\n");
     return 0;
@@ -1045,7 +1048,12 @@ void test16()
     assert(stripes[3].read_buf == (uint8_t*)read_buf+2*128*1024);
     set_pattern(stripes[1].read_buf, 128*1024, PATTERN2);
     memcpy(stripes[3].read_buf, rmw_buf, 128*1024);
+    memset(stripes[0].bmp_buf, 0xa8, bmp);
+    memset(stripes[2].bmp_buf, 0xb7, bmp);
+    assert(bitmaps[1] == 0xFFFFFFFF);
+    assert(bitmaps[3] == 0xF1F1F1F1);
     reconstruct_stripes_ec(stripes, 4, 2, bmp);
+    assert(*(uint32_t*)stripes[3].bmp_buf == 0xF1F1F1F1);
     assert(bitmaps[0] == 0xFFFFFFFF);
     check_pattern(stripes[0].read_buf, 128*1024, PATTERN1);
     free(read_buf);
@@ -1054,3 +1062,47 @@ void test16()
     free(write_buf);
     use_ec(4, 2, false);
 }
+
+/***
+
+17. EC 2+2 recover second data block
+
+***/
+
+void test_recover_22_d2()
+{
+    const int bmp = 128*1024 / 4096 / 8;
+    use_ec(4, 2, true);
+    osd_num_t osd_set[4] = { 1, 0, 3, 4 };
+    osd_rmw_stripe_t stripes[4] = {};
+    unsigned bitmaps[4] = { 0 };
+    // Read 0-256K
+    split_stripes(2, 128*1024, 0, 256*1024, stripes);
+    assert(stripes[0].req_start == 0 && stripes[0].req_end == 128*1024);
+    assert(stripes[1].req_start == 0 && stripes[1].req_end == 128*1024);
+    assert(stripes[2].req_start == 0 && stripes[2].req_end == 0);
+    assert(stripes[3].req_start == 0 && stripes[3].req_end == 0);
+    uint8_t *data_buf = (uint8_t*)malloc_or_die(128*1024*4);
+    for (int i = 0; i < 4; i++)
+    {
+        stripes[i].read_start = stripes[i].req_start;
+        stripes[i].read_end = stripes[i].req_end;
+        stripes[i].read_buf = data_buf + i*128*1024;
+        stripes[i].bmp_buf = bitmaps + i;
+    }
+    // Read using parity
+    assert(extend_missing_stripes(stripes, osd_set, 2, 4) == 0);
+    assert(stripes[2].read_start == 0 && stripes[2].read_end == 128*1024);
+    assert(stripes[3].read_start == 0 && stripes[3].read_end == 0);
+    bitmaps[0] = 0xffffffff;
+    bitmaps[2] = 0;
+    set_pattern(stripes[0].read_buf, 128*1024, PATTERN1);
+    set_pattern(stripes[2].read_buf, 128*1024, PATTERN1^PATTERN2);
+    // Reconstruct
+    reconstruct_stripes_ec(stripes, 4, 2, bmp);
+    check_pattern(stripes[1].read_buf, 128*1024, PATTERN2);
+    assert(bitmaps[1] == 0xFFFFFFFF);
+    free(data_buf);
+    // Done
+    use_ec(4, 2, false);
+}
@@ -36,12 +36,12 @@ for i in $(seq 2 $ETCD_COUNT); do
     ETCD_URL="$ETCD_URL,http://$ETCD_IP:$((ETCD_PORT+2*i-2))"
     ETCD_CLUSTER="$ETCD_CLUSTER,etcd$i=http://$ETCD_IP:$((ETCD_PORT+2*i-1))"
 done
-ETCDCTL="${ETCD}ctl --endpoints=$ETCD_URL"
+ETCDCTL="${ETCD}ctl --endpoints=$ETCD_URL --dial-timeout=5s --command-timeout=10s"
 
 start_etcd()
 {
     local i=$1
-    $ETCD -name etcd$i --data-dir ./testdata/etcd$i \
+    ionice -c2 -n0 $ETCD -name etcd$i --data-dir ./testdata/etcd$i \
         --advertise-client-urls http://$ETCD_IP:$((ETCD_PORT+2*i-2)) --listen-client-urls http://$ETCD_IP:$((ETCD_PORT+2*i-2)) \
         --initial-advertise-peer-urls http://$ETCD_IP:$((ETCD_PORT+2*i-1)) --listen-peer-urls http://$ETCD_IP:$((ETCD_PORT+2*i-1)) \
         --initial-cluster-token vitastor-tests-etcd --initial-cluster-state new \
@@ -53,8 +53,11 @@ start_etcd()
 for i in $(seq 1 $ETCD_COUNT); do
     start_etcd $i
 done
-if [ $ETCD_COUNT -gt 1 ]; then
-    sleep 1
-fi
+for i in {1..10}; do
+    ${ETCD}ctl --endpoints=$ETCD_URL --dial-timeout=1s --command-timeout=1s member list >/dev/null && break
+done
+if [[ $i = 10 ]]; then
+    format_error "Failed to start etcd"
+fi
 
 echo leak:fio >> testdata/lsan-suppress.txt
@@ -39,7 +39,7 @@ done
 cd mon
 npm install
 cd ..
-node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 &>./testdata/mon.log &
+node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 --restart_interval 5 &>./testdata/mon.log &
 MON_PID=$!
 
 if [ "$SCHEME" = "ec" ]; then
@@ -100,13 +100,13 @@ wait_finish_rebalance()
     sec=$1
     i=0
     while [[ $i -lt $sec ]]; do
-        ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"]) ] | length) == 32') && \
+        ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"] or .state == ["active", "left_on_dead"]) ] | length) == '$PG_COUNT) && \
             break
-        if [ $i -eq 60 ]; then
-            format_error "Rebalance couldn't finish in $sec seconds"
-        fi
         sleep 1
         i=$((i+1))
+        if [ $i -eq $sec ]; then
+            format_error "Rebalance couldn't finish in $sec seconds"
+        fi
     done
 }
@@ -117,3 +117,14 @@ check_qemu()
         sudo ln -s "$(realpath .)/build/src/block-vitastor.so" /usr/lib/x86_64-linux-gnu/qemu/block-vitastor.so
     fi
 }
+
+check_nbd()
+{
+    if [[ -d /sys/module/nbd && ! -e /dev/nbd0 ]]; then
+        max_part=$(cat /sys/module/nbd/parameters/max_part)
+        nbds_max=$(cat /sys/module/nbd/parameters/nbds_max)
+        for i in $(seq 1 $nbds_max); do
+            mknod /dev/nbd$((i-1)) b 43 $(((i-1)*(max_part+1)))
+        done
+    fi
+}
@@ -15,10 +15,10 @@ done
 
 sleep 2
 
-for i in {1..10}; do
+for i in {1..30}; do
     ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
         jq -s -e '([ .[0].items["1"] | map(.osd_set)[][] ] | sort | unique == ["1","2","3","4"])') && \
-    ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"]) ] | length) == '$PG_COUNT'') && \
+    ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"]) ] | length) == '$PG_COUNT) && \
         break
     sleep 1
 done
@@ -28,7 +28,7 @@ if ! ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
     format_error "FAILED: OSD NOT ADDED INTO DISTRIBUTION"
 fi
 
-wait_finish_rebalance 10
+wait_finish_rebalance 20
 
 sleep 1
 kill -9 $OSD4_PID
@@ -37,7 +37,7 @@ build/src/vitastor-cli --etcd_address $ETCD_URL rm-osd --force 4
 
 sleep 2
 
-for i in {1..10}; do
+for i in {1..30}; do
     ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
         jq -s -e '([ .[0].items["1"] | map(.osd_set)[][] ] | sort | unique == ["1","2","3"])') && \
     ($ETCDCTL get --prefix /vitastor/pg/state/ --print-value-only | jq -s -e '([ .[] | select(.state == ["active"] or .state == ["active", "left_on_dead"]) ] | length) == '$PG_COUNT'') && \
@@ -50,6 +50,6 @@ if ! ($ETCDCTL get /vitastor/config/pgs --print-value-only |\
     format_error "FAILED: OSD NOT REMOVED FROM DISTRIBUTION"
 fi
 
-wait_finish_rebalance 10
+wait_finish_rebalance 20
 
 format_green OK
@@ -18,7 +18,7 @@ $ETCDCTL put /vitastor/config/pools '{"1":{"name":"testpool","scheme":"replicate
 cd mon
 npm install
 cd ..
-node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" &>./testdata/mon.log &
+node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 --restart_interval 5 &>./testdata/mon.log &
 MON_PID=$!
 
 sleep 2
@@ -4,6 +4,8 @@ OSD_COUNT=7
 PG_COUNT=32
 . `dirname $0`/run_3osds.sh
 
+check_nbd
+
 IMG_SIZE=256
 
 $ETCDCTL put /vitastor/config/inode/1/1 '{"name":"testimg","size":'$((IMG_SIZE*1024*1024))'}'
@@ -14,7 +14,7 @@ for i in $(seq 1 $OSD_COUNT); do
     eval OSD${i}_PID=$!
 done
 
-node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" &>./testdata/mon.log &
+node mon/mon-main.js --etcd_url $ETCD_URL --etcd_prefix "/vitastor" --verbose 1 --restart_interval 5 &>./testdata/mon.log &
 MON_PID=$!
 
 sleep 3