Release 0.8.1

- Remove an additional data copy operation when flushing the journal (should slightly increase write performance)
- Fix a bug where, in the inmemory_journal=false mode, new writes could overwrite data that a parallel read operation was still reading
- Fix degraded parity writes for EC N+K with K>1; the bug could also lead to an "assertion failed" error
- Fix a missing journal space check for "big" writes, which could lead to a "prefill_single_journal_entry(): assertion failed..." error in the OSD
- Fix a possible "assertion failed: next->prev_wait >= 0" error in the client in rare cases
- Fix the missing "len" field for big_writes in vitastor-disk write-journal
- Fix a possible crash of a full OSD (ENOSPC)
- Fix CSI build scripts so that they always include the newest packages
- Fix the CSI endpoint in the liveness probe manifest
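The missing journal space check for "big" writes can be illustrated with a small model. This is a hypothetical C++ sketch, not Vitastor's code: `journal_ring_t`, `journal_bytes_needed()` and the entry sizes are invented for illustration, and the model assumes that a small write stores its data in the journal while a big write stores only a metadata entry there. What it shows is that both kinds of writes must pass the same space check before anything is appended, so a full journal makes the write wait rather than trip an assertion later.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of journal space accounting (not Vitastor's actual code).
// Assumption: a "small" write stores its data in the journal together with a
// metadata entry, while a "big" write stores only the metadata entry there.
uint64_t journal_bytes_needed(bool big_write, uint64_t data_len, uint64_t entry_size)
{
    return big_write ? entry_size : entry_size + data_len;
}

struct journal_ring_t
{
    uint64_t capacity;  // total journal size in bytes
    uint64_t used = 0;  // bytes appended but not yet trimmed

    explicit journal_ring_t(uint64_t cap): capacity(cap) {}

    // The check that must run for every write, big or small.
    bool has_space(uint64_t len) const
    {
        return len <= capacity - used;
    }

    // Reserve journal space; returns false (the caller should wait for the
    // trimmer) instead of letting a later step fail an assertion.
    bool try_append(uint64_t len)
    {
        if (!has_space(len))
            return false;
        used += len;
        return true;
    }

    // Trimming frees space once entries have been flushed to their final place.
    void trim(uint64_t len)
    {
        assert(len <= used);
        used -= len;
    }
};
```

In this model only trimming frees space, so when `try_append()` returns false the natural reaction is to queue the write until a trim completes.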
Make journal trimmer wait until reads are completed when inmemory_journal is false
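The rule in this commit title (with inmemory_journal=false, the trimmer must not release journal space that an in-flight read still uses) can be sketched with a per-offset reader count. This is a hypothetical C++ illustration, not Vitastor's implementation: `journal_read_guard_t` and its methods are invented names, and journal positions are modelled as monotonically increasing logical offsets, ignoring ring wrap-around.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch: tracks reads that are served directly from the journal
// so that the trimmer never frees a region a reader may still be using.
class journal_read_guard_t
{
    std::map<uint64_t, int> reads_in_flight; // journal offset -> number of readers
public:
    void begin_read(uint64_t offset)
    {
        reads_in_flight[offset]++;
    }

    void end_read(uint64_t offset)
    {
        auto it = reads_in_flight.find(offset);
        assert(it != reads_in_flight.end());
        if (--it->second == 0)
            reads_in_flight.erase(it);
    }

    // The trimmer may free everything strictly before the returned offset:
    // never past the oldest offset still being read, and never past want_to.
    uint64_t trim_limit(uint64_t want_to) const
    {
        if (reads_in_flight.empty())
            return want_to;
        uint64_t oldest = reads_in_flight.begin()->first; // std::map keeps keys sorted
        return oldest < want_to ? oldest : want_to;
    }
};
```

A trimmer loop in this model would call `trim_limit()` with the position it wants to reach and free only the journal space before the returned offset, retrying after the blocking reads finish.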
2022-11-20 11:44:09 +03:00 · 2022-11-20 01:49:21 +03:00 · 2022-11-20 00:50:13 +03:00 · 2022-11-20 00:50:13 +03:00 · 2022-11-20 00:50:13 +03:00 · 2022-11-20 00:50:13 +03:00
130 changed files with 6708 additions and 2256 deletions
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -2,6 +2,6 @@ cmake_minimum_required(VERSION 2.8)

 project(vitastor)

-set(VERSION "0.7.1")
+set(VERSION "0.8.1")

 add_subdirectory(src)
--- a/README-ru.md
+++ b/README-ru.md
@@ -58,6 +58,7 @@ Vitastor поддерживает QEMU-драйвер, протоколы NBD и
  - [Метаданные образов в etcd](docs/config/inode.ru.md)
 - Использование
  - [vitastor-cli](docs/usage/cli.ru.md) (консольный интерфейс)
+  - [vitastor-disk](docs/usage/disk.ru.md) (управление дисками)
  - [fio](docs/usage/fio.ru.md) для тестов производительности
  - [NBD](docs/usage/nbd.ru.md) для монтирования ядром
  - [QEMU и qemu-img](docs/usage/qemu.ru.md)
@@ -96,5 +97,5 @@ Vitastor Network Public License 1.1, основанная на GNU GPL 3.0 с д
 и также на условиях GNU GPL 2.0 или более поздней версии. Так сделано в целях
 совместимости с таким ПО, как QEMU и fio.

-Вы можете найти полный текст VNPL 1.1 в файле [VNPL-1.1.txt](VNPL-1.1.txt),
-а GPL 2.0 в файле [GPL-2.0.txt](GPL-2.0.txt).
+Вы можете найти полный текст VNPL 1.1 на английском языке в файле [VNPL-1.1.txt](VNPL-1.1.txt),
+VNPL 1.1 на русском языке в файле [VNPL-1.1-RU.txt](VNPL-1.1-RU.txt), а GPL 2.0 в файле [GPL-2.0.txt](GPL-2.0.txt).
--- a/README.md
+++ b/README.md
@@ -58,6 +58,7 @@ Read more details below in the documentation.
  - [Image metadata in etcd](docs/config/inode.en.md)
 - Usage
  - [vitastor-cli](docs/usage/cli.en.md) (command-line interface)
+  - [vitastor-disk](docs/usage/disk.en.md) (disk management tool)
  - [fio](docs/usage/fio.en.md) for benchmarks
  - [NBD](docs/usage/nbd.en.md) for kernel mounts
  - [QEMU and qemu-img](docs/usage/qemu.en.md)
--- a/VNPL-1.1-RU.txt
+++ b/VNPL-1.1-RU.txt
@@ -0,0 +1,680 @@
+                СЕТЕВАЯ ПУБЛИЧНАЯ ЛИЦЕНЗИЯ VITASTOR
+                  VITASTOR NETWORK PUBLIC LICENSE
+                   Версия 1.1, от 6 февраля 2021
+
+ Автор лицензии: Виталий Филиппов <vitalif@yourcmc.ru>, 2021 год
+ Каждый имеет право копировать и распространять точные копии этой
+ лицензии, но без внесения изменений.
+
+                            ПРЕАМБУЛА
+
+  Сетевая Публичная Лицензия Vitastor -- это свободная "копилефт" лицензия для
+программного обеспечения (ПО) и других видов произведений, специально
+разработанная, чтобы гарантировать кооперацию с сообществом при разработке
+сетевых приложений.
+
+  Большинство лицензий на программное обеспечение и другие произведения
+спроектированы так, чтобы лишить Вас свободы делиться ими и изменять их.
+Сетевая Публичная Лицензия Vitastor, напротив, разработана с целью
+гарантировать Ваше право распространять и вносить изменения во все версии
+программного обеспечения -- для уверенности, что ПО останется свободным для
+всех пользователей.
+
+  Когда мы говорим о свободном ПО, мы имеем в виду свободу использования, а не
+бесплатность. Свободные лицензии, такие, как Сетевая Публичная Лицензия
+Vitastor, составлены для того, чтобы убедиться, что у Вас есть право
+распространять копии свободного ПО (и взимать плату за них, если Вы хотите),
+что Вы получаете исходные тексты или можете получить их, если захотите, что Вы
+можете изменять программное обеспечение или использовать его части в новых
+свободных программах, и что Вы знаете о своем праве делать всё это.
+
+  Разработчики, использующие Сетевую Публичную Лицензию Vitastor, гарантируют
+Ваши права при помощи следующих мер: (1) закрепляют авторское право на
+программное обеспечение, и (2) предлагают Вам принять условия настоящей
+Лицензии, закрепляющей Ваше право на создание копий, распространение и (или)
+модификацию программного обеспечения.
+
+  Еще одно преимущество защиты свободы всех пользователей заключается в том,
+что улучшения, сделанные в разных версиях программы, при их широком
+распространении становятся доступными для использования другими разработчиками.
+Многие разработчики программного обеспечения воодушевляются этим
+сотрудничеством и пользуются его преимуществами. Однако, если программное
+обеспечение используется на сетевых серверах, данный результат не всегда
+достигается. Генеральная публичная лицензия GNU разрешает создание измененных
+версий и предоставление неограниченного доступа к ним, не делая общедоступным
+их исходный текст. Даже генеральная публичная лицензия GNU Affero разрешает
+использование модифицированной версии свободной программы в закрытой среде, где
+внешние пользователи взаимодействуют с ней только через закрытый промежуточный
+интерфейс (прокси), опять же, без открытия в свободный публичный доступ как
+самой программы, так и прокси.
+
+  Сетевая Публичная Лицензия Vitastor разработана специально, чтобы
+гарантировать, что в таких случаях и модифицированная версия программы, и
+прокси оставались доступными сообществу. Для этого лицензия требует от
+операторов сетевых серверов предоставлять исходный код оригинальной программы,
+а также всех других программ, взаимодействующих с ней на их серверах,
+пользователям этих серверов, на условиях свободных лицензий. Таким образом,
+публичное использование изменённой версии ПО на сервере, прямо или косвенно
+доступном пользователям, даёт пользователям доступ к исходным кодам изменённой
+версии.
+
+  Детальные определения используемых терминов и описание условий копирования,
+распространения и внесения изменений приведены ниже.
+
+                        ТЕРМИНЫ И УСЛОВИЯ
+
+  0. Определения.
+
+  "Настоящая Лицензия" -- версия 1.1 Сетевой Публичной Лицензии Vitastor.
+
+  Под "Авторским правом" понимаются все законы, сходные с авторско-правовыми,
+которые применяются к любым видам работ, например, к топологиям микросхем.
+
+  Термином "Программа" обозначается любое охраноспособное произведение,
+используемое в соответствии с настоящей Лицензией. Лицензиат именуется "Вы".
+"Лицензиаты" и "получатели" могут быть как физическими лицами, так и
+организациями.
+
+  "Внесение изменений" в произведение означает копирование или адаптацию
+произведения целиком или в части, способом, требующим разрешения
+правообладателя, за исключением изготовления его точной копии. Получившееся
+произведение называется "измененной версией" предыдущего произведения или
+произведением, "основанным на" более ранней работе.
+
+  Термином "Лицензионное произведение" обозначается неизмененная Программа или
+произведение, основанное на Программе.
+
+  "Распространение" произведения означает совершение с ним действий, которые
+при отсутствии разрешения сделают Вас прямо или косвенно ответственным за
+нарушение действующего закона об авторском праве, за исключением запуска на
+компьютере или изменения копии, созданной в личных целях. Распространение
+включает в себя копирование, раздачу копий (с изменениями или без них),
+доведение до всеобщего сведения, а в некоторых странах -- и другие действия.
+
+  "Передача" произведения означает любой вид распространения, который позволяет
+другим лицам создавать или получать копии произведения. Обычное взаимодействие
+с пользователем через компьютерную сеть без создания копии передачей не
+является.
+
+  Интерактивный интерфейс пользователя должен отображать "Информацию об
+авторском праве", достаточную для того, чтобы (1) обеспечить отображение
+соответствующего уведомления об авторских правах и (2) сообщить пользователю
+о том, что ему не предоставляются никакие гарантии на произведение (за
+исключением явным образом предоставленных гарантий), о том, что лицензиаты
+могут передавать произведение на условиях, описанных в настоящей Лицензии,
+а также о том, как ознакомиться с текстом настоящей Лицензии. Если интерфейс
+представляет собой список пользовательских команд или настроек, наподобие
+меню, это требование считается выполненным при наличии явно выделенного
+пункта в таком меню.
+
+  1. Исходный текст.
+
+  Под "Исходным текстом" понимается произведение в форме, которая более всего
+подходит для внесения в него изменений. "Объектным кодом" называется
+произведение в любой иной форме.
+
+  "Стандартный интерфейс" -- интерфейс, который либо является общепринятым
+стандартом, введенным общепризнанным органом по стандартизации, либо, в случае
+интерфейсов, характерных для конкретного языка программирования -- тот,
+который широко используется разработчиками, пишущими программы на этом языке.
+
+  "Системные библиотеки" исполняемого произведения включают в себя то, что не
+относится к произведению в целом и при этом (a) входит в обычный комплект
+Основного компонента, но при этом не является его частью и (b) служит только
+для обеспечения работы с этим Основным компонентом или для реализации
+Стандартного интерфейса, для которого существует общедоступная реализация,
+опубликованная в виде исходного текста. "Основным компонентом" в данном
+контексте назван главный существенный компонент (ядро, оконная система и т.д.)
+определенной операционной системы (если она используется), под управлением
+которой функционирует исполняемое произведение, либо компилятор, используемый
+для создания произведения или интерпретатор объектного кода, используемый для
+его запуска.
+
+  "Полный исходный текст" для произведения в форме объектного кода -- весь
+исходный текст, необходимый для создания, установки и (для исполняемого
+произведения) функционирования объектного кода, а также модификации
+произведения, включая сценарии, контролирующие эти действия. Однако он не
+включает в себя Системные библиотеки, необходимые для функционирования
+произведения, инструменты общего назначения или общедоступные свободные
+программы, которые используются в неизменном виде для выполнения этих
+действий, но не являются частью произведения. Полный исходный текст включает
+в себя, например, файлы описания интерфейса, прилагаемые к файлам исходного
+текста произведения, а также исходные тексты общих библиотек и динамически
+связанных подпрограмм, которые требуются для функционирования произведения
+и разработаны специально для этого, например, для прямой передачи данных
+или управления потоками между этими подпрограммами и другими частями
+произведения. Полный исходный текст не включает в себя то, что пользователи
+могут сгенерировать автоматически из других частей Полного исходного текста.
+Полным исходным текстом для произведения в форме исходных текстов является
+само это произведение.
+
+  2. Основные права.
+
+  Все права, предоставленные на основании настоящей Лицензии, действуют в
+течение срока действия авторских прав на Программу и не могут быть отозваны
+при условии, что сформулированные в ней условия соблюдены. Настоящая Лицензия
+однозначно подтверждает Ваши неограниченные права на запуск неизмененной
+Программы. Настоящая Лицензия распространяется на результаты функционирования
+Лицензионного произведения только в том случае, если они, учитывая их
+содержание, сами являются частью Лицензионного произведения. Настоящая
+Лицензия подтверждает Ваши права на свободное использование произведения
+или другие аналогичные полномочия, предусмотренные действующим
+законодательством об авторском праве.
+
+  Если Вы не осуществляете обычную передачу Лицензионного произведения, то
+можете как угодно создавать, запускать и распространять его копии до тех пор,
+пока Ваша Лицензия сохраняет силу. Вы можете передавать Лицензионные
+произведения третьим лицам исключительно для того, чтобы они внесли в них
+изменения для Вас или предоставили Вам возможность их запуска, при условии,
+что Вы соглашаетесь с условиями настоящей Лицензии при передаче всех
+материалов, авторскими правами на которые Вы не обладаете. Лица, создающие
+или запускающие Лицензионные произведения для Вас, должны делать это
+исключительно от Вашего имени, под Вашим руководством и контролем, на
+условиях, которые запрещают им создание без Вашей санкции каких-либо копий
+материалов, на которые Вы обладаете авторским правом.
+
+  Любая другая передача разрешается исключительно при соблюдении описанных
+ниже условий. Сублицензирование не допускается; раздел 10 делает его ненужным.
+
+  3. Защита прав пользователей от законов, запрещающих обход технических средств.
+
+  Ни одно Лицензионное произведение не должно считаться содержащим эффективные
+технические средства, удовлетворяющие требованиям любого действующего закона,
+принятого для исполнения обязательств, предусмотренных статьей 11 Договора ВОИС
+по авторскому праву от 20 декабря 1996 года или аналогичных законов,
+запрещающих или ограничивающих обход таких технических средств.
+
+  При передаче Лицензионного произведения Вы отказываетесь от всех
+предоставляемых законом полномочий по запрету обхода технических средств,
+используемых авторами в связи с осуществлением их прав, признавая, что такой
+обход находится в рамках осуществления прав на использование Лицензионного
+произведения, предоставленных настоящей Лицензией; также Вы отказываетесь
+от любых попыток ограничить функционирование произведения или внесение в него
+изменений, направленных на реализацию предоставленных Вам законом прав на
+запрет пользователю обхода технических средств.
+
+  4. Передача неизмененных копий.
+
+  Вы можете передавать точные копии исходных текстов Программы в том виде,
+в котором Вы их получили, на любом носителе, при условии, что Вы прилагаете
+к каждой копии соответствующее уведомление об авторских правах способом,
+обеспечивающим ознакомление с ним пользователя; сохраняете все уведомления
+о том, что к тексту применима настоящая Лицензия и любые ограничения,
+добавленные в соответствии с разделом 7; сохраняете все уведомления об
+отсутствии каких-либо гарантий; предоставляете всем получателям вместе с
+Программой копию настоящей Лицензии.
+
+  Вы можете установить любую цену за каждую копию, которую Вы передаете,
+или распространять копии бесплатно; также Вы можете предложить поддержку
+или гарантию за отдельную плату.
+
+  5. Передача измененных исходных текстов.
+
+  Вы можете передавать исходный текст произведения, основанного на Программе,
+или изменений, необходимых для того, чтобы получить его из Программы, на
+условиях, описанных в разделе 4, при соблюдении следующих условий:
+
+    a) Произведение должно содержать уведомления о произведенных Вами
+    изменениях с указанием их даты, сделанные способом, обеспечивающим
+    ознакомление с ними пользователя.
+
+    b) Произведение должно содержать уведомление о том, что оно
+    распространяется на условиях настоящей Лицензии, а также об условиях,
+    добавленных в соответствии с разделом 7, сделанное способом,
+    обеспечивающим ознакомление с ним пользователя. Данное требование имеет
+    приоритет над требованиями раздела 4 "оставлять нетронутыми все
+    уведомления".
+
+    c) Вы должны передать на условиях настоящей Лицензии всю работу целиком
+    любому лицу, которое приобретает копию. Таким образом, настоящая Лицензия
+    вместе с любыми применимыми условиями раздела 7 будет применяться к
+    произведению в целом и всем его частям, независимо от их комплектности.
+    Настоящая Лицензия не дает права на лицензирование произведения на любых
+    других условиях, но это не лишает законной силы такое разрешение, если Вы
+    получили его отдельно.
+
+    d) Если произведение имеет интерактивные пользовательские интерфейсы,
+    каждый из них должен отображать Информацию об авторском праве; однако,
+    если Программа имеет пользовательские интерфейсы, которые не отображают
+    информацию об авторском праве, от Вашего произведения этого также не
+    требуется.
+
+  Включение Лицензионного произведения в подборку на разделе хранения данных
+или на носителе, на котором распространяется произведение, вместе с другими
+отдельными самостоятельными произведениями, которые по своей природе не
+являются переработкой Лицензионного произведения и не объединены с ним,
+например, в программный комплекс, называется "набором", если авторские права
+на подборку не используются для ограничения доступа к ней или законных прав
+её пользователей сверх того, что предусматривают лицензии на отдельные
+произведения. Включение Лицензионного произведения в набор не влечет применения
+положений настоящей Лицензии к остальным его частям.
+
+  6. Передача произведения в формах, не относящихся к исходному тексту.
+
+  Вы можете передавать Лицензионное произведение в виде объектного кода в
+соответствии с положениями разделов 4 и 5, при условии, что Вы также передаете
+машиночитаемый Полный исходный текст в соответствии с условиями настоящей
+Лицензии, одним из следующих способов:
+
+    a) Передавая объектный код или содержащий его материальный продукт (включая
+    носитель, на котором распространяется произведение), с приложением Полного
+    исходного текста на материальном носителе, обычно используемом для обмена программным
+    обеспечением.
+
+    b) Передавая объектный код или содержащий его материальный продукт (включая
+    носитель, на котором распространяется произведение), с письменным
+    предложением, действительным в течение не менее трех лет либо до тех пор,
+    пока Вы предоставляете запасные части или поддержку для данного продукта,
+    о передаче любому обладателю объектного кода (1) копии Полного исходного
+    текста для всего программного обеспечения, содержащегося в продукте, на
+    которое распространяется действие настоящей Лицензии, на физическом
+    носителе, обычно используемом для обмена программным обеспечением, по цене,
+    не превышающей разумных затрат на передачу копии, или (2) доступа к Полному
+    исходному тексту с возможностью его копирования с сетевого сервера без
+    взимания платы.
+
+    c) Передавая отдельные копии объектного кода с письменной копией предложения
+    о предоставлении Полного исходного текста. Этот вариант допускается только
+    в отдельных случаях при распространении без извлечения прибыли, и только
+    если Вы получили объектный код с таким предложением в соответствии
+    с пунктом 6b.
+
+    d) Передавая объектный код посредством предоставления доступа к нему по
+    определенному адресу (бесплатно или за дополнительную плату), и предлагая
+    эквивалентный доступ к Полному исходному тексту таким же способом по тому же
+    адресу без какой-либо дополнительной оплаты. От Вас не требуется принуждать
+    получателей копировать Полный исходный текст вместе с объектным кодом. Если
+    объектный код размещен на сетевом сервере, Полный исходный текст может
+    находиться на другом сервере (управляемом Вами или третьим лицом), который
+    предоставляет аналогичную возможность копирования; при этом Вы должны четко
+    указать рядом с объектным кодом способ получения Полного исходного текста.
+    Независимо от того, на каком сервере расположен Полный исходный текст, Вы
+    обязаны убедиться в том, что он будет распространяться в течение времени,
+    необходимого для соблюдения этих требований.
+
+    e) Передавая объектный код с использованием одноранговой (пиринговой) сети,
+    при условии информирования других пользователей сети о том, где можно
+    бесплатно получить объектный код и Полный исходный текст произведения
+    способом, описанным в пункте 6d.
+
+  Не нужно включать в передаваемый объектный код его отделимые части, исходные
+тексты которых не входят в состав Полного исходного текста, такие как Системные
+библиотеки.
+
+  "Потребительский товар" -- это либо (1) "товар, предназначенный для личных нужд",
+под которым понимается любое материальное личное имущество, которое обычно
+используется для личных, семейных или домашних целей, или (2) что-либо
+спроектированное или продающееся для использования в жилище. При определении
+того, предназначен ли товар для личных нужд, сомнения должны толковаться в
+пользу положительного ответа на этот вопрос. Применительно к конкретному
+товару, используемому конкретным пользователем, под выражением "обычно
+используется" имеется в виду способ, которым данный вид товаров преимущественно
+или как правило используется, независимо от статуса конкретного пользователя
+или способа, которым конкретный пользователь использует, предполагает или
+будет использовать товар. Товар относится к предназначенным для личных нужд
+независимо от того, насколько часто он используется в коммерческой
+деятельности, промышленности или иной сфере, не относящейся к личным нуждам,
+за исключением случая, когда использование в этой сфере представляет собой
+единственный основной способ использования такого товара.
+
+  "Информация, необходимая для установки" Потребительского товара -- любые
+методы, процедуры, сведения, необходимые для авторизации, или другая
+информация, необходимая для установки и запуска в Потребительском товаре
+измененных версий Лицензионного произведения, полученных при изменении
+Полного исходного текста. Данная информация должна быть достаточной для
+того, чтобы обеспечить возможность внесения в исходный текст изменений,
+не приводящих к ограничению или нарушению его дальнейшей работоспособности.
+
+  Если вместе с Потребительским товаром или специально для использования
+в нём Вы передаете произведение в виде объектного кода на условиях, описанных
+в данном разделе, и такая передача является частью сделки, по которой право
+владения и пользования Потребительским товаром переходит к получателю
+пожизненно или на определенный срок (независимо от признаков сделки), Полный
+исходный текст, передаваемый согласно данному разделу, должен сопровождаться
+Информацией, необходимой для установки. Но это требование не применяется,
+если ни Вы, ни какое-либо третье лицо не сохраняет за собой возможности
+установки измененного объектного кода на Потребительский товар (например,
+произведение было установлено в постоянную память).
+
+  Требование о предоставлении Информации, необходимой для установки, не
+включает в себя требование продолжения оказания услуг по поддержке,
+предоставления гарантии или обновлений для произведения, которое было изменено
+или установлено получателем, либо для Потребительского товара, в котором оно
+было изменено или на который оно было установлено. В доступе к сети может быть
+отказано, если само внесение изменений существенно и негативно влияет на
+работу сети, нарушает правила обмена данными или не поддерживает протоколы для
+обмена данными по сети.
+
+  Передаваемый в соответствии с данным разделом Полный исходный текст и
+предоставленная Информация, необходимая для установки, должны быть записаны в
+формате, который имеет общедоступное описание (и общедоступную реализацию,
+опубликованную в форме исходного текста) и не должны требовать никаких
+специальных паролей или ключей для распаковки, чтения или копирования.
+
+  7. Дополнительные условия.
+
+  "Дополнительными разрешениями" называются условия, которые дополняют условия
+настоящей Лицензии, делая исключения из одного или нескольких её положений.
+Дополнительные разрешения, которые применимы ко всей Программе, должны
+рассматриваться как часть настоящей Лицензии, в той степени, в которой они
+соответствуют действующему законодательству. Если дополнительные разрешения
+применяются только к части Программы, эта часть может быть использована отдельно
+на измененных условиях, но вся Программа продолжает использоваться на условиях
+настоящей Лицензии без учета дополнительных разрешений.
+
+  Когда Вы передаете копию Лицензионного произведения, Вы можете по своему
+усмотрению исключить любые дополнительные разрешения, примененные к этой копии
+или к любой её части. (Для дополнительных разрешений может быть заявлено
+требование об их удалении в определенных случаях, когда Вы вносите изменения в
+произведение.) Вы можете добавлять дополнительные разрешения к добавленным Вами
+в Лицензионное произведение материалам, на которые Вы обладаете авторскими
+правами или правом выдачи соответствующего разрешения.
+
+  Независимо от любых других положений настоящей Лицензии, Вы можете дополнить
+следующими условиями положения настоящей Лицензии в отношении материала,
+добавленного к Лицензионному произведению (если это разрешено обладателями
+авторских прав на материал):
+
+    a) отказом от гарантий или ограничением ответственности, отличающимися от
+    тех, что описаны в разделах 15 и 16 настоящей Лицензии; либо
+
+    b) требованием сохранения соответствующей информации о правах или об
+    авторстве материала, или включения её в Информацию об авторском праве,
+    отображаемую содержащим его произведением; либо
+
+    c) запретом на искажение информации об источнике происхождения материала
+    или требованием того, чтобы измененные версии такого материала содержали
+    корректную отметку об отличиях от исходной версии; либо
+
+    d) ограничением использования в целях рекламы имен лицензиаров или авторов
+    материала; либо
+
+    e) отказом от предоставления прав на использование в качестве товарных
+    знаков некоторых торговых наименований, товарных знаков или знаков
+    обслуживания; либо
+
+    f) требованием от каждого, кто по договору передает материал (или его
+    измененные версии), предоставления компенсации лицензиарам и авторам
+    материала в виде принятия на себя любой ответственности, которую этот
+    договор налагает на лицензиаров и авторов.
+
+  Все остальные ограничительные дополнительные условия считаются "дополнительными
+запретами" по смыслу раздела 10. Если Программа, которую Вы получили, или любая
+её часть содержит уведомление о том, что наряду с настоящей Лицензией её
+использование регулируется условием, относящимся к дополнительным запретам, Вы
+можете удалить такое условие. Если лицензия содержит дополнительный запрет, но
+допускает лицензирование на измененных условиях или передачу в соответствии с
+настоящей Лицензией, Вы можете добавить к Лицензионному произведению материал,
+используемый на условиях такой лицензии, в том случае, если дополнительный
+запрет не сохраняется при таком изменении условий лицензии или передаче.
+
+  Если Вы добавляете условия для использования Лицензионного произведения в
+соответствии с настоящим разделом, Вы должны поместить в соответствующих файлах
+исходного текста уведомление о том, что к этим файлам применяются дополнительные
+условия, или указание на то, как ознакомиться с соответствующими условиями.
+
+  Дополнительные разрешающие или ограничивающие условия могут быть сформулированы
+в виде отдельной лицензии или зафиксированы как исключения; вышеуказанные
+требования применяются в любом случае.
+
+  8. Прекращение действия.
+
+  Вы не можете распространять Лицензионное произведение или вносить в него
+изменения на условиях, отличающихся от явно оговоренных в настоящей Лицензии.
+Любая попытка распространения или внесения изменений на иных условиях является
+ничтожной и автоматически прекращает Ваши права, полученные по настоящей
+Лицензии (включая лицензию на любые патенты, предоставленные согласно третьему
+пункту раздела 11).
+
+  Тем не менее, если Вы прекращаете нарушение настоящей Лицензии, Ваши права,
+полученные от конкретного правообладателя, восстанавливаются (а) временно, до
+тех пор, пока правообладатель явно и окончательно не прекратит действие Ваших
+прав, и (б) навсегда, если правообладатель не уведомит Вас о нарушении с помощью
+надлежащих средств в течение 60 дней после прекращения нарушений.
+
+  Кроме того, Ваши права, полученные от конкретного правообладателя,
+восстанавливаются навсегда, если правообладатель впервые любым подходящим
+способом уведомляет Вас о нарушении настоящей Лицензии на свое произведение (для
+любого произведения) и Вы устраняете нарушение в течение 30 дней после получения
+уведомления.
+
+  Прекращение Ваших прав, описанное в настоящем разделе, не прекращает действие
+лицензий лиц, которые получили от Вас копии произведения или права,
+предоставляемые настоящей Лицензией. Если Ваши права были прекращены навсегда и
+не восстановлены, Вы не можете вновь получить право на тот же материал на
+условиях, описанных в разделе 10.
+
+  9. Акцепт не требуется для получения копий.
+
+  Вы не обязаны принимать условия настоящей Лицензии для того, чтобы получить или
+запустить копию Программы. Случайное распространение Лицензионного произведения,
+происходящее вследствие использования одноранговой (пиринговой) сети для
+получения его копии, также не требует принятия этих условий. Тем не менее, только
+настоящая Лицензия дает Вам право распространять или изменять любое Лицензионное
+произведение. Если Вы не приняли условия настоящей Лицензии, такие действия
+будут нарушением авторского права. Поэтому, изменяя или распространяя
+Лицензионное произведение, Вы выражаете согласие с условиями настоящей Лицензии.
+
+  10. Автоматическое получение прав последующими получателями.
+
+  Каждый раз, когда Вы передаете Лицензионное произведение, получатель
+автоматически получает от его лицензиара право запускать, изменять и
+распространять это произведение при условии соблюдения настоящей Лицензии. Вы не
+несете ответственности за соблюдение третьими лицами условий настоящей Лицензии.
+
+  "Реорганизацией" называются действия, в результате которых передается управление
+организацией или значительная часть её активов, а также происходит разделение
+или слияние организаций. Если распространение Лицензионного произведения
+является результатом реорганизации, каждая из сторон сделки, получающая копию
+произведения, также получает все права на произведение, которые предшествующее
+юридическое лицо имело или могло предоставить согласно предыдущему абзацу, а
+также право на владение Полным исходным текстом произведения от предшественника,
+осуществляемое в его интересах, если предшественник владеет им или может
+получить его при разумных усилиях.
+
+  Вы не можете налагать каких-либо дополнительных ограничений на осуществление
+прав, предоставленных или подтвержденных в соответствии с настоящей Лицензией.
+Например, Вы не можете ставить осуществление прав, предоставленных по настоящей
+Лицензии, в зависимость от оплаты отчислений, роялти или других сборов; также Вы
+не можете инициировать судебный процесс (включая встречный иск или заявление
+встречного требования в судебном процессе) о нарушении любых патентных прав при
+создании, использовании, продаже, предложении продажи, импорте Программы или
+любой её части.
+
+  11. Патенты.
+
+  "Инвестором" называется правообладатель, разрешающий использование Программы
+либо произведения, на котором основана Программа, на условиях настоящей
+Лицензии. Произведение, лицензированное таким образом, называется "версией со
+вкладом" инвестора.
+
+  "Неотъемлемые патентные претензии" инвестора -- все патентные права,
+принадлежащие инвестору или контролируемые им в настоящее время либо
+приобретенные в будущем, которые могут быть нарушены созданием, использованием
+или продажей версии со вкладом, допускаемыми настоящей Лицензией; они не
+включают в себя права, которые будут нарушены исключительно вследствие будущих
+изменений версии со вкладом. Для целей данного определения под "контролем"
+понимается право выдавать патентные сублицензии способами, не нарушающими
+требований настоящей Лицензии.
+
+  Каждый инвестор предоставляет Вам неисключительную безвозмездную лицензию на
+патент, действующую во всем мире, соответствующую неотъемлемым патентным
+претензиям инвестора, на создание, использование, продажу, предложение для
+продажи, импорт, а также запуск, внесение изменений и распространение всего, что
+входит в состав версии со вкладом.
+
+  В следующих трех абзацах "лицензией на патент" называется любое явно выраженное
+вовне согласие или обязательство не применять патент (например, выдача
+разрешения на использование запатентованного объекта или обещание не подавать в
+суд за нарушение патента). "Выдать" кому-то такую лицензию на патент означает
+заключить такое соглашение или обязаться не применять патент против него.
+
+  Если Вы передаете Лицензионное произведение, сознательно основываясь на лицензии
+на патент, в то время как Полный исходный текст произведения невозможно
+бесплатно скопировать с общедоступного сервера или другим не вызывающим
+затруднений способом, Вы должны либо (1) обеспечить возможность такого доступа к
+Полному исходному тексту, либо (2) отказаться от прав, предоставленных по
+лицензии на патент для данного произведения, либо (3) принять меры по передаче
+лицензии на патент последующим получателям произведения, в соответствии с
+требованиями настоящей Лицензии. "Сознательно основываясь" означает, что Вы
+знаете, что при отсутствии лицензии на патент передача Вами Лицензионного
+произведения в определенной стране или использование получателем переданного ему
+Вами Лицензионного произведения в этой стране нарушит один или несколько
+определенных патентов этой страны, срок действия которых не истек.
+
+  Если в соответствии или в связи с единичной сделкой либо соглашением Вы
+передаете или делаете заказ на распространение Лицензионного произведения, и
+предоставляете определенным лицам, получающим Лицензионное произведение,
+лицензию на патент, разрешающую им использовать, распространять, вносить
+изменения или передавать конкретные экземпляры Лицензионного произведения,
+права, которые Вы предоставляете по лицензии на патент, автоматически переходят
+ко всем получателям Лицензионного произведения и произведений, созданных на его
+основе.
+
+  Патентная лицензия называется "дискриминирующей", если она не покрывает,
+запрещает осуществление или содержит в качестве условия отказ от применения
+одного или нескольких прав, предоставленных настоящей Лицензией. Вы не можете
+передавать Лицензионное произведение, если Вы являетесь участником договора с
+третьим лицом, осуществляющим распространение программного обеспечения, в
+соответствии с которым Вы делаете в пользу третьего лица выплаты, размер которых
+зависит от масштабов Вашей деятельности по передаче произведения, и в
+соответствии с которым любое третье лицо, получающее от Вас Лицензионное
+произведение, делает это на условиях дискриминирующей патентной лицензии (а)
+которая зависит от количества копий Лицензионного произведения, переданных Вами
+(или копий, сделанных с этих копий), или (b) которая используется
+преимущественно в конкретных товарах или подборках, содержащих Лицензионное
+произведение, или в связи с ними, в том случае, если Вы заключили данный договор
+или получили лицензию на патент после 28 марта 2007 года.
+
+  Ничто в настоящей Лицензии не должно толковаться как исключение или ограничение
+любого предполагаемого права или других способов противодействия нарушениям,
+которые во всем остальном могут быть доступны для Вас в соответствии с
+применимым патентным правом.
+
+  12. Запрет отказывать в свободе другим.
+
+  Если на Вас наложены обязанности (будь то по решению суда, договору или иным
+способом), которые противоречат условиям настоящей Лицензии, это не освобождает
+Вас от соблюдения её условий. Если Вы не можете передать Лицензионное
+произведение так, чтобы одновременно выполнять Ваши обязательства по настоящей
+Лицензии и любые другие относящиеся к делу обязательства, то Вы не можете
+передавать его вообще. Например, если Вы согласны с условием, обязывающим Вас
+производить сбор отчислений за дальнейшую передачу от тех, кому Вы передаете
+Программу, то для того, чтобы соблюсти это условие и выполнить требования
+настоящей Лицензии, Вы должны полностью воздержаться от передачи Программы.
+
+  13. Удаленное сетевое взаимодействие.
+
+  Под "Прокси-программой" понимается отдельная программа, специально
+разработанная для использования совместно с Лицензионным произведением,
+и взаимодействующая с ним прямо или косвенно через любой вид программного
+интерфейса, компьютерную сеть, имитацию такой сети, или, в свою очередь,
+через другую Прокси-программу.
+
+  Независимо от любых других положений настоящей Лицензии, если Вы
+предоставляете любому пользователю возможность взаимодействовать с Лицензионным
+произведением через компьютерную сеть, имитацию такой сети, или через любое
+количество "Прокси-программ", Вы должны в явной форме предложить этому
+пользователю возможность получить Полный исходный текст Лицензионного
+произведения и всех Прокси-программ путём предоставления доступа к нему
+с сетевого сервера без взимания платы, посредством стандартных или
+традиционных способов, используемых для копирования программного обеспечения.
+Полный исходный текст Лицензионного произведения должен предоставляться
+пользователю на условиях настоящей Лицензии, а Полный исходный текст
+Прокси-программ должен предоставляться пользователю либо на условиях настоящей
+Лицензии, либо на условиях одной из свободных лицензий, совместимых с
+Генеральной публичной Лицензией GNU, перечисленных Фондом Свободного
+Программного Обеспечения в списке под названием "Лицензии свободных программ,
+совместимые с GPL".
+
+  14. Пересмотренные редакции настоящей Лицензии.
+
+  Автор настоящей Лицензии время от времени может публиковать пересмотренные
+и (или) новые редакции Сетевой Публичной Лицензии Vitastor. Они будут аналогичны
+по смыслу настоящей редакции, но могут отличаться от нее в деталях, направленных
+на решение новых проблем или регулирование новых отношений.
+
+  Каждой редакции присваивается собственный номер. Если для Программы указано,
+что к ней применима определенная редакция Сетевой Публичной Лицензии Vitastor
+"или любая более поздняя редакция", у Вас есть возможность использовать термины
+и условия, содержащиеся в редакции с указанным номером или любой более поздней
+редакции, опубликованной автором настоящей Лицензии. Если для Программы не
+указан номер редакции Сетевой Публичной Лицензии Vitastor, Вы можете выбрать
+любую редакцию, опубликованную автором настоящей Лицензии.
+
+  Более поздние редакции Лицензии могут дать Вам дополнительные или принципиально
+иные права. Тем не менее в результате Вашего выбора более поздней редакции на
+автора или правообладателя не возлагается никаких дополнительных обязанностей.
+
+  15. Отказ от гарантий.
+
+  НА ПРОГРАММУ НЕ ПРЕДОСТАВЛЯЕТСЯ НИКАКИХ ГАРАНТИЙ ЗА ИСКЛЮЧЕНИЕМ ПРЕДУСМОТРЕННЫХ
+ДЕЙСТВУЮЩИМ ЗАКОНОДАТЕЛЬСТВОМ. ЕСЛИ ИНОЕ НЕ УКАЗАНО В ПИСЬМЕННОЙ ФОРМЕ,
+ПРАВООБЛАДАТЕЛИ И (ИЛИ) ТРЕТЬИ ЛИЦА ПРЕДОСТАВЛЯЮТ ПРОГРАММУ "КАК ЕСТЬ", БЕЗ
+КАКИХ-ЛИБО ЯВНЫХ ИЛИ ПОДРАЗУМЕВАЕМЫХ ГАРАНТИЙ, ВКЛЮЧАЯ ГАРАНТИИ ПРИГОДНОСТИ ДЛЯ
+КОНКРЕТНЫХ ЦЕЛЕЙ, НО НЕ ОГРАНИЧИВАЯСЬ ИМИ. ВЕСЬ РИСК, СВЯЗАННЫЙ С КАЧЕСТВОМ И
+ПРОИЗВОДИТЕЛЬНОСТЬЮ ПРОГРАММЫ, ВОЗЛАГАЕТСЯ НА ВАС. ЕСЛИ В ПРОГРАММЕ БУДУТ
+ВЫЯВЛЕНЫ НЕДОСТАТКИ, ВЫ ПРИНИМАЕТЕ НА СЕБЯ СТОИМОСТЬ ВСЕГО НЕОБХОДИМОГО
+ОБСЛУЖИВАНИЯ, РЕМОНТА ИЛИ ИСПРАВЛЕНИЯ.
+
+  16. Ограничение ответственности.
+
+  ЕСЛИ ИНОЕ НЕ ПРЕДУСМОТРЕНО ДЕЙСТВУЮЩИМ ЗАКОНОДАТЕЛЬСТВОМ ИЛИ СОГЛАШЕНИЕМ СТОРОН,
+ЗАКЛЮЧЕННЫМ В ПИСЬМЕННОЙ ФОРМЕ, ПРАВООБЛАДАТЕЛЬ ИЛИ ИНОЕ ЛИЦО, КОТОРОЕ ВНОСИТ
+ИЗМЕНЕНИЯ В ПРОГРАММУ И (ИЛИ) ПЕРЕДАЕТ ЕЁ НА УСЛОВИЯХ, СФОРМУЛИРОВАННЫХ ВЫШЕ, НЕ
+МОЖЕТ НЕСТИ ОТВЕТСТВЕННОСТЬ ПЕРЕД ВАМИ ЗА ПРИЧИНЕННЫЙ УЩЕРБ, ВКЛЮЧАЯ УЩЕРБ
+ОБЩЕГО ЛИБО КОНКРЕТНОГО ХАРАКТЕРА, ПРИЧИНЕННЫЙ СЛУЧАЙНО ИЛИ ЯВЛЯЮЩИЙСЯ
+СЛЕДСТВИЕМ ИСПОЛЬЗОВАНИЯ ПРОГРАММЫ ЛИБО НЕВОЗМОЖНОСТИ ЕЁ ИСПОЛЬЗОВАНИЯ (В ТОМ
+ЧИСЛЕ ЗА УНИЧТОЖЕНИЕ ИЛИ МОДИФИКАЦИЮ ИНФОРМАЦИИ, ЛИБО УБЫТКИ, ПОНЕСЕННЫЕ ВАМИ
+ИЛИ ТРЕТЬИМИ ЛИЦАМИ, ЛИБО СБОИ ПРОГРАММЫ ПРИ ВЗАИМОДЕЙСТВИИ С ДРУГИМ ПРОГРАММНЫМ
+ОБЕСПЕЧЕНИЕМ), В ТОМ ЧИСЛЕ И В СЛУЧАЯХ, КОГДА ПРАВООБЛАДАТЕЛЬ ИЛИ ТРЕТЬЕ ЛИЦО
+ПРЕДУПРЕЖДЕНЫ О ВОЗМОЖНОСТИ ПРИЧИНЕНИЯ ТАКИХ УБЫТКОВ.
+
+  17. Толкование разделов 15 и 16.
+
+  Если отказ от гарантии и ограничение ответственности, представленные выше, по
+закону не могут быть применены в соответствии с их условиями, суды,
+рассматривающие спор, должны применить действующий закон, который в наибольшей
+степени предусматривает абсолютный отказ от всей гражданской ответственности в
+связи с Программой, за исключением случаев, когда гарантия или принятие на себя
+ответственности за копию программы предоставляется за плату.
+
+                        КОНЕЦ ОПРЕДЕЛЕНИЙ И УСЛОВИЙ
+
+          Порядок применения условий Лицензии к Вашим программам
+
+  Если Вы разрабатываете новую программу и хотите, чтобы её использование принесло
+максимальную пользу обществу, наилучший способ достичь этого -- сделать её
+свободной, чтобы все могли распространять и изменять её на условиях настоящей
+Лицензии.
+
+  Для этого сделайте так, чтобы программа содержала в себе описанные ниже
+уведомления. Самым надежным способом это сделать является включение их в начало
+каждого файла исходного текста, чтобы наиболее эффективным образом сообщить об
+отсутствии гарантий; каждый файл должен иметь по меньшей мере одну строку с
+оповещением об авторских правах и указанием на то, где находится полный текст
+уведомлений.
+
+    <Строка с названием Программы и информацией о её назначении.>
+    Copyright © <год выпуска программы в свет>  <имя автора>
+
+    Эта программа является свободным программным обеспечением: Вы можете
+    распространять её и (или) изменять, соблюдая условия Сетевой Публичной
+    Лицензии Vitastor, опубликованной автором Vitastor, либо редакции 1.1
+    Лицензии, либо (на Ваше усмотрение) любой редакции, выпущенной позже.
+
+    Эта программа распространяется в расчете на то, что она окажется полезной,
+    но БЕЗ КАКИХ-ЛИБО ГАРАНТИЙ, включая подразумеваемую гарантию КАЧЕСТВА либо
+    ПРИГОДНОСТИ ДЛЯ ОПРЕДЕЛЕННЫХ ЦЕЛЕЙ. Ознакомьтесь с Сетевой Публичной
+    Лицензией Vitastor для получения более подробной информации.
+
+  Также добавьте информацию о том, как связаться с Вами посредством электронной
+или обычной почты.
+
+  Если ваша программа взаимодействует с пользователями удаленно через
+компьютерную сеть, Вы также должны убедиться, что обеспечили её пользователям
+возможность получить её исходные тексты. Например, если Ваша программа является
+веб-приложением, её интерфейс может отображать ссылку "Исходные коды", которая
+указывает на архив с текстом. Существует много способов, которыми Вы можете
+распространять исходные тексты, для разных программ подходят разные решения;
+ознакомьтесь с разделом 13 для того, чтобы узнать конкретные требования.
--- a/VNPL-1.1.txt
+++ b/VNPL-1.1.txt
@@ -61,7 +61,7 @@ modification follow.

  0. Definitions.

-  "This License" refers to version 1 of the Vitastor Network Public License.
+  "This License" refers to version 1.1 of the Vitastor Network Public License.

  "Copyright" also means copyright-like laws that apply to other kinds of
 works, such as semiconductor masks.
@@ -629,7 +629,7 @@ the "copyright" line and a pointer to where the full notice is found.

    This program is free software: you can redistribute it and/or modify
    it under the terms of the Vitastor Network Public License as published by
-    the Vitastor Author, either version 1 of the License, or
+    the Vitastor Author, either version 1.1 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
--- a/csi/Dockerfile
+++ b/csi/Dockerfile
@@ -18,15 +18,19 @@ ENV CSI_ENDPOINT=""

 RUN apt-get update && \
    apt-get install -y wget && \
-    wget -q -O /etc/apt/trusted.gpg.d/vitastor.gpg https://vitastor.io/debian/pubkey.gpg && \
-    (echo deb http://vitastor.io/debian buster main > /etc/apt/sources.list.d/vitastor.list) && \
    (echo deb http://deb.debian.org/debian buster-backports main > /etc/apt/sources.list.d/backports.list) && \
    (echo "APT::Install-Recommends false;" > /etc/apt/apt.conf) && \
    apt-get update && \
-    apt-get install -y e2fsprogs xfsprogs vitastor kmod && \
+    apt-get install -y e2fsprogs xfsprogs kmod && \
    apt-get clean && \
    (echo options nbd nbds_max=128 > /etc/modprobe.d/nbd.conf)

 COPY --from=build /app/vitastor-csi /bin/

+RUN (echo deb http://vitastor.io/debian buster main > /etc/apt/sources.list.d/vitastor.list) && \
+    wget -q -O /etc/apt/trusted.gpg.d/vitastor.gpg https://vitastor.io/debian/pubkey.gpg && \
+    apt-get update && \
+    apt-get install -y vitastor-client && \
+    apt-get clean
+
 ENTRYPOINT ["/bin/vitastor-csi"]
--- a/csi/Makefile
+++ b/csi/Makefile
@@ -1,4 +1,4 @@
-VERSION ?= v0.7.1
+VERSION ?= v0.8.1

 all: build push

--- a/csi/deploy/004-csi-nodeplugin.yaml
+++ b/csi/deploy/004-csi-nodeplugin.yaml
@@ -49,7 +49,7 @@ spec:
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
-          image: vitalif/vitastor-csi:v0.7.1
+          image: vitalif/vitastor-csi:v0.8.1
          args:
            - "--node=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
@@ -102,7 +102,7 @@ spec:
            - "--health-port=9898"
          env:
            - name: CSI_ENDPOINT
-              value: unix://csi/csi.sock
+              value: unix:///csi/csi.sock
          volumeMounts:
          - mountPath: /csi
            name: socket-dir
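The liveness probe fix changes `unix://csi/csi.sock` to `unix:///csi/csi.sock`. The Go sketch below shows why the third slash matters if the endpoint goes through standard URL parsing. It uses `net/url`, but whether the probe itself parses the endpoint this way is an assumption. With only two slashes, `csi` becomes the URL host and the path shrinks to `/csi.sock`. `socketPath` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// socketPath is a hypothetical helper that splits a CSI endpoint the way
// net/url does and returns the parsed host and path components.
func socketPath(endpoint string) (host, path string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	for _, ep := range []string{"unix://csi/csi.sock", "unix:///csi/csi.sock"} {
		host, path, err := socketPath(ep)
		if err != nil {
			fmt.Println(ep, "error:", err)
			continue
		}
		// Two slashes: host="csi", path="/csi.sock" (not the mounted socket).
		// Three slashes: host="", path="/csi/csi.sock" (the mounted socket).
		fmt.Printf("%s -> host=%q path=%q\n", ep, host, path)
	}
}
```

The socket is mounted at `/csi` in the container (see `mountPath: /csi`), so only the three-slash form points at it.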
--- a/csi/deploy/007-csi-provisioner.yaml
+++ b/csi/deploy/007-csi-provisioner.yaml
@@ -116,7 +116,7 @@ spec:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
-          image: vitalif/vitastor-csi:v0.7.1
+          image: vitalif/vitastor-csi:v0.8.1
          args:
            - "--node=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
--- a/csi/src/config.go
+++ b/csi/src/config.go
@@ -5,7 +5,7 @@ package vitastor

 const (
    vitastorCSIDriverName    = "csi.vitastor.io"
-    vitastorCSIDriverVersion = "0.7.1"
+    vitastorCSIDriverVersion = "0.8.1"
 )

 // Config struct fills the parameters of request or user input
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,10 +1,10 @@
-vitastor (0.7.1-1) unstable; urgency=medium
+vitastor (0.8.1-1) unstable; urgency=medium

  * Bugfixes

 -- Vitaliy Filippov <vitalif@yourcmc.ru>  Fri, 03 Jun 2022 02:09:44 +0300

-vitastor (0.7.1-1) unstable; urgency=medium
+vitastor (0.8.1-1) unstable; urgency=medium

  * Implement NFS proxy
  * Add documentation
--- a/debian/control
+++ b/debian/control
@@ -18,7 +18,7 @@ Description: Vitastor, a fast software-defined clustered block storage

 Package: vitastor-osd
 Architecture: amd64
-Depends: ${shlibs:Depends}, ${misc:Depends}, vitastor-client (= ${binary:Version})
+Depends: ${shlibs:Depends}, ${misc:Depends}, vitastor-client (= ${binary:Version}), fdisk, util-linux, parted
 Description: Vitastor, a fast software-defined clustered block storage - object storage daemon
 Vitastor object storage daemon, i.e. server program that stores data.

--- a/debian/vitastor-client.install
+++ b/debian/vitastor-client.install
@@ -4,4 +4,3 @@ usr/bin/vitastor-rm
 usr/bin/vitastor-nbd
 usr/bin/vitastor-nfs
 usr/lib/*/libvitastor*.so*
-mon/make-osd.sh /usr/lib/vitastor
--- a/debian/vitastor-mon.install
+++ b/debian/vitastor-mon.install
@@ -1 +1,2 @@
 mon usr/lib/vitastor
+mon/vitastor-mon.service /lib/systemd/system
--- a/debian/vitastor-mon.postinst
+++ b/debian/vitastor-mon.postinst
@@ -0,0 +1,9 @@
+#!/bin/sh
+
+set -e
+
+if [ "$1" = "configure" ]; then
+	addgroup --system --quiet vitastor
+	adduser --system --quiet --ingroup vitastor --no-create-home --home /nonexistent vitastor
+	mkdir -p /etc/vitastor
+fi
--- a/debian/vitastor-osd.install
+++ b/debian/vitastor-osd.install
@@ -1,2 +1,6 @@
 usr/bin/vitastor-osd
+usr/bin/vitastor-disk
 usr/bin/vitastor-dump-journal
+mon/vitastor-osd@.service /lib/systemd/system
+mon/vitastor.target /lib/systemd/system
+mon/90-vitastor.rules /lib/udev/rules.d
--- a/debian/vitastor-osd.postinst
+++ b/debian/vitastor-osd.postinst
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+set -e
+
+if [ "$1" = "configure" ]; then
+	addgroup --system --quiet vitastor
+	adduser --system --quiet --ingroup vitastor --no-create-home --home /nonexistent vitastor
+	install -o vitastor -g vitastor -d /var/log/vitastor
+	mkdir -p /etc/vitastor
+fi
--- a/debian/vitastor.Dockerfile
+++ b/debian/vitastor.Dockerfile
@@ -34,8 +34,8 @@ RUN set -e -x; \
    mkdir -p /root/packages/vitastor-$REL; \
    rm -rf /root/packages/vitastor-$REL/*; \
    cd /root/packages/vitastor-$REL; \
-    cp -r /root/vitastor vitastor-0.7.1; \
-    cd vitastor-0.7.1; \
+    cp -r /root/vitastor vitastor-0.8.1; \
+    cd vitastor-0.8.1; \
    ln -s /root/fio-build/fio-*/ ./fio; \
    FIO=$(head -n1 fio/debian/changelog | perl -pe 's/^.*\((.*?)\).*$/$1/'); \
    ls /usr/include/linux/raw.h || cp ./debian/raw.h /usr/include/linux/raw.h; \
@@ -48,8 +48,8 @@ RUN set -e -x; \
    rm -rf a b; \
    echo "dep:fio=$FIO" > debian/fio_version; \
    cd /root/packages/vitastor-$REL; \
-    tar --sort=name --mtime='2020-01-01' --owner=0 --group=0 --exclude=debian -cJf vitastor_0.7.1.orig.tar.xz vitastor-0.7.1; \
-    cd vitastor-0.7.1; \
+    tar --sort=name --mtime='2020-01-01' --owner=0 --group=0 --exclude=debian -cJf vitastor_0.8.1.orig.tar.xz vitastor-0.8.1; \
+    cd vitastor-0.8.1; \
    V=$(head -n1 debian/changelog | perl -pe 's/^.*\((.*?)\).*$/$1/'); \
    DEBFULLNAME="Vitaliy Filippov <vitalif@yourcmc.ru>" dch -D $REL -v "$V""$REL" "Rebuild for $REL"; \
    DEB_BUILD_OPTIONS=nocheck dpkg-buildpackage --jobs=auto -sa; \
--- a/docs/config/layout-cluster.en.md
+++ b/docs/config/layout-cluster.en.md
@@ -9,34 +9,34 @@
 These parameters apply to clients and OSDs, are fixed at the moment of OSD drive
 initialization and can't be changed after it without losing data.

+OSDs with different values of these parameters (for example, SSD and SSD+HDD
+OSDs) can coexist in one Vitastor cluster within different pools. Each pool can
+only include OSDs with identical settings of these parameters.
+
+These parameters, when set to a non-default value, must also be specified in
+etcd for clients to be aware of their values, either in /vitastor/config/global
+or in pool configuration. Pool configuration overrides the global setting.
+If the value for a pool in etcd doesn't match on-disk OSD configuration, the
+OSD will refuse to start PGs of that pool.
+
 - [block_size](#block_size)
 - [bitmap_granularity](#bitmap_granularity)
 - [immediate_commit](#immediate_commit)
-- [client_dirty_limit](#client_dirty_limit)

 ## block_size

 - Type: integer
 - Default: 131072

-Size of objects (data blocks) into which all physical and virtual drives are
-subdivided in Vitastor. One of current main settings in Vitastor, affects
-memory usage, write amplification and I/O load distribution effectiveness.
+Size of objects (data blocks) into which all physical and virtual drives
+(within a pool) are subdivided in Vitastor. One of the key settings in
+Vitastor: it affects memory usage, write amplification and the effectiveness
+of I/O load distribution.

 Recommended default block size is 128 KB for SSD and 4 MB for HDD. In fact,
 it's possible to use 4 MB for SSD too - it will lower memory usage, but
 may increase average WA and reduce linear performance.

-OSDs with different block sizes (for example, SSD and SSD+HDD OSDs) can
-currently coexist in one etcd instance only within separate Vitastor
-clusters with different etcd_prefix'es.
-
-Also block size can't be changed after OSD initialization without losing
-data.
-
-You must always specify block_size in etcd in /vitastor/config/global if
-you change it so all clients can know about it.
-
 OSD memory usage is roughly (SIZE / BLOCK * 68 bytes) which is roughly
 544 MB per 1 TB of used disk space with the default 128 KB block size.

@@ -50,12 +50,7 @@ of disk_alignment. It's called bitmap granularity because Vitastor tracks
 an allocation bitmap for each object containing 2 bits per each
 (bitmap_granularity) bytes.

-This parameter can't be changed after OSD initialization without losing
-data. Also it's fixed for the whole Vitastor cluster i.e. two different
-values can't be used in a single Vitastor cluster.
-
-Clients MUST be aware of this parameter value, so put it into etcd key
-/vitastor/config/global if you change it for any reason.
+Can't be smaller than the sector size of the OSD data device.

 ## immediate_commit

@@ -99,26 +94,12 @@ unsafe to change by hand). The same may apply to newer HDDs with internal
 SSD cache or "media-cache" - for example, a lot of Seagate EXOS drives have
 it (they have internal SSD cache even though it's not stated in datasheets).

-This parameter must be set both in etcd in /vitastor/config/global and in
-OSD command line or configuration. Setting it to "all" or "small" requires
-enabling disable_journal_fsync and disable_meta_fsync, setting it to "all"
-also requires enabling disable_data_fsync.
+Setting this parameter to "all" or "small" in OSD parameters requires enabling
+disable_journal_fsync and disable_meta_fsync; setting it to "all" also requires
+enabling disable_data_fsync.

 TLDR: For optimal performance, set immediate_commit to "all" if you only use
 SSDs with supercapacitor-based power loss protection (nonvolatile
 write-through cache) for both data and journals in the whole Vitastor
 cluster. Set it to "small" if you only use such SSDs for journals. Leave
 empty if your drives have write-back cache.
-
-## client_dirty_limit
-
-- Type: integer
-- Default: 33554432
-
-Without immediate_commit=all this parameter sets the limit of "dirty"
-(not committed by fsync) data allowed by the client before forcing an
-additional fsync and committing the data. Also note that the client always
-holds a copy of uncommitted data in memory so this setting also affects
-RAM usage of clients.
-
-This parameter doesn't affect OSDs themselves.
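The memory rule of thumb and the bitmap description in this file can be checked with a little arithmetic. This Go sketch restates the formulas from the text: 68 bytes per object, and 2 bits per `bitmap_granularity` bytes. The function names are hypothetical, and the result is an estimate, not Vitastor's exact memory accounting.

```go
package main

import "fmt"

const (
	KiB = uint64(1) << 10
	MiB = uint64(1) << 20
	TiB = uint64(1) << 40
)

// osdMemoryEstimate restates the rule of thumb from the text:
// OSD RAM is roughly (used space / block_size) * 68 bytes.
func osdMemoryEstimate(usedBytes, blockSize uint64) uint64 {
	const bytesPerObject = 68
	return usedBytes / blockSize * bytesPerObject
}

// objectBitmapBytes returns the bitmap size kept per object, using the
// stated 2 bits per bitmap_granularity bytes, rounded up to whole bytes.
func objectBitmapBytes(blockSize, granularity uint64) uint64 {
	bits := blockSize / granularity * 2
	return (bits + 7) / 8
}

func main() {
	for _, bs := range []uint64{128 * KiB, 4 * MiB} {
		fmt.Printf("block_size=%d: ~%d MiB of OSD RAM per TiB used, %d-byte bitmap per object\n",
			bs, osdMemoryEstimate(TiB, bs)/MiB, objectBitmapBytes(bs, 4096))
	}
}
```

With the default 128 KB block this reproduces the "544 MB per 1 TB" figure (exactly 544 MiB per TiB). With 4 MB blocks the estimate drops to 17 MiB per TiB. This is why larger blocks are suggested for HDDs.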
--- a/docs/config/layout-cluster.ru.md
+++ b/docs/config/layout-cluster.ru.md
@@ -9,10 +9,19 @@
 Данные параметры используются клиентами и OSD, задаются в момент инициализации
 диска OSD и не могут быть изменены после этого без потери данных.

+OSD с разными значениями данных параметров (например, SSD и гибридные SSD+HDD
+OSD) могут сосуществовать в одном кластере Vitastor в разных пулах. Один пул
+может включать только OSD с одинаковыми настройками этих параметров.
+
+Данные параметры, отличаясь от значения по умолчанию, должны также быть заданы
+в etcd, чтобы клиенты могли узнать их значение, либо в глобальной конфигурации
+/vitastor/config/global, либо в настройках пулов. Настройки пула переопределяют
+глобальное значение. Если значение в настройках пула не будет соответствовать
+конфигурации OSD, OSD откажется запускать PG данного пула.
+
 - [block_size](#block_size)
 - [bitmap_granularity](#bitmap_granularity)
 - [immediate_commit](#immediate_commit)
-- [client_dirty_limit](#client_dirty_limit)

 ## block_size

@@ -20,24 +29,15 @@
 - Значение по умолчанию: 131072

 Размер объектов (блоков данных), на которые делятся физические и виртуальные
-диски в Vitastor. Одна из ключевых на данный момент настроек, влияет на
-потребление памяти, объём избыточной записи (write amplification) и
-эффективность распределения нагрузки по OSD.
+диски в Vitastor (в рамках каждого пула). Одна из ключевых на данный момент
+настроек, влияет на потребление памяти, объём избыточной записи (write
+amplification) и эффективность распределения нагрузки по OSD.

 Рекомендуемые по умолчанию размеры блока - 128 килобайт для SSD и 4
 мегабайта для HDD. В принципе, для SSD можно тоже использовать 4 мегабайта,
 это понизит использование памяти, но ухудшит распределение нагрузки и в
 среднем увеличит WA.

-OSD с разными размерами блока (например, SSD и SSD+HDD OSD) на данный
-момент могут сосуществовать в рамках одного etcd только в виде двух независимых
-кластеров Vitastor с разными etcd_prefix.
-
-Также размер блока нельзя менять после инициализации OSD без потери данных.
-
-Если вы меняете размер блока, обязательно прописывайте его в etcd в
-/vitastor/config/global, дабы все клиенты его знали.
-
 Потребление памяти OSD составляет примерно (РАЗМЕР / БЛОК * 68 байт),
 т.е. примерно 544 МБ памяти на 1 ТБ занятого места на диске при
 стандартном 128 КБ блоке.
@@ -52,13 +52,7 @@ OSD с разными размерами блока (например, SSD и SS
 потому, что Vitastor хранит битовую карту для каждого объекта, содержащую
 по 2 бита на каждые (bitmap_granularity) байт.

-Данный параметр нельзя менять после инициализации OSD без потери данных.
-Также он фиксирован для всего кластера Vitastor, т.е. разные значения
-не могут сосуществовать в одном кластере.
-
-Клиенты ДОЛЖНЫ знать правильное значение этого параметра, так что если вы
-его меняете, обязательно прописывайте изменённое значение в etcd в ключ
-/vitastor/config/global.
+Не может быть меньше размера сектора дисков данных OSD.

 ## immediate_commit

@@ -108,8 +102,7 @@ HDD-дисках с внутренним SSD или "медиа" кэшем - н
 многих дисках Seagate EXOS (у них есть внутренний SSD-кэш, хотя это и не
 указано в спецификациях).

-Данный параметр нужно указывать и в etcd в /vitastor/config/global, и в
-командной строке или конфигурации OSD. Значения "all" и "small" требуют
+Указание "all" или "small" в настройках / командной строке OSD требует
 включения disable_journal_fsync и disable_meta_fsync, значение "all" также
 требует включения disable_data_fsync.

@@ -119,16 +112,3 @@ immediate_commit в значение "all", если вы используете
 такие SSD для всех журналов, но не для данных - можете установить параметр
 в "small". Если и какие-то из дисков журналов имеют волатильный кэш записи -
 оставьте параметр пустым.
-
-## client_dirty_limit
-
-- Тип: целое число
-- Значение по умолчанию: 33554432
-
-При работе без immediate_commit=all - это лимит объёма "грязных" (не
-зафиксированных fsync-ом) данных, при достижении которого клиент будет
-принудительно вызывать fsync и фиксировать данные. Также стоит иметь в виду,
-что в этом случае до момента fsync клиент хранит копию незафиксированных
-данных в памяти, то есть, настройка влияет на потребление памяти клиентами.
-
-Параметр не влияет на сами OSD.
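The OSD-side requirements for `immediate_commit` stated in this file can be written as a validation check. This is a hedged Go sketch. `osdCommitConfig` and `checkImmediateCommit` are hypothetical names, not Vitastor's code. The text also does not describe how a real OSD reacts to a configuration that violates these rules.

```go
package main

import "fmt"

// osdCommitConfig is a hypothetical struct holding the OSD options named in
// the text; it is not Vitastor's actual configuration type.
type osdCommitConfig struct {
	ImmediateCommit     string // "all", "small", "none"; empty means "none"
	DisableJournalFsync bool
	DisableMetaFsync    bool
	DisableDataFsync    bool
}

// checkImmediateCommit applies the documented requirements: "all" and "small"
// need disable_journal_fsync and disable_meta_fsync, and "all" additionally
// needs disable_data_fsync.
func checkImmediateCommit(c osdCommitConfig) error {
	switch c.ImmediateCommit {
	case "", "none":
		return nil
	case "small", "all":
		if !c.DisableJournalFsync || !c.DisableMetaFsync {
			return fmt.Errorf("immediate_commit=%s requires disable_journal_fsync and disable_meta_fsync", c.ImmediateCommit)
		}
		if c.ImmediateCommit == "all" && !c.DisableDataFsync {
			return fmt.Errorf("immediate_commit=all also requires disable_data_fsync")
		}
		return nil
	default:
		return fmt.Errorf("unknown immediate_commit value %q", c.ImmediateCommit)
	}
}

func main() {
	cfg := osdCommitConfig{ImmediateCommit: "all", DisableJournalFsync: true, DisableMetaFsync: true}
	// prints: immediate_commit=all also requires disable_data_fsync
	fmt.Println(checkImmediateCommit(cfg))
}
```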
--- a/docs/config/network.en.md
+++ b/docs/config/network.en.md
@@ -29,6 +29,7 @@ between clients, OSDs and etcd.
 - [etcd_slow_timeout](#etcd_slow_timeout)
 - [etcd_keepalive_timeout](#etcd_keepalive_timeout)
 - [etcd_ws_keepalive_timeout](#etcd_ws_keepalive_timeout)
+- [client_dirty_limit](#client_dirty_limit)

 ## tcp_header_buffer_size

@@ -212,3 +213,16 @@ etcd_report_interval to guarantee that keepalive actually works.

 etcd websocket ping interval required to keep the connection alive and
 detect disconnections quickly.
+
+## client_dirty_limit
+
+- Type: integer
+- Default: 33554432
+
+Without immediate_commit=all this parameter sets the limit of "dirty"
+(not committed by fsync) data allowed by the client before forcing an
+additional fsync and committing the data. Also note that the client always
+holds a copy of uncommitted data in memory so this setting also affects
+RAM usage of clients.
+
+This parameter doesn't affect OSDs themselves.
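Here is a minimal model of the `client_dirty_limit` behaviour described above. The client keeps unsynced writes in memory and forces an extra fsync when their total reaches the limit. The type and method names are hypothetical. Treating "reaching the limit" as `>=` is an assumption. A real client also holds the buffered data itself, not just a byte counter.

```go
package main

import "fmt"

// dirtyTracker is a hypothetical model of the client_dirty_limit behaviour:
// written but not yet fsynced data is counted, and once the total reaches
// the limit the client forces an extra fsync.
type dirtyTracker struct {
	limit       uint64 // client_dirty_limit, in bytes
	dirtyBytes  uint64 // unsynced data currently held in client memory
	forcedSyncs int    // fsyncs forced by the limit
}

// write records n bytes of unsynced data and reports whether the limit
// forced an fsync.
func (t *dirtyTracker) write(n uint64) bool {
	t.dirtyBytes += n
	if t.dirtyBytes < t.limit {
		return false
	}
	t.fsync()
	t.forcedSyncs++
	return true
}

// fsync commits all dirty data, so the client can drop its in-memory copies.
func (t *dirtyTracker) fsync() {
	t.dirtyBytes = 0
}

func main() {
	tr := &dirtyTracker{limit: 33554432} // the default client_dirty_limit
	for i := 0; i < 300; i++ {
		tr.write(128 << 10) // 128 KiB writes
	}
	// prints: forced fsyncs: 1, still dirty: 5767168 bytes
	fmt.Printf("forced fsyncs: %d, still dirty: %d bytes\n", tr.forcedSyncs, tr.dirtyBytes)
}
```

With the default limit and 128 KiB writes, the 256th write triggers the forced fsync. Up to 32 MiB of client RAM per client can be held in uncommitted copies.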
--- a/docs/config/network.ru.md
+++ b/docs/config/network.ru.md
@@ -29,6 +29,7 @@
 - [etcd_slow_timeout](#etcd_slow_timeout)
 - [etcd_keepalive_timeout](#etcd_keepalive_timeout)
 - [etcd_ws_keepalive_timeout](#etcd_ws_keepalive_timeout)
+- [client_dirty_limit](#client_dirty_limit)

 ## tcp_header_buffer_size

@@ -222,3 +223,16 @@ etcd_report_interval, чтобы keepalive гарантированно рабо
 - Значение по умолчанию: 30

 Интервал проверки живости вебсокет-подключений к etcd.
+
+## client_dirty_limit
+
+- Тип: целое число
+- Значение по умолчанию: 33554432
+
+При работе без immediate_commit=all - это лимит объёма "грязных" (не
+зафиксированных fsync-ом) данных, при достижении которого клиент будет
+принудительно вызывать fsync и фиксировать данные. Также стоит иметь в виду,
+что в этом случае до момента fsync клиент хранит копию незафиксированных
+данных в памяти, то есть, настройка влияет на потребление памяти клиентами.
+
+Параметр не влияет на сами OSD.
--- a/docs/config/pool.en.md
+++ b/docs/config/pool.en.md
@@ -33,6 +33,9 @@ Parameters:
 - [pg_count](#pg_count)
 - [failure_domain](#failure_domain)
 - [max_osd_combinations](#max_osd_combinations)
+- [block_size](#block_size)
+- [bitmap_granularity](#bitmap_granularity)
+- [immediate_commit](#immediate_commit)
 - [pg_stripe_size](#pg_stripe_size)
 - [root_node](#root_node)
 - [osd_tags](#osd_tags)
@@ -79,7 +82,7 @@ Parent node reference is required for intermediate tree nodes.
 Separate OSD settings are set in etcd keys `/vitastor/config/osd/<number>`
 in JSON format `{"<key>":<value>}`.

-As of now, there is only one setting:
+As of now, two settings are supported:

 ## reweight

@@ -93,6 +96,15 @@ This means an OSD configured with reweight lower than 1 receives less PGs than
 it normally would. An OSD with reweight = 0 won't store any data. You can set
 reweight to 0 to trigger rebalance and remove all data from an OSD.

+## tags
+
+- Type: string or array of strings
+
+Sets one or more tags for this OSD. Tags can be used to group OSDs into
+subsets and then place a pool on a specific subset instead of all OSDs.
+For example, you can mark SSD OSDs with the tag "ssd" and HDD OSDs with "hdd";
+such tags then work like device classes.
+
 # Pool parameters

 ## name
@@ -186,6 +198,43 @@ number of combinations to generate when optimising PG placement.

 This parameter usually doesn't require to be changed.

+## block_size
+
+- Type: integer
+- Default: 131072
+
+Block size for this pool. The value from /vitastor/config/global is used when
+unspecified. If your cluster has OSDs with different block sizes, the pool must
+be restricted by [osd_tags](#osd_tags) to only include OSDs with matching block
+size.
+
+Read more about this parameter in [Cluster-Wide Disk Layout Parameters](layout-cluster.en.md#block_size).
+
+## bitmap_granularity
+
+- Type: integer
+- Default: 4096
+
+"Sector" size of virtual disks in this pool. The value from
+/vitastor/config/global is used when unspecified. Similar to block_size, the
+pool must be restricted by [osd_tags](#osd_tags) to only include OSDs with
+matching bitmap_granularity.
+
+Read more about this parameter in [Cluster-Wide Disk Layout Parameters](layout-cluster.en.md#bitmap_granularity).
+
+## immediate_commit
+
+- Type: string, one of "all", "small" and "none"
+- Default: none
+
+Immediate commit setting for this pool. The value from /vitastor/config/global
+is used when unspecified. Similar to block_size, the pool must be restricted by
+[osd_tags](#osd_tags) to only include OSDs with compatible immediate_commit.
+Compatible means that a pool without immediate commit can use OSDs with
+immediate commit enabled, but not vice versa.
+
+Read more about this parameter in [Cluster-Wide Disk Layout Parameters](layout-cluster.en.md#immediate_commit).
+
 ## pg_stripe_size

 - Type: integer
--- a/docs/config/pool.ru.md
+++ b/docs/config/pool.ru.md
@@ -32,6 +32,9 @@
 - [pg_count](#pg_count)
 - [failure_domain](#failure_domain)
 - [max_osd_combinations](#max_osd_combinations)
+- [block_size](#block_size)
+- [bitmap_granularity](#bitmap_granularity)
+- [immediate_commit](#immediate_commit)
 - [pg_stripe_size](#pg_stripe_size)
 - [root_node](#root_node)
 - [osd_tags](#osd_tags)
@@ -78,7 +81,10 @@
 Настройки отдельных OSD задаются в ключах etcd `/vitastor/config/osd/<number>`
 в JSON-формате `{"<key>":<value>}`.

-На данный момент поддерживается одна настройка:
+На данный момент поддерживаются две настройки:
+
+- [reweight](#reweight)
+- [tags](#tags)

 ## reweight

@@ -93,6 +99,15 @@
 хранении данных вообще. Вы можете установить reweight в 0, чтобы убрать
 все данные с OSD.

+## tags
+
+- Тип: строка или массив строк
+
+Задаёт тег или набор тегов для данного OSD. Теги можно использовать, чтобы
+делить OSD на множества и потом размещать пул только на части OSD, а не на
+всех. Можно, например, пометить SSD OSD тегом "ssd", а HDD тегом "hdd", в
+этом смысле теги работают аналогично классам устройств.
+
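Per-OSD settings are stored as JSON in `/vitastor/config/osd/<number>`, so tagging an OSD means writing a `tags` field into that key. The hypothetical helper below builds the key and value; because writing the key replaces the whole value, it merges in existing settings such as reweight. It assumes the default `/vitastor` etcd prefix.

```python
import json


def osd_tags_entry(osd_num, tags, existing=None):
    """Return (key, value) for setting tags on one OSD (hypothetical helper)."""
    if isinstance(tags, str):
        tags = [tags]  # a single tag may be given as a plain string
    config = dict(existing or {})  # keep other settings, e.g. reweight
    config["tags"] = list(tags)
    return f"/vitastor/config/osd/{osd_num}", json.dumps(config)


key, value = osd_tags_entry(3, "ssd", {"reweight": 1})
print(key, value)
```

The resulting pair can then be written with `etcdctl put`, for example `etcdctl put /vitastor/config/osd/3 '{"reweight": 1, "tags": ["ssd"]}'`.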
 # Параметры

 ## name
@@ -110,7 +125,7 @@

 Схема избыточности, используемая в данном пуле. "jerasure" - синоним для "ec",
 в обеих схемах используются коды Рида-Соломона-Вандермонда, реализованные на
-основе библиотек ISA-L или jerasure. Быстрая реализацяю на основе ISA-L
+основе библиотек ISA-L или jerasure. Быстрая реализация на основе ISA-L
 используется автоматически, когда доступна, в противном случае используется
 более медленная jerasure-версия.

@@ -185,13 +200,51 @@ PG в Vitastor эферемерны, то есть вы можете менят

 Обычно данный параметр не требует изменений.

+## block_size
+
+- Тип: целое число
+- По умолчанию: 131072
+
+Размер блока для данного пула. Если не задан, используется значение из
+/vitastor/config/global. Если в вашем кластере есть OSD с разными размерами
+блока, пул должен быть ограничен только OSD, блок которых равен блоку пула,
+с помощью [osd_tags](#osd_tags).
+
+О самом параметре читайте в разделе [Дисковые параметры уровня кластера](layout-cluster.ru.md#block_size).
+
+## bitmap_granularity
+
+- Тип: целое число
+- По умолчанию: 4096
+
+Размер "сектора" виртуальных дисков в данном пуле. Если не задан, используется
+значение из /vitastor/config/global. Аналогично block_size, пул должен быть
+ограничен OSD со значением bitmap_granularity, равным значению пула, с помощью
+[osd_tags](#osd_tags).
+
+О самом параметре читайте в разделе [Дисковые параметры уровня кластера](layout-cluster.ru.md#bitmap_granularity).
+
+## immediate_commit
+
+- Тип: строка "all", "small" или "none"
+- По умолчанию: none
+
+Настройка мгновенного коммита для данного пула. Если не задана, используется
+значение из /vitastor/config/global. Аналогично block_size, пул должен быть
+ограничен OSD со значением immediate_commit, совместимым со значением пула, с
+помощью [osd_tags](#osd_tags). Совместимость означает, что пул с отключенным
+мгновенным коммитом может работать на OSD с включённым мгновенным коммитом, но
+не наоборот.
+
+О самом параметре читайте в разделе [Дисковые параметры уровня кластера](layout-cluster.ru.md#immediate_commit).
+
 ## pg_stripe_size

 - Тип: целое число
 - По умолчанию: 0

 Данный параметр задаёт размер полосы "нарезки" образов на PG. Размер полосы не может
-быть меньше, чем [block_size](layout-cluster.ru.md#block_size), умноженный на
+быть меньше, чем [block_size](#block_size), умноженный на
 (pg_size - parity_chunks) для EC-пулов или 1 для реплицированных пулов. То же
 значение используется по умолчанию.
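The minimum stripe size rule above is plain arithmetic, computed by the sketch below (function names are hypothetical). An unset value (0) falls back to the minimum, as stated above; raising an explicitly configured but too small value to the minimum is an assumption of this sketch.

```python
def min_pg_stripe_size(block_size, scheme, pg_size, parity_chunks=0):
    """Smallest allowed pg_stripe_size for a pool."""
    if scheme == "replicated":
        data_chunks = 1  # every replica holds the full block
    else:
        data_chunks = pg_size - parity_chunks  # EC/XOR: data chunks per stripe
    return block_size * data_chunks


def effective_pg_stripe_size(pg_stripe_size, block_size, scheme, pg_size, parity_chunks=0):
    """Assumption: 0 (unset) or a too small value is replaced by the minimum."""
    return max(pg_stripe_size, min_pg_stripe_size(block_size, scheme, pg_size, parity_chunks))


# EC 4+2 pool with the default 128 KB block: 4 data chunks * 128 KB = 512 KB
print(effective_pg_stripe_size(0, 131072, "ec", 6, 2))
```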

--- a/docs/config/src/layout-cluster.en.md
+++ b/docs/config/src/layout-cluster.en.md
@@ -2,3 +2,13 @@

 These parameters apply to clients and OSDs, are fixed at the moment of OSD drive
 initialization and can't be changed after it without losing data.
+
+OSDs with different values of these parameters (for example, SSD and SSD+HDD
+OSDs) can coexist in one Vitastor cluster within different pools. Each pool can
+only include OSDs with identical settings of these parameters.
+
+These parameters, when set to a non-default value, must also be specified in
+etcd for clients to be aware of their values, either in /vitastor/config/global
+or in pool configuration. Pool configuration overrides the global setting.
+If the value for a pool in etcd doesn't match on-disk OSD configuration, the
+OSD will refuse to start PGs of that pool.
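The lookup order described above can be sketched as follows: the pool setting wins, then /vitastor/config/global, then the built-in default. Function names are hypothetical, and the defaults (131072 and 4096) are taken from the pool parameter reference. The second function mirrors the rule that an OSD refuses to start PGs of a pool whose effective values differ from its on-disk layout.

```python
LAYOUT_DEFAULTS = {"block_size": 131072, "bitmap_granularity": 4096}


def effective_value(param, global_config, pool_config):
    """Pool configuration overrides the global one, which overrides the default."""
    if param in pool_config:
        return pool_config[param]
    if param in global_config:
        return global_config[param]
    return LAYOUT_DEFAULTS[param]


def osd_can_start_pool_pgs(osd_layout, global_config, pool_config):
    """True if the pool's effective layout matches what is on the OSD disk."""
    return all(
        effective_value(param, global_config, pool_config) == osd_layout[param]
        for param in LAYOUT_DEFAULTS
    )


hdd_osd = {"block_size": 4 << 20, "bitmap_granularity": 4096}
print(osd_can_start_pool_pgs(hdd_osd, {}, {"block_size": 4 << 20}))  # True
print(osd_can_start_pool_pgs(hdd_osd, {}, {}))  # False: default 128 KB block
```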
--- a/docs/config/src/layout-cluster.ru.md
+++ b/docs/config/src/layout-cluster.ru.md
@@ -2,3 +2,13 @@

 Данные параметры используются клиентами и OSD, задаются в момент инициализации
 диска OSD и не могут быть изменены после этого без потери данных.
+
+OSD с разными значениями данных параметров (например, SSD и гибридные SSD+HDD
+OSD) могут сосуществовать в одном кластере Vitastor в разных пулах. Один пул
+может включать только OSD с одинаковыми настройками этих параметров.
+
+Если значения данных параметров отличаются от значений по умолчанию, их также
+нужно задать в etcd, чтобы клиенты могли узнать их, либо в глобальной конфигурации
+/vitastor/config/global, либо в настройках пулов. Настройки пула переопределяют
+глобальное значение. Если значение в настройках пула не будет соответствовать
+конфигурации OSD, OSD откажется запускать PG данного пула.
--- a/docs/config/src/layout-cluster.yml
+++ b/docs/config/src/layout-cluster.yml
@@ -2,46 +2,28 @@
  type: int
  default: 131072
  info: |
-    Size of objects (data blocks) into which all physical and virtual drives are
-    subdivided in Vitastor. One of current main settings in Vitastor, affects
-    memory usage, write amplification and I/O load distribution effectiveness.
+    Size of objects (data blocks) into which all physical and virtual drives
+    (within a pool) are subdivided in Vitastor. It is currently one of the key
+    settings in Vitastor: it affects memory usage, write amplification and the
+    effectiveness of I/O load distribution.

    Recommended default block size is 128 KB for SSD and 4 MB for HDD. In fact,
    it's possible to use 4 MB for SSD too - it will lower memory usage, but
    may increase average WA and reduce linear performance.

-    OSDs with different block sizes (for example, SSD and SSD+HDD OSDs) can
-    currently coexist in one etcd instance only within separate Vitastor
-    clusters with different etcd_prefix'es.
-
-    Also block size can't be changed after OSD initialization without losing
-    data.
-
-    You must always specify block_size in etcd in /vitastor/config/global if
-    you change it so all clients can know about it.
-
    OSD memory usage is roughly (SIZE / BLOCK * 68 bytes) which is roughly
    544 MB per 1 TB of used disk space with the default 128 KB block size.
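The memory estimate above can be checked with a few lines of arithmetic. This is a rough sketch of the stated formula only (68 bytes of metadata per used block), not of the OSD's actual memory management; the function name is hypothetical.

```python
def osd_metadata_memory(used_bytes, block_size=131072, bytes_per_block=68):
    """Rough OSD memory estimate: used space / block size * 68 bytes."""
    return used_bytes // block_size * bytes_per_block


TB = 1 << 40
MiB = 1 << 20
print(osd_metadata_memory(TB) // MiB, "MiB")           # 544 MiB with 128 KB blocks
print(osd_metadata_memory(TB, 4 << 20) // MiB, "MiB")  # 17 MiB with 4 MB blocks
```

The second line shows why 4 MB blocks are suggested for HDDs: the metadata memory estimate drops 32-fold.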
  info_ru: |
    Размер объектов (блоков данных), на которые делятся физические и виртуальные
-    диски в Vitastor. Одна из ключевых на данный момент настроек, влияет на
-    потребление памяти, объём избыточной записи (write amplification) и
-    эффективность распределения нагрузки по OSD.
+    диски в Vitastor (в рамках каждого пула). Одна из ключевых на данный момент
+    настроек, влияет на потребление памяти, объём избыточной записи (write
+    amplification) и эффективность распределения нагрузки по OSD.

    Рекомендуемые по умолчанию размеры блока - 128 килобайт для SSD и 4
    мегабайта для HDD. В принципе, для SSD можно тоже использовать 4 мегабайта,
    это понизит использование памяти, но ухудшит распределение нагрузки и в
    среднем увеличит WA.

-    OSD с разными размерами блока (например, SSD и SSD+HDD OSD) на данный
-    момент могут сосуществовать в рамках одного etcd только в виде двух независимых
-    кластеров Vitastor с разными etcd_prefix.
-
-    Также размер блока нельзя менять после инициализации OSD без потери данных.
-
-    Если вы меняете размер блока, обязательно прописывайте его в etcd в
-    /vitastor/config/global, дабы все клиенты его знали.
-
    Потребление памяти OSD составляет примерно (РАЗМЕР / БЛОК * 68 байт),
    т.е. примерно 544 МБ памяти на 1 ТБ занятого места на диске при
    стандартном 128 КБ блоке.
@@ -54,25 +36,14 @@
    an allocation bitmap for each object containing 2 bits per each
    (bitmap_granularity) bytes.

-    This parameter can't be changed after OSD initialization without losing
-    data. Also it's fixed for the whole Vitastor cluster i.e. two different
-    values can't be used in a single Vitastor cluster.
-
-    Clients MUST be aware of this parameter value, so put it into etcd key
-    /vitastor/config/global if you change it for any reason.
+    Can't be smaller than the sector size of the OSD data device.
  info_ru: |
    Требуемое выравнивание записи на виртуальные диски (размер их "сектора").
    Должен быть кратен disk_alignment. Называется гранулярностью битовой карты
    потому, что Vitastor хранит битовую карту для каждого объекта, содержащую
    по 2 бита на каждые (bitmap_granularity) байт.

-    Данный параметр нельзя менять после инициализации OSD без потери данных.
-    Также он фиксирован для всего кластера Vitastor, т.е. разные значения
-    не могут сосуществовать в одном кластере.
-
-    Клиенты ДОЛЖНЫ знать правильное значение этого параметра, так что если вы
-    его меняете, обязательно прописывайте изменённое значение в etcd в ключ
-    /vitastor/config/global.
+    Не может быть меньше размера сектора дисков данных OSD.
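Following the description above (2 bits per bitmap_granularity bytes, a multiple of disk_alignment, not smaller than the data device sector), the hypothetical helpers below compute the per-object bitmap size and check a candidate value. Rounding up to whole bytes is an assumption; the real on-disk layout may add padding.

```python
def object_bitmap_bytes(block_size=131072, bitmap_granularity=4096):
    """Bitmap size per object: 2 bits for every bitmap_granularity bytes."""
    granules = block_size // bitmap_granularity
    return (granules * 2 + 7) // 8  # round up to whole bytes


def bitmap_granularity_ok(bitmap_granularity, disk_alignment, data_sector_size):
    """Must be a multiple of disk_alignment and at least one data device sector."""
    return (bitmap_granularity % disk_alignment == 0
            and bitmap_granularity >= data_sector_size)


print(object_bitmap_bytes())         # 8 bytes for a 128 KB object
print(object_bitmap_bytes(4 << 20))  # 256 bytes for a 4 MB object
```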
 - name: immediate_commit
  type: string
  default: false
@@ -114,10 +85,9 @@
    SSD cache or "media-cache" - for example, a lot of Seagate EXOS drives have
    it (they have internal SSD cache even though it's not stated in datasheets).

-    This parameter must be set both in etcd in /vitastor/config/global and in
-    OSD command line or configuration. Setting it to "all" or "small" requires
-    enabling disable_journal_fsync and disable_meta_fsync, setting it to "all"
-    also requires enabling disable_data_fsync.
+    Setting this parameter to "all" or "small" in OSD parameters requires enabling
+    disable_journal_fsync and disable_meta_fsync; setting it to "all" also requires
+    enabling disable_data_fsync.
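The dependency between immediate_commit and the fsync flags can be expressed as a small validation helper. This is a hypothetical sketch: it assumes the OSD settings are available as a dict with boolean flag values and only reports which required flags are still missing.

```python
def missing_fsync_flags(osd_config):
    """List the disable_*_fsync flags that immediate_commit requires but are not set."""
    mode = osd_config.get("immediate_commit", "none")
    required = []
    if mode in ("all", "small"):
        required += ["disable_journal_fsync", "disable_meta_fsync"]
    if mode == "all":
        required.append("disable_data_fsync")
    return [flag for flag in required if not osd_config.get(flag)]


print(missing_fsync_flags({"immediate_commit": "small", "disable_journal_fsync": True}))
```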

    TLDR: For optimal performance, set immediate_commit to "all" if you only use
    SSDs with supercapacitor-based power loss protection (nonvolatile
@@ -168,8 +138,7 @@
    многих дисках Seagate EXOS (у них есть внутренний SSD-кэш, хотя это и не
    указано в спецификациях).

-    Данный параметр нужно указывать и в etcd в /vitastor/config/global, и в
-    командной строке или конфигурации OSD. Значения "all" и "small" требуют
+    Указание "all" или "small" в настройках / командной строке OSD требует
    включения disable_journal_fsync и disable_meta_fsync, значение "all" также
    требует включения disable_data_fsync.

@@ -179,22 +148,3 @@
    такие SSD для всех журналов, но не для данных - можете установить параметр
    в "small". Если и какие-то из дисков журналов имеют волатильный кэш записи -
    оставьте параметр пустым.
- name: client_dirty_limit
-  type: int
-  default: 33554432
-  info: |
-    Without immediate_commit=all this parameter sets the limit of "dirty"
-    (not committed by fsync) data allowed by the client before forcing an
-    additional fsync and committing the data. Also note that the client always
-    holds a copy of uncommitted data in memory so this setting also affects
-    RAM usage of clients.
-
-    This parameter doesn't affect OSDs themselves.
-  info_ru: |
-    При работе без immediate_commit=all - это лимит объёма "грязных" (не
-    зафиксированных fsync-ом) данных, при достижении которого клиент будет
-    принудительно вызывать fsync и фиксировать данные. Также стоит иметь в виду,
-    что в этом случае до момента fsync клиент хранит копию незафиксированных
-    данных в памяти, то есть, настройка влияет на потребление памяти клиентами.
-
-    Параметр не влияет на сами OSD.
--- a/docs/config/src/network.yml
+++ b/docs/config/src/network.yml
@@ -223,3 +223,22 @@
    detect disconnections quickly.
  info_ru: |
    Интервал проверки живости вебсокет-подключений к etcd.
+- name: client_dirty_limit
+  type: int
+  default: 33554432
+  info: |
+    Without immediate_commit=all this parameter sets the limit of "dirty"
+    (not committed by fsync) data allowed by the client before forcing an
+    additional fsync and committing the data. Also note that the client always
+    holds a copy of uncommitted data in memory so this setting also affects
+    RAM usage of clients.
+
+    This parameter doesn't affect OSDs themselves.
+  info_ru: |
+    При работе без immediate_commit=all - это лимит объёма "грязных" (не
+    зафиксированных fsync-ом) данных, при достижении которого клиент будет
+    принудительно вызывать fsync и фиксировать данные. Также стоит иметь в виду,
+    что в этом случае до момента fsync клиент хранит копию незафиксированных
+    данных в памяти, то есть, настройка влияет на потребление памяти клиентами.
+
+    Параметр не влияет на сами OSD.
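The behaviour described above can be modelled with a tiny counter: the client keeps a copy of every unsynced write and forces an fsync once the copies reach client_dirty_limit. This is a hypothetical model, not the client's code; whether the check uses `>=` or `>` is an assumption.

```python
class DirtyDataModel:
    """Toy model of a client without immediate_commit=all (hypothetical)."""

    def __init__(self, client_dirty_limit=33554432):
        self.limit = client_dirty_limit
        self.dirty_bytes = 0  # also approximates RAM held for uncommitted copies
        self.forced_fsyncs = 0

    def write(self, nbytes):
        self.dirty_bytes += nbytes
        if self.dirty_bytes >= self.limit:
            self.fsync()
            self.forced_fsyncs += 1

    def fsync(self):
        self.dirty_bytes = 0  # committed data no longer needs an in-memory copy


client = DirtyDataModel()
for _ in range(100):
    client.write(1 << 20)  # 100 sequential 1 MiB writes
print(client.forced_fsyncs, client.dirty_bytes >> 20)  # 3 forced fsyncs, 4 MiB still dirty
```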
--- a/docs/intro/architecture.ru.md
+++ b/docs/intro/architecture.ru.md
@@ -127,7 +127,7 @@
  запросы записи клиенты копируют в памяти и при потере соединения и повторном соединении
  с OSD повторяют из памяти. Скопированные в память данные удаляются при успешном fsync,
  а чтобы хранение этих данных не приводило к чрезмерному потреблению памяти, клиенты
-  автоматически выполняют fsync каждые [client_dirty_limit](../config/layout-cluster.ru.md#client_dirty_limit)
+  автоматически выполняют fsync каждые [client_dirty_limit](../config/network.ru.md#client_dirty_limit)
  записанных байт.

 ## Схожесть с Ceph
--- a/docs/intro/author.ru.md
+++ b/docs/intro/author.ru.md
@@ -33,5 +33,5 @@ Vitastor Network Public License 1.1, основанная на GNU GPL 3.0 с д
 и также на условиях GNU GPL 2.0 или более поздней версии. Так сделано в целях
 совместимости с таким ПО, как QEMU и fio.

-Вы можете найти полный текст VNPL 1.1 в файле [VNPL-1.1.txt](../../VNPL-1.1.txt),
-а GPL 2.0 в файле [GPL-2.0.txt](../../GPL-2.0.txt).
+Вы можете найти полный текст VNPL 1.1 на английском языке в файле [VNPL-1.1.txt](../../VNPL-1.1.txt),
+VNPL 1.1 на русском языке в файле [VNPL-1.1-RU.txt](../../VNPL-1.1-RU.txt), а GPL 2.0 в файле [GPL-2.0.txt](../../GPL-2.0.txt).
--- a/docs/intro/features.en.md
+++ b/docs/intro/features.en.md
@@ -34,6 +34,7 @@

 - [Debian and CentOS packages](../installation/packages.en.md)
 - [Image management CLI (vitastor-cli)](../usage/cli.en.md)
+- [Disk management CLI (vitastor-disk)](../usage/disk.en.md)
 - Generic user-space client library
 - [Native QEMU driver](../usage/qemu.en.md)
 - [Loadable fio engine for benchmarks](../usage/fio.en.md)
@@ -47,7 +48,6 @@

 The following features are planned for the future:

- Better OSD creation and auto-start tools
 - Other administrative tools
 - Web GUI
 - OpenNebula plugin
--- a/docs/intro/features.ru.md
+++ b/docs/intro/features.ru.md
@@ -36,6 +36,7 @@

 - [Пакеты для Debian и CentOS](../installation/packages.ru.md)
 - [Консольный интерфейс управления образами (vitastor-cli)](../usage/cli.ru.md)
+- [Инструмент управления дисками (vitastor-disk)](../usage/disk.ru.md)
 - Общая пользовательская клиентская библиотека для работы с кластером
 - [Драйвер диска для QEMU](../usage/qemu.ru.md)
 - [Драйвер диска для утилиты тестирования производительности fio](../usage/fio.ru.md)
@@ -47,7 +48,6 @@

 ## Планы развития

- Более корректные скрипты разметки дисков и автоматического запуска OSD
 - Другие инструменты администрирования
 - Web-интерфейс
 - Плагин для OpenNebula
--- a/docs/intro/quickstart.en.md
+++ b/docs/intro/quickstart.en.md
@@ -26,9 +26,14 @@
 ## Configure monitors

 On the monitor hosts:
- Edit variables at the top of `/usr/lib/vitastor/mon/make-units.sh` to desired values.
- Create systemd units for the monitor and etcd: `/usr/lib/vitastor/mon/make-units.sh`
- Start etcd and monitors: `systemctl start etcd vitastor-mon`
+- Put identical etcd_address into `/etc/vitastor/vitastor.conf`. Example:
+  ```
+  {
+    "etcd_address": ["10.200.1.10:2379","10.200.1.11:2379","10.200.1.12:2379"]
+  }
+  ```
+- Create systemd units for etcd by running: `/usr/lib/vitastor/mon/make-etcd`
+- Start etcd and monitors: `systemctl enable --now etcd vitastor-mon`
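Since every host needs the same etcd_address, one option is to generate `/etc/vitastor/vitastor.conf` from a single list of etcd hosts. A minimal sketch, assuming the etcd client port 2379 used in the example above; the function name is hypothetical.

```python
import json


def make_vitastor_conf(etcd_hosts, port=2379):
    """Render vitastor.conf content with an identical etcd_address list."""
    return json.dumps({"etcd_address": [f"{host}:{port}" for host in etcd_hosts]}, indent=2)


conf = make_vitastor_conf(["10.200.1.10", "10.200.1.11", "10.200.1.12"])
print(conf)
```

The printed text can then be copied to `/etc/vitastor/vitastor.conf` on each host, for example by a configuration management tool.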

 ## Configure OSDs

@@ -40,11 +45,9 @@ On the monitor hosts:
  }
  ```
 - Initialize OSDs:
-  - Simplest, SSD-only: `/usr/lib/vitastor/mon/make-osd.sh /dev/disk/by-partuuid/XXX [/dev/disk/by-partuuid/YYY ...]`
-    **Warning!** This very simple script by default makes units for server-grade SSDs with write-through cache!
-    If it's not your case, you MUST remove disable_data_fsync and immediate_commit from systemd units.
-  - Hybrid, HDD+SSD: `/usr/lib/vitastor/mon/make-osd-hybrid.js /dev/sda /dev/sdb ...` &mdash; pass all your
-    devices (HDD and SSD) to this script &mdash; it will partition disks and initialize journals on its own.
+  - SSD-only: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`
+  - Hybrid, SSD+HDD: `vitastor-disk prepare --hybrid /dev/sdXXX [/dev/sdYYY ...]`.
+    Pass all your devices (HDD and SSD) to this script &mdash; it will partition disks and initialize journals on its own.
    This script skips HDDs which are already partitioned so if you want to use non-empty disks for
    Vitastor you should first wipe them with `wipefs -a`. SSDs with GPT partition table are not skipped,
    but some free unpartitioned space must be available because the script creates new partitions for journals.
--- a/docs/intro/quickstart.ru.md
+++ b/docs/intro/quickstart.ru.md
@@ -26,16 +26,14 @@
 ## Настройте мониторы

 На хостах, выделенных под мониторы:
- Пропишите нужные вам значения в файле `/usr/lib/vitastor/mon/make-units.sh`
- Создайте юниты systemd для etcd и мониторов: `/usr/lib/vitastor/mon/make-units.sh`
- Запустите etcd и мониторы: `systemctl start etcd vitastor-mon`
- Пропишите etcd_address и osd_network в `/etc/vitastor/vitastor.conf`. Например:
+- Пропишите одинаковые etcd_address в `/etc/vitastor/vitastor.conf`. Например:
  ```
  {
-    "etcd_address": ["10.200.1.10:2379","10.200.1.11:2379","10.200.1.12:2379"],
-    "osd_network": "10.200.1.0/24"
+    "etcd_address": ["10.200.1.10:2379","10.200.1.11:2379","10.200.1.12:2379"]
  }
  ```
+- Инициализируйте сервисы etcd, запустив `/usr/lib/vitastor/mon/make-etcd`
+- Запустите etcd и мониторы: `systemctl enable --now etcd vitastor-mon`

 ## Настройте OSD

@@ -47,12 +45,10 @@
  }
  ```
 - Инициализуйте OSD:
-  - SSD: `/usr/lib/vitastor/make-osd.sh /dev/disk/by-partuuid/XXX [/dev/disk/by-partuuid/YYY ...]`. \
-    **Внимание!** Скрипт по умолчанию рассчитан на то, что у вас диски с конденсаторами и отключённым
-    кэшем! Если это не так, из юнитов systemd нужно убрать строчки disable_data_fsync и immediate_commit!
-  - Гибридные, HDD+SSD: `/usr/lib/vitastor/mon/make-osd-hybrid.js /dev/sda /dev/sdb ...` - передайте
-    все ваши SSD и HDD скрипту в командной строке подряд, скрипт автоматически выделит разделы под
-    журналы на SSD и данные на HDD. Скрипт пропускает HDD, на которых уже есть разделы
+  - SSD: `vitastor-disk prepare /dev/sdXXX [/dev/sdYYY ...]`
+  - Гибридные, SSD+HDD: `vitastor-disk prepare --hybrid /dev/sdXXX [/dev/sdYYY ...]`.
+    Передайте все ваши SSD и HDD скрипту в командной строке подряд, скрипт автоматически выделит
+    разделы под журналы на SSD и данные на HDD. Скрипт пропускает HDD, на которых уже есть разделы
    или вообще какие-то данные, поэтому если диски непустые, сначала очистите их с помощью
    `wipefs -a`. SSD с таблицей разделов не пропускаются, но так как скрипт создаёт новые разделы
    для журналов, на SSD должно быть доступно свободное нераспределённое место.
--- a/docs/usage/cli.en.md
+++ b/docs/usage/cli.en.md
@@ -20,7 +20,6 @@ It supports the following commands:
 - [rm-data](#rm-data)
 - [merge-data](#merge-data)
 - [alloc-osd](#alloc-osd)
- [simple-offsets](#simple-offsets)

 Global options:

@@ -38,9 +37,9 @@ Global options:

 `vitastor-cli status`

-Показать состояние кластера.
+Show cluster status.

-Пример вывода:
+Example output:

 ```
  cluster:
@@ -65,9 +64,9 @@ Global options:

 `vitastor-cli df`

-Показать список пулов и занятое место.
+Show pool space statistics.

-Пример вывода:
+Example output:

 ```
 NAME      SCHEME  PGS  TOTAL    USED    AVAILABLE  USED%   EFFICIENCY
@@ -76,27 +75,26 @@ size1     1/1     32   199.9 G  10 G    121.5 G    39.23%  100%
 kaveri    2/1     32   0 B      10 G    0 B        100%    0%
 ```

-В примере у пула "kaveri" эффективность равна нулю, так как все OSD выключены.
+In the example above, the "kaveri" pool has zero efficiency because all of its OSDs are down.

 ## ls

 `vitastor-cli ls [-l] [-p POOL] [--sort FIELD] [-r] [-n N] [<glob> ...]`

-Показать список образов, если переданы шаблоны `<glob>`, то только с именами,
-соответствующими этим шаблонам (стандартные ФС-шаблоны с * и ?).
+List images. If `<glob>` patterns are passed, list only images whose names match them
+(standard shell-style patterns with * and ?).

-Опции:
+Options:

 ```
-p|--pool POOL  Фильтровать образы по пулу (ID или имени)
-l|--long       Также выводить статистику занятого места и ввода-вывода
--del           Также выводить статистику операций удаления
--sort FIELD    Сортировать по заданному полю (name, size, used_size, <read|write|delete>_<iops|bps|lat|queue>)
-r|--reverse    Сортировать в обратном порядке
-n|--count N    Показывать только первые N записей
+-p|--pool POOL  Filter images by pool ID or name
+-l|--long       Also report allocated size and I/O statistics
+--del           Also include delete operation statistics
+--sort FIELD    Sort by specified field (name, size, used_size, <read|write|delete>_<iops|bps|lat|queue>)
+-r|--reverse    Sort in descending order
+-n|--count N    Only list first N items
 ```
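The filtering and sorting options of `ls` can be approximated in Python to show how they combine. This is a hypothetical sketch over a list of dicts, not vitastor-cli's code: Python's `fnmatch` also understands `[...]` character sets, which may differ from the CLI's own matching, and sorting by name when `--sort` is omitted is an assumption.

```python
from fnmatch import fnmatchcase


def list_images(images, globs=(), sort="name", reverse=False, count=None):
    """Filter by glob patterns, sort by one field, optionally keep the first N."""
    selected = [img for img in images
                if not globs or any(fnmatchcase(img["name"], g) for g in globs)]
    selected.sort(key=lambda img: img[sort], reverse=reverse)  # -r reverses the order
    return selected if count is None else selected[:count]     # -n N keeps N items
```

For example, `list_images(images, ["bench*"], sort="size", reverse=True, count=1)` roughly corresponds to `vitastor-cli ls --sort size -r -n 1 'bench*'`.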

-Пример вывода:
+Example output:

 ```
 NAME                 POOL      SIZE  USED    READ   IOPS  QUEUE  LAT   WRITE  IOPS  QUEUE  LAT   FLAGS  PARENT
@@ -113,94 +111,67 @@ bench-kaveri         kaveri    10 G  10 G    0 B/s  0     0      0 us  0 B/s  0

 `vitastor-cli create -s|--size <size> [-p|--pool <id|name>] [--parent <parent_name>[@<snapshot>]] <name>`

-Создать образ. Для размера `<size>` можно использовать суффиксы K/M/G/T (килобайт-мегабайт-гигабайт-терабайт).
-Если указана опция `--parent`, создаётся клон образа. Родитель `<parent_name>[@<snapshot>]` должен быть
-снимком (или просто немодифицируемым образом). Пул обязательно указывать, если в кластере больше одного пула.
+Create an image. You may use K/M/G/T suffixes for `<size>`. If `--parent` is specified,
+a copy-on-write image clone is created. Parent must be a snapshot (readonly image).
+Pool must be specified if there is more than one pool.
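Size arguments accept K/M/G/T suffixes. A minimal parser sketch, assuming binary multiples (1 K = 1024 bytes), case-insensitive suffixes and integer values only; the exact set of forms accepted by vitastor-cli may differ.

```python
SIZE_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert strings such as "10G" or "512" into a byte count."""
    text = text.strip().upper()
    if text and text[-1] in SIZE_SUFFIXES:
        return int(text[:-1]) * SIZE_SUFFIXES[text[-1]]
    return int(text)  # plain number of bytes


print(parse_size("10G"))  # 10737418240
```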

 ```
 vitastor-cli create --snapshot <snapshot> [-p|--pool <id|name>] <image>
 vitastor-cli snap-create [-p|--pool <id|name>] <image>@<snapshot>
 ```

-Создать снимок образа `<name>` (можно использовать любую форму команды). Снимок можно создавать без остановки
-клиентов, если пишущий клиент максимум 1.
+Create a snapshot of image `<image>` (either form of the command can be used). The snapshot can be taken live, without stopping clients, if at most one client is writing to the image.

 ## modify

 `vitastor-cli modify <name> [--rename <new-name>] [--resize <size>] [--readonly | --readwrite] [-f|--force]`

-Изменить размер, имя образа или флаг "только для чтения". Снимать флаг "только для чтения"
-и уменьшать размер образов, у которых есть дочерние клоны, без `--force` нельзя.
-
-Если новый размер меньше старого, "лишние" данные будут удалены, поэтому перед уменьшением
-образа сначала уменьшите файловую систему в нём.
+Rename or resize an image, or change its readonly status. Images with children can't be made
+read-write or shrunk without `--force`.
+If the new size is smaller than the old size, data beyond the new size is purged, so shrink
+the file system inside the image, if any, before shrinking the image itself.

 ```
-f|--force  Разрешить уменьшение или перевод в чтение-запись образа, у которого есть клоны.
+-f|--force  Proceed with shrinking or setting readwrite flag even if the image has children.
 ```

 ## rm

 `vitastor-cli rm <from> [<to>] [--writers-stopped]`

-Удалить образ `<from>` или все слои от `<from>` до `<to>` (`<to>` должен быть дочерним
-образом `<from>`), одновременно меняя родительские образы их клонов (если таковые есть).
-
-`--writers-stopped` позволяет чуть более эффективно удалять образы в частом случае, когда
-у удаляемой цепочки есть только один дочерний образ, содержащий небольшой объём данных.
-В этом случае дочерний образ вливается в родительский и удаляется, а родительский
-переименовывается в дочерний.
-
-В других случаях родительские слои вливаются в дочерние.
+Remove `<from>` or all layers between `<from>` and `<to>` (`<to>` must be a child of `<from>`),
+rebasing all their children accordingly. `--writers-stopped` makes merging a bit more
+efficient in the case of a single 'slim' read-write child and a 'fat' removed parent:
+in that case the child is merged into the parent, and the parent is renamed to the child.
+In other cases parent layers are always merged into children.

 ## flatten

 `vitastor-cli flatten <layer>`

-Сделай образ `<layer>` плоским, то есть, скопировать в него данные и разорвать его
-соединение с родительскими.
+Flatten a layer, i.e. copy the data of its parents into it and detach it from them.

 ## rm-data

 `vitastor-cli rm-data --pool <pool> --inode <inode> [--wait-list] [--min-offset <offset>]`

-Удалить данные инода, не меняя метаданные образов.
+Remove inode data without changing image metadata.

 ```
--wait-list   Сначала запросить полный листинг объектов, а потом начать удалять.
-              Требует больше памяти, но позволяет правильно печатать прогресс удаления.
--min-offset  Удалять только данные, начиная с заданного смещения.
+--wait-list   Retrieve full object listings before starting to remove objects.
+              Requires more memory, but allows showing correct removal progress.
+--min-offset  Only purge data starting at the specified offset.
 ```

 ## merge-data

 `vitastor-cli merge-data <from> <to> [--target <target>]`

-Слить данные слоёв, не меняя метаданные. Вливает данные из слоёв от `<from>` до `<to>`
-в целевой образ `<target>`. `<to>` должен быть дочерним образом `<from>`, а `<target>`
-должен быть одним из слоёв между `<from>` и `<to>`, включая сами `<from>` и `<to>`.
+Merge layer data without changing metadata. Merge `<from>`..`<to>` into `<target>`.
+`<to>` must be a child of `<from>`, and `<target>` must be one of the layers between
+`<from>` and `<to>`, including `<from>` and `<to>`.
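The argument rules for `merge-data` can be checked by walking the parent chain. The sketch below assumes a hypothetical child-to-parent name map and treats "child" as any descendant of `<from>`. The layer chosen when `--target` is omitted is not stated here, so this sketch requires the target explicitly.

```python
def layers_between(parents, from_layer, to_layer):
    """Return the chain from <from> down to <to>, given a child -> parent map."""
    chain = [to_layer]
    while chain[-1] != from_layer:
        parent = parents.get(chain[-1])
        if parent is None:
            raise ValueError(f"{to_layer} is not a child of {from_layer}")
        chain.append(parent)
    return chain[::-1]


def check_merge_target(parents, from_layer, to_layer, target):
    """Validate the merge-data arguments and return the layers being merged."""
    chain = layers_between(parents, from_layer, to_layer)
    if target not in chain:
        raise ValueError(f"{target} is not between {from_layer} and {to_layer}")
    return chain


parents = {"img": "img@snap2", "img@snap2": "img@snap1"}
print(check_merge_target(parents, "img@snap1", "img", "img@snap2"))
```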

 ## alloc-osd

 `vitastor-cli alloc-osd`

-Атомарно выделить новый номер OSD и зарезервировать его, создав в etcd пустой
-ключ `/osd/stats/<n>`.
-
-## simple-offsets
-
-`vitastor-cli simple-offsets <device>`
-
-Рассчитать смещения для простого и тупого создания OSD на диске (без суперблока).
-
-Опции (см. также [Дисковые параметры уровня кластера](../config/layout-cluster.ru.md)):
-
-```
--object_size 128k       Размер блока хранилища
--bitmap_granularity 4k  Гранулярность битовых карт
--journal_size 32M       Размер журнала
--device_block_size 4k   Размер блока устройства
--journal_offset 0       Смещение журнала
--device_size 0          Размер устройства
--format text            Формат результата: json, options, env или text
-```
+Atomically allocate a new OSD number and reserve it by creating an empty `/osd/stats/<n>` key in etcd.
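The important property of `alloc-osd` is atomicity: two concurrent calls must never reserve the same number. The sketch below models this with an in-memory store whose `create_if_absent` method stands in for an etcd transaction that creates the key only if it does not exist yet (for example, a txn comparing the key's create revision to 0). Both the store and the lowest-free-number policy are assumptions for illustration, not vitastor-cli's exact algorithm.

```python
import threading


class FakeKV:
    """In-memory stand-in for etcd with an atomic create-if-absent operation."""

    def __init__(self):
        self.data = {}
        self._lock = threading.Lock()

    def create_if_absent(self, key, value):
        with self._lock:  # the existence check and the write happen as one step
            if key in self.data:
                return False
            self.data[key] = value
            return True


def alloc_osd(kv):
    """Reserve the lowest OSD number whose /osd/stats/<n> key does not exist."""
    n = 1
    while not kv.create_if_absent(f"/osd/stats/{n}", ""):
        n += 1  # this number is already taken, try the next one
    return n
```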
--- a/docs/usage/cli.ru.md
+++ b/docs/usage/cli.ru.md
@@ -21,7 +21,6 @@ vitastor-cli - интерфейс командной строки для адм
 - [rm-data](#rm-data)
 - [merge-data](#merge-data)
 - [alloc-osd](#alloc-osd)
- [simple-offsets](#simple-offsets)

 Глобальные опции:

@@ -39,9 +38,9 @@ vitastor-cli - интерфейс командной строки для адм

 `vitastor-cli status`

-Show cluster status.
+Показать состояние кластера.

-Example output:
+Пример вывода:

 ```
  cluster:
@@ -66,9 +65,9 @@ Example output:

 `vitastor-cli df`

-Show pool space statistics.
+Показать список пулов и занятое место.

-Example output:
+Пример вывода:

 ```
 NAME      SCHEME  PGS  TOTAL    USED    AVAILABLE  USED%   EFFICIENCY
@@ -77,26 +76,27 @@ size1     1/1     32   199.9 G  10 G    121.5 G    39.23%  100%
 kaveri    2/1     32   0 B      10 G    0 B        100%    0%
 ```

-In the example above, "kaveri" pool has "zero" efficiency because all its OSD are down.
+В примере у пула "kaveri" эффективность равна нулю, так как все OSD выключены.

 ## ls

 `vitastor-cli ls [-l] [-p POOL] [--sort FIELD] [-r] [-n N] [<glob> ...]`

-List images (only matching `<glob>` pattern(s) if passed).
+Показать список образов, если переданы шаблоны `<glob>`, то только с именами,
+соответствующими этим шаблонам (стандартные ФС-шаблоны с * и ?).

-Options:
+Опции:

 ```
-p|--pool POOL  Filter images by pool ID or name
-l|--long       Also report allocated size and I/O statistics
--del           Also include delete operation statistics
--sort FIELD    Sort by specified field (name, size, used_size, <read|write|delete>_<iops|bps|lat|queue>)
-r|--reverse    Sort in descending order
-n|--count N    Only list first N items
+-p|--pool POOL  Фильтровать образы по пулу (ID или имени)
+-l|--long       Также выводить статистику занятого места и ввода-вывода
+--del           Также выводить статистику операций удаления
+--sort FIELD    Сортировать по заданному полю (name, size, used_size, <read|write|delete>_<iops|bps|lat|queue>)
+-r|--reverse    Сортировать в обратном порядке
+-n|--count N    Показывать только первые N записей
 ```

-Example output:
+Пример вывода:

 ```
 NAME                 POOL      SIZE  USED    READ   IOPS  QUEUE  LAT   WRITE  IOPS  QUEUE  LAT   FLAGS  PARENT
@@ -113,85 +113,76 @@ bench-kaveri         kaveri    10 G  10 G    0 B/s  0     0      0 us  0 B/s  0

 `vitastor-cli create -s|--size <size> [-p|--pool <id|name>] [--parent <parent_name>[@<snapshot>]] <name>`

-Create an image. You may use K/M/G/T suffixes for `<size>`. If `--parent` is specified,
-a copy-on-write image clone is created. Parent must be a snapshot (readonly image).
-Pool must be specified if there is more than one pool.
+Создать образ. Для размера `<size>` можно использовать суффиксы K/M/G/T (килобайт-мегабайт-гигабайт-терабайт).
+Если указана опция `--parent`, создаётся клон образа. Родитель `<parent_name>[@<snapshot>]` должен быть
+снимком (или просто немодифицируемым образом). Пул обязательно указывать, если в кластере больше одного пула.

 ```
 vitastor-cli create --snapshot <snapshot> [-p|--pool <id|name>] <image>
 vitastor-cli snap-create [-p|--pool <id|name>] <image>@<snapshot>
 ```

-Create a snapshot of image `<name>` (either form can be used). May be used live if only a single writer is active.
+Создать снимок образа `<image>` (можно использовать любую форму команды). Снимок можно создавать без остановки
+клиентов, если в образ пишет не более одного клиента.

 ## modify

 `vitastor-cli modify <name> [--rename <new-name>] [--resize <size>] [--readonly | --readwrite] [-f|--force]`

-Rename, resize image or change its readonly status. Images with children can't be made read-write.
-If the new size is smaller than the old size, extra data will be purged.
-You should resize file system in the image, if present, before shrinking it.
+Изменить размер, имя образа или флаг "только для чтения". Снимать флаг "только для чтения"
+и уменьшать размер образов, у которых есть дочерние клоны, без `--force` нельзя.
+
+Если новый размер меньше старого, "лишние" данные будут удалены, поэтому перед уменьшением
+образа сначала уменьшите файловую систему в нём.

 ```
-f|--force  Proceed with shrinking or setting readwrite flag even if the image has children.
+-f|--force  Разрешить уменьшение или перевод в чтение-запись образа, у которого есть клоны.
 ```

 ## rm

 `vitastor-cli rm <from> [<to>] [--writers-stopped]`

-Remove `<from>` or all layers between `<from>` and `<to>` (`<to>` must be a child of `<from>`),
-rebasing all their children accordingly. --writers-stopped allows merging to be a bit
-more effective in case of a single 'slim' read-write child and 'fat' removed parent:
-the child is merged into parent and parent is renamed to child in that case.
-In other cases parent layers are always merged into children.
+Удалить образ `<from>` или все слои от `<from>` до `<to>` (`<to>` должен быть дочерним
+образом `<from>`), одновременно меняя родительские образы их клонов (если таковые есть).
+
+`--writers-stopped` позволяет чуть более эффективно удалять образы в частом случае, когда
+у удаляемой цепочки есть только один дочерний образ, содержащий небольшой объём данных.
+В этом случае дочерний образ вливается в родительский и удаляется, а родительский
+переименовывается в дочерний.
+
+В других случаях родительские слои вливаются в дочерние.

 ## flatten

 `vitastor-cli flatten <layer>`

-Flatten a layer, i.e. merge data and detach it from parents.
+Сделать образ `<layer>` плоским, то есть скопировать в него данные родительских слоёв и разорвать
+его связь с родительскими образами.

 ## rm-data

 `vitastor-cli rm-data --pool <pool> --inode <inode> [--wait-list] [--min-offset <offset>]`

-Remove inode data without changing metadata.
+Удалить данные инода, не меняя метаданные образов.

 ```
--wait-list   Retrieve full objects listings before starting to remove objects.
-              Requires more memory, but allows to show correct removal progress.
--min-offset  Purge only data starting with specified offset.
+--wait-list   Сначала запросить полный листинг объектов, а потом начать удалять.
+              Требует больше памяти, но позволяет правильно печатать прогресс удаления.
+--min-offset  Удалять только данные, начиная с заданного смещения.
 ```

 ## merge-data

 `vitastor-cli merge-data <from> <to> [--target <target>]`

-Merge layer data without changing metadata. Merge `<from>`..`<to>` to `<target>`.
-`<to>` must be a child of `<from>` and `<target>` may be one of the layers between
-`<from>` and `<to>`, including `<from>` and `<to>`.
+Слить данные слоёв, не меняя метаданные. Вливает данные из слоёв от `<from>` до `<to>`
+в целевой образ `<target>`. `<to>` должен быть дочерним образом `<from>`, а `<target>`
+должен быть одним из слоёв между `<from>` и `<to>`, включая сами `<from>` и `<to>`.

 ## alloc-osd

 `vitastor-cli alloc-osd`

-Allocate a new OSD number and reserve it by creating empty `/osd/stats/<n>` key.
-
-## simple-offsets
-
-`vitastor-cli simple-offsets <device>`
-
-Calculate offsets for simple&stupid (no superblock) OSD deployment.
-
-Options (see also [Cluster-Wide Disk Layout Parameters](../config/layout-cluster.en.md)):
-
-```
--object_size 128k       Set blockstore block size
--bitmap_granularity 4k  Set bitmap granularity
--journal_size 32M       Set journal size
--device_block_size 4k   Set device block size
--journal_offset 0       Set journal offset
--device_size 0          Set device size
--format text            Result format: json, options, env, or text
-```
+Атомарно выделить новый номер OSD и зарезервировать его, создав в etcd пустой
+ключ `/osd/stats/<n>`.
--- a/docs/usage/disk.en.md
+++ b/docs/usage/disk.en.md
@@ -0,0 +1,245 @@
+[Documentation](../../README.md#documentation) → Usage → Disk Tool
+
+-----
+
+[Читать на русском](disk.ru.md)
+
+# Disk management tool
+
+vitastor-disk is a command-line tool for physical Vitastor disk management.
+
+It supports the following commands:
+
+- [prepare](#prepare)
+- [upgrade-simple](#upgrade-simple)
+- [resize](#resize)
+- [start/stop/restart/enable/disable](#start/stop/restart/enable/disable)
+- [read-sb](#read-sb)
+- [write-sb](#write-sb)
+- [udev](#udev)
+- [exec-osd](#exec-osd)
+- [pre-exec](#pre-exec)
+- Debugging:
+  - [dump-journal](#dump-journal)
+  - [write-journal](#write-journal)
+  - [dump-meta](#dump-meta)
+  - [write-meta](#write-meta)
+- [simple-offsets](#simple-offsets)
+
+## prepare
+
+`vitastor-disk prepare [OPTIONS] [devices...]`
+
+Initialize disk(s) for Vitastor OSD(s).
+
+There are two modes of this command. In the first mode, you pass `<devices>` which
+must be raw disks (not partitions). They are partitioned automatically and OSDs
+are initialized on all of them.
+
+In the second mode, you omit `<devices>` and pass `--data_device`, `--journal_device`
+and/or `--meta_device` which must be already existing partitions identified by their
+GPT partition UUIDs. In this case a single OSD is created.
+
+Requires `vitastor-cli`, `wipefs`, `sfdisk` and `partprobe` (from parted) utilities.
+
+Options (automatic mode):
+
+```
+--osd_per_disk <N>
+  Create <N> OSDs on each disk (default 1)
+--hybrid
+  Prepare hybrid (HDD+SSD) OSDs using provided devices. SSDs will be used for
+  journals and metadata, HDDs will be used for data. Partitions for journals and
+  metadata will be created automatically. Whether disks are SSD or HDD is decided
+  by the `/sys/block/.../queue/rotational` flag. In hybrid mode, default object
+  size is 1 MB instead of 128 KB, default journal size is 1 GB instead of 32 MB,
+  and throttle_small_writes is enabled by default.
+--disable_data_fsync auto
+  Disable data device cache and fsync (1/yes/true = on, default auto)
+--disable_meta_fsync auto
+  Disable metadata/journal device cache and fsync (default auto)
+--meta_reserve 2x,1G
+  New metadata partitions in --hybrid mode are created larger than the actual
+  metadata size to ease possible future extension. The default is to allocate
+  2 times more space and at least 1G. Use this option to override.
+--max_other 10%
+  Use disks for OSD data even if they already have non-Vitastor partitions,
+  but only if these take up no more than this percent of disk space.
+```
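The `--meta_reserve 2x,1G` default means "the larger of twice the required metadata size and 1 GB". Here is a minimal sketch of that arithmetic. It assumes the option is a multiplier plus a minimum size. `reservedMetaSize` is a hypothetical helper, not a vitastor-disk function:

```javascript
// Hypothetical sketch of the "--meta_reserve 2x,1G" rule: reserve `factor`
// times the required metadata size, but never less than `minimum` bytes.
function reservedMetaSize(requiredBytes, factor = 2, minimum = 1024 ** 3) {
  return Math.max(Math.ceil(requiredBytes * factor), minimum);
}

const MiB = 1024 ** 2;
console.log(reservedMetaSize(100 * MiB) / MiB); // 1024: the 1G floor wins
console.log(reservedMetaSize(800 * MiB) / MiB); // 1600: twice the required size
```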
+
+Options (single-device mode):
+
+```
+--data_device <DEV>        Use partition <DEV> for data
+--meta_device <DEV>        Use partition <DEV> for metadata (optional)
+--journal_device <DEV>     Use partition <DEV> for journal (optional)
+--disable_data_fsync 0     Disable data device cache and fsync (default off)
+--disable_meta_fsync 0     Disable metadata device cache and fsync (default off)
+--disable_journal_fsync 0  Disable journal device cache and fsync (default off)
+--force                    Bypass partition safety checks (for emptiness and so on)
+```
+
+Options (both modes):
+
+```
+--journal_size 1G/32M      Set journal size (area or partition size)
+--block_size 1M/128k       Set blockstore object size
+--bitmap_granularity 4k    Set bitmap granularity
+--data_device_block 4k     Override data device block size
+--meta_device_block 4k     Override metadata device block size
+--journal_device_block 4k  Override journal device block size
+```
+
+The [immediate_commit](../config/layout-cluster.en.md#immediate_commit) setting is
+automatically derived from the "disable fsync" options. It's set to "all" when fsync
+is disabled on all devices, and to "small" if fsync is only disabled on the journal device.
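The immediate_commit rule above can be written as a small decision function. This sketch follows the rule as worded here and is not vitastor-disk's code. `deriveImmediateCommit` is hypothetical. Two points are assumptions: any combination where journal fsync is disabled but not every device's fsync is disabled maps to "small", and "none" is taken from the values documented for immediate_commit.

```javascript
// Hypothetical sketch: each flag is true when fsync (and cache) is disabled
// on that device. Returns the derived immediate_commit value.
function deriveImmediateCommit({ data = false, meta = false, journal = false } = {}) {
  if (data && meta && journal) return 'all'; // no device needs fsync
  if (journal) return 'small';               // only journal writes skip fsync (assumed rule)
  return 'none';                             // explicit fsyncs stay required
}
```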
+
+When data/meta/journal fsyncs are disabled, the OSD startup script automatically
+checks the device cache status on start and tries to disable cache for SATA/SAS disks.
+If it doesn't succeed, it writes a warning to the system log.
+
+You can also pass other OSD options here as arguments, and they'll be persisted
+to the superblock: max_write_iodepth, min_flusher_count,
+max_flusher_count, inmemory_metadata, inmemory_journal, journal_sector_buffer_count,
+journal_no_same_sector_overwrites, throttle_small_writes, throttle_target_iops,
+throttle_target_mbs, throttle_target_parallelism, throttle_threshold_us.
+See [Runtime OSD Parameters](../config/osd.en.md) for details.
+
+## upgrade-simple
+
+`vitastor-disk upgrade-simple <UNIT_FILE|OSD_NUMBER>`
+
+Upgrade an OSD created by old (0.7.1 and older) `make-osd.sh` or `make-osd-hybrid.js` scripts.
+
+Adds superblocks to OSD devices, disables old `vitastor-osdN` unit and replaces it with `vitastor-osd@N`.
+Can be invoked with either an OSD number or a path to the systemd service file `UNIT_FILE`, which
+must be `/etc/systemd/system/vitastor-osd<OSD_NUMBER>.service`.
+
+Note that the procedure isn't atomic and may ruin OSD data in case of an interrupt,
+so don't upgrade all your OSDs in parallel.
+
+Requires the `sfdisk` utility.
+
+## resize
+
+`vitastor-disk resize <ALL_OSD_PARAMETERS> <NEW_LAYOUT> [--iodepth 32]`
+
+Resize data area and/or rewrite/move journal and metadata.
+
+`ALL_OSD_PARAMETERS` must include all (at least all disk-related)
+parameters from the OSD command line (e.g. from the systemd unit or the superblock).
+
+`NEW_LAYOUT` may include new disk layout parameters:
+
+```
+--new_data_offset SIZE     resize data area so it starts at SIZE
+--new_data_len SIZE        resize data area to SIZE bytes
+--new_meta_device PATH     use PATH for new metadata
+--new_meta_offset SIZE     make new metadata area start at SIZE
+--new_meta_len SIZE        make new metadata area SIZE bytes long
+--new_journal_device PATH  use PATH for new journal
+--new_journal_offset SIZE  make new journal area start at SIZE
+--new_journal_len SIZE     make new journal area SIZE bytes long
+```
+
+SIZE may include k/m/g/t suffixes. Any new layout parameter that is not specified
+keeps its old value.
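Here is a sketch of how SIZE values and the "keep old values" fallback might be handled. It assumes binary (1024-based) k/m/g/t multiples and integer sizes. `parseSize` and `mergeLayout` are hypothetical names, and `oldLayout` stands in for the current OSD parameters:

```javascript
// Hypothetical sketch, assuming binary multiples for the k/m/g/t suffixes.
const SUFFIX = { k: 2 ** 10, m: 2 ** 20, g: 2 ** 30, t: 2 ** 40 };

function parseSize(value) {
  const m = /^(\d+)([kmgt])?$/i.exec(String(value).trim());
  if (!m) throw new Error(`invalid SIZE: ${value}`);
  return Number(m[1]) * (m[2] ? SUFFIX[m[2].toLowerCase()] : 1);
}

// Start from the old layout and override only the options that were given,
// e.g. new_data_len -> data_len. *_device values are paths, not sizes.
function mergeLayout(oldLayout, newOptions) {
  const result = { ...oldLayout };
  for (const [key, value] of Object.entries(newOptions)) {
    const name = key.replace(/^new_/, '');
    result[name] = /_device$/.test(name) ? value : parseSize(value);
  }
  return result;
}
```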
+
+## start/stop/restart/enable/disable
+
+`vitastor-disk start|stop|restart|enable|disable [--now] <device> [device2 device3 ...]`
+
+Manipulate Vitastor OSDs using systemd by their device paths.
+
+Commands are passed to `systemctl` with `vitastor-osd@<num>` units as arguments.
+
+When `--now` is added to enable/disable, OSDs are also immediately started/stopped.
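The mapping to `systemctl` can be sketched as building an argument list from OSD numbers. This sketch assumes the device-to-OSD-number lookup has already happened (for example, via the superblock that `read-sb` prints). `systemctlArgs` is hypothetical:

```javascript
// Hypothetical sketch: build systemctl arguments for already-resolved OSD numbers.
function systemctlArgs(command, osdNums, now = false) {
  const allowed = ['start', 'stop', 'restart', 'enable', 'disable'];
  if (!allowed.includes(command)) throw new Error(`unsupported command: ${command}`);
  const args = [command];
  // --now is only applied to enable/disable, as described above
  if (now && (command === 'enable' || command === 'disable')) args.push('--now');
  for (const num of osdNums) args.push(`vitastor-osd@${num}`);
  return args;
}
```

The resulting array could then be passed to `child_process.spawnSync('systemctl', args, { stdio: 'inherit' })`.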
+
+## read-sb
+
+`vitastor-disk read-sb <device>`
+
+Try to read Vitastor OSD superblock from `<device>` and print it in JSON format.
+
+## write-sb
+
+`vitastor-disk write-sb <device>`
+
+Read JSON from STDIN and write it into Vitastor OSD superblock on `<device>`.
+
+## udev
+
+`vitastor-disk udev <device>`
+
+Try to read Vitastor OSD superblock from `<device>` and print variables for udev.
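udev's `IMPORT{program}`, used in `mon/90-vitastor.rules`, reads the program's output as `KEY=value` lines. Below is a minimal parser sketch for that format. The variable names in the test come from the rules file. The sample values and the handling of double-quoted values are assumptions:

```javascript
// Sketch: parse "KEY=value" lines (udev environment key format) into an object.
// Blank or malformed lines are skipped; surrounding double quotes are stripped.
function parseUdevEnv(text) {
  const env = {};
  for (const line of text.split('\n')) {
    const m = /^([A-Za-z_][A-Za-z0-9_]*)=(.*)$/.exec(line.trim());
    if (m) env[m[1]] = m[2].replace(/^"(.*)"$/, '$1');
  }
  return env;
}
```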
+
+## exec-osd
+
+`vitastor-disk exec-osd <device>`
+
+Read Vitastor OSD superblock from `<device>` and start the OSD with parameters from it.
+
+Intended for use from startup scripts (e.g. from systemd units).
+
+## pre-exec
+
+`vitastor-disk pre-exec <device>`
+
+Read Vitastor OSD superblock from `<device>` and perform pre-start checks for the OSD.
+
+For now, this only checks that device cache is in write-through mode if fsync is disabled.
+
+Intended for use from startup scripts (e.g. from systemd units).
+
+## dump-journal
+
+`vitastor-disk dump-journal [OPTIONS] <journal_file> <journal_block_size> <offset> <size>`
+
+Dump journal in human-readable or JSON (if `--json` is specified) format.
+
+Options:
+
+```
+--all             Scan the whole journal area for entries and dump them, even outdated ones
+--json            Dump journal in JSON format
+--format entries  (Default) Dump actual journal entries as an array, without data
+--format data     Same as "entries", but also include small write data
+--format blocks   Dump as an array of journal blocks, each containing an array of entries
+```
+
+## write-journal
+
+`vitastor-disk write-journal <journal_file> <journal_block_size> <bitmap_size> <offset> <size>`
+
+Write journal from JSON taken from standard input in the same format as produced by
+`dump-journal --json --format data`.
+
+## dump-meta
+
+`vitastor-disk dump-meta <meta_file> <meta_block_size> <offset> <size>`
+
+Dump metadata in JSON format.
+
+## write-meta
+
+`vitastor-disk write-meta <meta_file> <offset> <size>`
+
+Write metadata from JSON taken from standard input in the same format as produced by `dump-meta`.
+
+## simple-offsets
+
+`vitastor-disk simple-offsets <device>`
+
+Calculate offsets for the old "simple & stupid" (superblock-less) OSD deployment.
+
+Options (see also [Cluster-Wide Disk Layout Parameters](../config/layout-cluster.en.md)):
+
+```
+--object_size 128k       Set blockstore block size
+--bitmap_granularity 4k  Set bitmap granularity
+--journal_size 32M       Set journal size
+--device_block_size 4k   Set device block size
+--journal_offset 0       Set journal offset
+--device_size 0          Set device size
+--format text            Result format: json, options, env, or text
+```
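The size formula from the removed `make-osd-hybrid.js` script illustrates the metadata part of such a calculation. The surrounding layout (journal first, then metadata, then data) and `simpleOffsets` itself are hypothetical. vitastor-disk's actual rounding and alignment may differ:

```javascript
// Metadata size formula as used by the removed make-osd-hybrid.js script.
function metaSize({ dataSize, objectSize = 128 * 1024, bitmapGranularity = 4096, blockSize = 4096 }) {
  // Each entry: 24 bytes plus two bitmaps with one bit per bitmap_granularity.
  const entrySize = 24 + 2 * objectSize / bitmapGranularity / 8;
  const entriesPerBlock = Math.floor(blockSize / entrySize);
  const objectCount = Math.floor(dataSize / objectSize);
  // One extra block (assumed to be a header), then whole blocks of entries.
  return Math.ceil(1 + objectCount / entriesPerBlock) * blockSize;
}

// Hypothetical simple layout on one device: [journal][metadata][data].
function simpleOffsets({ deviceSize, journalOffset = 0, journalSize = 32 * 1024 * 1024, ...rest }) {
  const metaOffset = journalOffset + journalSize;
  // Sizing metadata for everything after the journal slightly overestimates it.
  const meta = metaSize({ dataSize: deviceSize - metaOffset, ...rest });
  const dataOffset = metaOffset + meta;
  return { journalOffset, metaOffset, dataOffset, dataSize: deviceSize - dataOffset };
}
```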
--- a/docs/usage/disk.ru.md
+++ b/docs/usage/disk.ru.md
@@ -0,0 +1,248 @@
+[Документация](../../README-ru.md#документация) → Использование → Управление дисками
+
+-----
+
+[Read in English](disk.en.md)
+
+# Инструмент управления дисками
+
+vitastor-disk - инструмент командной строки для управления дисками Vitastor OSD.
+
+Поддерживаются следующие команды:
+
+- [prepare](#prepare)
+- [upgrade-simple](#upgrade-simple)
+- [resize](#resize)
+- [start/stop/restart/enable/disable](#start/stop/restart/enable/disable)
+- [read-sb](#read-sb)
+- [write-sb](#write-sb)
+- [udev](#udev)
+- [exec-osd](#exec-osd)
+- [pre-exec](#pre-exec)
+- Для отладки:
+  - [dump-journal](#dump-journal)
+  - [write-journal](#write-journal)
+  - [dump-meta](#dump-meta)
+  - [write-meta](#write-meta)
+- [simple-offsets](#simple-offsets)
+
+## prepare
+
+`vitastor-disk prepare [OPTIONS] [devices...]`
+
+Подготовить диск(и) для OSD Vitastor.
+
+У команды есть 2 режима. В первом режиме вы указываете список устройств `<devices>`,
+которые должны быть целыми дисками (не разделами). На них автоматически создаются
+разделы и инициализируются OSD.
+
+Во втором режиме вместо списка устройств вы указываете пути к отдельным устройствам
+`--data_device`, `--journal_device` и/или `--meta_device`, которые должны быть
+уже существующими GPT-разделами. В этом случае инициализируется ровно один OSD.
+
+Команде требуются утилиты `vitastor-cli`, `wipefs`, `sfdisk` и `partprobe` (из состава parted).
+
+Опции для автоматического режима:
+
+```
+--osd_per_disk <N>
+  Создавать по несколько (<N>) OSD на каждом диске (по умолчанию 1)
+--hybrid
+  Инициализировать гибридные (HDD+SSD) OSD на указанных дисках. SSD будут
+  использованы для журналов и метаданных, а HDD - для данных. Разделы для журналов
+  и метаданных будут созданы автоматически. Является ли диск SSD или HDD, определяется
+  по флагу `/sys/block/.../queue/rotational`. В гибридном режиме по умолчанию
+  используется размер объекта 1 МБ вместо 128 КБ, размер журнала 1 ГБ вместо 32 МБ
+  и включённый throttle_small_writes.
+--disable_data_fsync auto
+  Отключать кэш и fsync-и для устройств данных (1/yes/true = да, по умолчанию автоопределение)
+--disable_meta_fsync auto
+  Отключать кэш и fsync-и для журналов и метаданных (по умолчанию автоопределение)
+--meta_reserve 2x,1G
+  В гибридном режиме для метаданных выделяется больше места, чем нужно на самом
+  деле, чтобы оставить запас под будущее расширение. По умолчанию выделяется
+  в 2 раза больше места, и не менее 1 ГБ. Чтобы изменить это поведение,
+  воспользуйтесь данной опцией.
+--max_other 10%
+  Использовать диски под данные OSD, даже если на них уже есть не-Vitastor-овые
+  разделы, но только в случае, если они занимают не более данного процента диска.
+```
+
+Опции для режима одного OSD:
+
+```
+--data_device <DEV>        Использовать раздел <DEV> для данных
+--meta_device <DEV>        Использовать раздел <DEV> для метаданных (опционально)
+--journal_device <DEV>     Использовать раздел <DEV> для журнала (опционально)
+--disable_data_fsync 0     Отключить кэш и fsync устройства данных (по умолчанию нет)
+--disable_meta_fsync 0     Отключить кэш и fsync метаданных (по умолчанию нет)
+--disable_journal_fsync 0  Отключить кэш и fsync журнала (по умолчанию нет)
+--force                    Пропустить проверки разделов (на пустоту и т.п.)
+```
+
+Опции для обоих режимов:
+
+```
+--journal_size 1G/32M      Задать размер журнала (области или раздела журнала)
+--block_size 1M/128k       Задать размер объекта хранилища
+--bitmap_granularity 4k    Задать гранулярность битовых карт
+--data_device_block 4k     Задать размер блока устройства данных
+--meta_device_block 4k     Задать размер блока метаданных
+--journal_device_block 4k  Задать размер блока журнала
+```
+
+Настройка [immediate_commit](../config/layout-cluster.ru.md#immediate_commit)
+автоматически выводится из опций отключения кэша - она устанавливается в "all", если кэш
+отключён на всех устройствах, и в "small", если он отключён только на устройстве журнала.
+
+Когда fsync данных/метаданных/журнала отключён, скрипты запуска OSD автоматически
+проверяют состояние кэша диска и стараются его отключить для SATA/SAS дисков. Если
+это не удаётся, в системный журнал выводится предупреждение.
+
+Вы можете передать данной команде и некоторые другие опции OSD в качестве аргументов,
+и они тоже будут сохранены в суперблок: max_write_iodepth, min_flusher_count,
+max_flusher_count, inmemory_metadata, inmemory_journal, journal_sector_buffer_count,
+journal_no_same_sector_overwrites, throttle_small_writes, throttle_target_iops,
+throttle_target_mbs, throttle_target_parallelism, throttle_threshold_us.
+Читайте об этих параметрах подробнее в разделе [Изменяемые параметры OSD](../config/osd.ru.md).
+
+## upgrade-simple
+
+`vitastor-disk upgrade-simple <UNIT_FILE|OSD_NUMBER>`
+
+Обновить OSD, созданный старыми (0.7.1 и старее) скриптами `make-osd.sh` и `make-osd-hybrid.js`.
+
+Добавляет суперблок на разделы OSD, отключает старый сервис `vitastor-osdN` и заменяет его на `vitastor-osd@N`.
+
+Можно вызывать, указывая либо номер OSD, либо путь к файлу сервиса `UNIT_FILE`, но он обязан
+иметь вид `/etc/systemd/system/vitastor-osd<OSD_NUMBER>.service`.
+
+Имейте в виду, что процедура обновления не атомарна и при прерывании может уничтожить данные OSD,
+так что обновляйте ваши OSD по очереди.
+
+Команде требуется утилита `sfdisk`.
+
+## resize
+
+`vitastor-disk resize <ALL_OSD_PARAMETERS> <NEW_LAYOUT> [--iodepth 32]`
+
+Изменить размер области данных и/или переместить журнал и метаданные.
+
+В `ALL_OSD_PARAMETERS` нужно указать все относящиеся к диску параметры OSD
+из суперблока OSD или из файла сервиса systemd (в старых версиях).
+
+В `NEW_LAYOUT` нужно указать новые параметры расположения данных:
+
+```
+--new_data_offset РАЗМЕР     сдвинуть начало области данных на РАЗМЕР байт
+--new_data_len РАЗМЕР        изменить размер области данных до РАЗМЕР байт
+--new_meta_device ПУТЬ       использовать ПУТЬ как новое устройство метаданных
+--new_meta_offset РАЗМЕР     разместить новые метаданные по смещению РАЗМЕР байт
+--new_meta_len РАЗМЕР        сделать новые метаданные размером РАЗМЕР байт
+--new_journal_device ПУТЬ    использовать ПУТЬ как новое устройство журнала
+--new_journal_offset РАЗМЕР  разместить новый журнал по смещению РАЗМЕР байт
+--new_journal_len РАЗМЕР     сделать новый журнал размером РАЗМЕР байт
+```
+
+РАЗМЕР может быть указан с суффиксами k/m/g/t. Если любой из новых параметров
+расположения не указан, он принимается равным старому значению.
+
+## start/stop/restart/enable/disable
+
+`vitastor-disk start|stop|restart|enable|disable [--now] <device> [device2 device3 ...]`
+
+Команды управления OSD по путям дисков через systemd.
+
+Команды передаются в `systemctl` с сервисами `vitastor-osd@<num>` в качестве аргументов.
+
+Когда к командам включения/выключения добавляется параметр `--now`, OSD также сразу
+запускаются/останавливаются.
+
+## read-sb
+
+`vitastor-disk read-sb <device>`
+
+Прочитать суперблок OSD с диска `<device>` и вывести его в формате JSON.
+
+## write-sb
+
+`vitastor-disk write-sb <device>`
+
+Прочитать JSON со стандартного ввода и записать его в суперблок OSD на диск `<device>`.
+
+## udev
+
+`vitastor-disk udev <device>`
+
+Прочитать суперблок OSD с диска `<device>` и вывести переменные для udev.
+
+## exec-osd
+
+`vitastor-disk exec-osd <device>`
+
+Прочитать суперблок OSD с диска `<device>` и запустить исполняемый файл OSD с параметрами оттуда.
+
+Команда предназначена для использования из скриптов запуска (например, из сервисов systemd).
+
+## pre-exec
+
+`vitastor-disk pre-exec <device>`
+
+Прочитать суперблок OSD с диска `<device>` и провести проверки OSD перед запуском.
+
+На данный момент только отключает кэш диска или проверяет, что он отключён, если в параметрах
+OSD отключены fsync-и.
+
+Команда предназначена для использования из скриптов запуска (например, из сервисов systemd).
+
+## dump-journal
+
+`vitastor-disk dump-journal [OPTIONS] <journal_file> <journal_block_size> <offset> <size>`
+
+Вывести журнал в человекочитаемом или в JSON (с опцией `--json`) виде.
+
+Опции:
+
+```
+--all             Просканировать всю область журнала и вывести даже старые записи
+--json            Вывести журнал в формате JSON
+--format entries  (По умолчанию) Вывести только актуальные записи журнала без данных
+--format data     Вывести только актуальные записи журнала с данными
+--format blocks   Вывести массив блоков журнала, а в каждом массив актуальных записей без данных
+```
+
+## write-journal
+
+`vitastor-disk write-journal <journal_file> <journal_block_size> <bitmap_size> <offset> <size>`
+
+Записать журнал из JSON со стандартного ввода в формате, аналогичном `dump-journal --json --format data`.
+
+## dump-meta
+
+`vitastor-disk dump-meta <meta_file> <meta_block_size> <offset> <size>`
+
+Вывести метаданные в формате JSON.
+
+## write-meta
+
+`vitastor-disk write-meta <meta_file> <offset> <size>`
+
+Записать метаданные из JSON со стандартного ввода в формате, аналогичном `dump-meta`.
+
+## simple-offsets
+
+`vitastor-disk simple-offsets <device>`
+
+Рассчитать смещения для старого ("простого и тупого") создания OSD на диске (без суперблока).
+
+Опции (см. также [Дисковые параметры уровня кластера](../config/layout-cluster.ru.md)):
+
+```
+--object_size 128k       Размер блока хранилища
+--bitmap_granularity 4k  Гранулярность битовых карт
+--journal_size 32M       Размер журнала
+--device_block_size 4k   Размер блока устройства
+--journal_offset 0       Смещение журнала
+--device_size 0          Размер устройства
+--format text            Формат результата: json, options, env или text
+```
--- a/mon/90-vitastor.rules
+++ b/mon/90-vitastor.rules
@@ -0,0 +1,7 @@
+SUBSYSTEM=="block", ENV{ID_PART_ENTRY_TYPE}=="e7009fac-a5a1-4d72-af72-53de13059903", \
+    OWNER="vitastor", GROUP="vitastor", \
+    IMPORT{program}="/usr/bin/vitastor-disk udev $devnode", \
+    SYMLINK+="vitastor/$env{VITASTOR_ALIAS}"
+
+ENV{VITASTOR_OSD_NUM}!="", ACTION=="add", RUN{program}+="/usr/bin/systemctl enable --now vitastor-osd@$env{VITASTOR_OSD_NUM}"
+ENV{VITASTOR_OSD_NUM}!="", ACTION=="remove", RUN{program}+="/usr/bin/systemctl disable --now vitastor-osd@$env{VITASTOR_OSD_NUM}"
--- a/mon/make-etcd
+++ b/mon/make-etcd
@@ -0,0 +1,110 @@
+#!/usr/bin/node
+// Simple systemd unit generator for etcd
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: MIT
+
+// USAGE:
+// 1) Put the same etcd_address into /etc/vitastor/vitastor.conf on all monitor nodes
+// 2) Run ./make-etcd. It will create the etcd service if one of the specified IPs belongs to this host
+
+const child_process = require('child_process');
+const fs = require('fs');
+const os = require('os');
+
+run().catch(e => { console.error(e); process.exit(1); });
+
+async function run()
+{
+    const config_path = process.argv[2] || '/etc/vitastor/vitastor.conf';
+    if (config_path == '-h' || config_path == '--help')
+    {
+        console.log(
+            'Initialize systemd etcd service for Vitastor\n'+
+            '(c) Vitaliy Filippov, 2019+ (MIT)\n'+
+            '\n'+
+            'USAGE:\n'+
+            '1) Put the same etcd_address into /etc/vitastor/vitastor.conf on all monitor nodes\n'+
+            '2) Run '+process.argv[1]+' [config_path]\n'
+        );
+        process.exit(0);
+    }
+    if (!fs.existsSync(config_path))
+    {
+        console.log(config_path+' is missing');
+        process.exit(1);
+    }
+    if (fs.existsSync("/etc/systemd/system/etcd.service"))
+    {
+        console.log("/etc/systemd/system/etcd.service already exists");
+        process.exit(1);
+    }
+    const config = JSON.parse(fs.readFileSync(config_path, { encoding: 'utf-8' }));
+    if (!config.etcd_address)
+    {
+        console.log("etcd_address is missing in "+config_path);
+        process.exit(1);
+    }
+    const etcds = (config.etcd_address instanceof Array ? config.etcd_address : (''+config.etcd_address).split(/,/))
+        .map(s => (''+s).replace(/^https?:\/\/\[?|\]?(:\d+)?(\/.*)?$/g, '').toLowerCase());
+    const num = select_local_etcd(etcds);
+    if (num < 0)
+    {
+        console.log('No matching IPs in etcd_address from '+config_path);
+        process.exit(0);
+    }
+    const etcd_cluster = etcds.map((e, i) => `etcd${i}=http://${e}:2380`).join(',');
+    await system(`mkdir -p /var/lib/etcd${num}.etcd`);
+    fs.writeFileSync(
+        "/etc/systemd/system/etcd.service",
+`[Unit]
+Description=etcd for vitastor
+After=network-online.target local-fs.target time-sync.target
+Wants=network-online.target local-fs.target time-sync.target
+
+[Service]
+Restart=always
+ExecStart=/usr/local/bin/etcd -name etcd${num} --data-dir /var/lib/etcd${num}.etcd \\
+    --advertise-client-urls http://${etcds[num]}:2379 --listen-client-urls http://${etcds[num]}:2379 \\
+    --initial-advertise-peer-urls http://${etcds[num]}:2380 --listen-peer-urls http://${etcds[num]}:2380 \\
+    --initial-cluster-token vitastor-etcd-1 --initial-cluster ${etcd_cluster} \\
+    --initial-cluster-state new --max-txn-ops=100000 --max-request-bytes=104857600 \\
+    --auto-compaction-retention=10 --auto-compaction-mode=revision
+WorkingDirectory=/var/lib/etcd${num}.etcd
+ExecStartPre=+chown -R etcd /var/lib/etcd${num}.etcd
+User=etcd
+PrivateTmp=false
+TasksMax=infinity
+StartLimitInterval=0
+RestartSec=10
+
+[Install]
+WantedBy=multi-user.target
+`);
+    await system(`useradd etcd`);
+    await system(`systemctl daemon-reload`);
+    await system(`systemctl enable etcd`);
+    await system(`systemctl start etcd`);
+    process.exit(0);
+}
+
+function select_local_etcd(etcds)
+{
+    const ifaces = os.networkInterfaces();
+    for (const ifname in ifaces)
+        for (const iface of ifaces[ifname])
+            for (let i = 0; i < etcds.length; i++)
+                if (etcds[i] == iface.address.toLowerCase())
+                    return i;
+    return -1;
+}
+
+async function system(cmd)
+{
+    const cp = child_process.spawn(cmd, { shell: true, stdio: [ 0, 1, 2 ] });
+    let finish_cb;
+    cp.on('exit', () => finish_cb && finish_cb());
+    if (cp.exitCode == null)
+        await new Promise(ok => finish_cb = ok);
+    return cp.exitCode;
+}
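To see what make-etcd derives from `etcd_address`, you can run its normalization and `--initial-cluster` steps on their own. The regular expression and URL format below are copied from the script. The function names and sample addresses are illustrative:

```javascript
// Same normalization as make-etcd: strip scheme, IPv6 brackets, port and path.
function normalizeEtcdAddresses(etcdAddress) {
  const list = Array.isArray(etcdAddress) ? etcdAddress : String(etcdAddress).split(/,/);
  return list.map(s => String(s).replace(/^https?:\/\/\[?|\]?(:\d+)?(\/.*)?$/g, '').toLowerCase());
}

// Same peer list format as make-etcd's --initial-cluster argument.
function initialCluster(etcds) {
  return etcds.map((e, i) => `etcd${i}=http://${e}:2380`).join(',');
}
```

The script inserts each normalized address into its URLs verbatim. For an IPv6 entry such as `http://[FE80::1]:2379`, the generated `http://fe80::1:2380` would therefore lack the square brackets a URL requires.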
--- a/mon/make-osd-hybrid.js
+++ b/mon/make-osd-hybrid.js
@@ -1,414 +0,0 @@
-#!/usr/bin/nodejs
-// systemd unit generator for hybrid (HDD+SSD) vitastor OSDs
-// Copyright (c) Vitaliy Filippov, 2019+
-// License: VNPL-1.1
-
-// USAGE: nodejs make-osd-hybrid.js [--disable_ssd_cache 0] [--disable_hdd_cache 0] /dev/sda /dev/sdb /dev/sdc /dev/sdd ...
-// I.e. - just pass all HDDs and SSDs mixed, the script will decide where
-// to put journals on its own
-
-const fs = require('fs');
-const fsp = fs.promises;
-const child_process = require('child_process');
-
-const options = {
-    debug: 1,
-    journal_size: 1024*1024*1024,
-    min_meta_size: 1024*1024*1024,
-    object_size: 1024*1024,
-    bitmap_granularity: 4096,
-    device_block_size: 4096,
-    disable_ssd_cache: 1,
-    disable_hdd_cache: 1,
-};
-
-run().catch(console.fatal);
-
-async function run()
-{
-    const device_list = parse_options();
-    await system_or_die("mkdir -p /var/log/vitastor; chown vitastor /var/log/vitastor");
-    // Collect devices
-    const all_devices = await collect_devices(device_list);
-    const ssds = all_devices.filter(d => d.ssd);
-    const hdds = all_devices.filter(d => !d.ssd);
-    // Collect existing OSD units
-    const osd_units = await collect_osd_units();
-    // Count assigned HDD journals and unallocated space for each SSD
-    await check_journal_count(ssds, osd_units);
-    // Create new OSDs
-    await create_new_hybrid_osds(hdds, ssds, osd_units);
-    process.exit(0);
-}
-
-function parse_options()
-{
-    const devices = [];
-    const opt = {};
-    for (let i = 2; i < process.argv.length; i++)
-    {
-        const arg = process.argv[i];
-        if (arg == '--help' || arg == '-h')
-        {
-            opt.help = true;
-            break;
-        }
-        else if (arg.substr(0, 2) == '--')
-            opt[arg.substr(2)] = process.argv[++i];
-        else
-            devices.push(arg);
-    }
-    if (opt.help || !devices.length)
-    {
-        console.log(
-            'Prepare hybrid (HDD+SSD) Vitastor OSDs\n'+
-            '(c) Vitaliy Filippov, 2019+, license: VNPL-1.1\n\n'+
-            'USAGE: nodejs make-osd-hybrid.js [OPTIONS] /dev/sda /dev/sdb /dev/sdc ...\n'+
-            'Just pass all your SSDs and HDDs in any order, the script will distribute OSDs for you.\n\n'+
-            'OPTIONS (with defaults):\n'+
-            Object.keys(options).map(k => `  --${k} ${options[k]}`).join('\n')
-        );
-        process.exit(0);
-    }
-    for (const k in opt)
-        options[k] = opt[k];
-    return devices;
-}
-
-// Collect devices
-async function collect_devices(devices_to_check)
-{
-    const devices = [];
-    for (const dev of devices_to_check)
-    {
-        if (dev.substr(0, 5) != '/dev/')
-        {
-            console.log(`${dev} does not start with /dev/, skipping`);
-            continue;
-        }
-        if (!await file_exists('/sys/block/'+dev.substr(5)))
-        {
-            console.log(`${dev} is a partition, skipping`);
-            continue;
-        }
-        // Check if the device is an SSD
-        const rot = '/sys/block/'+dev.substr(5)+'/queue/rotational';
-        if (!await file_exists(rot))
-        {
-            console.log(`${dev} does not have ${rot} to check whether it's an SSD, skipping`);
-            continue;
-        }
-        const ssd = !parseInt(await fsp.readFile(rot, { encoding: 'utf-8' }));
-        // Check if the device has partition table
-        let [ has_partition_table, parts ] = await system(`sfdisk --dump ${dev} --json`);
-        if (has_partition_table != 0)
-        {
-            // Check if the device has any data
-            const [ has_data, out ] = await system(`blkid ${dev}`);
-            if (has_data == 0)
-            {
-                console.log(`${dev} contains data, skipping:\n  ${out.trim().replace(/\n/g, '\n  ')}`);
-                continue;
-            }
-        }
-        parts = parts ? JSON.parse(parts).partitiontable : null;
-        if (parts && parts.label != 'gpt')
-        {
-            console.log(`${dev} contains "${parts.label}" partition table, only GPT is supported, skipping`);
-            continue;
-        }
-        devices.push({
-            path: dev,
-            ssd,
-            parts,
-        });
-    }
-    return devices;
-}
-
-// Collect existing OSD units
-async function collect_osd_units()
-{
-    const units = [];
-    for (const unit of (await system("ls /etc/systemd/system/vitastor-osd*.service"))[1].trim().split('\n'))
-    {
-        if (!unit)
-        {
-            continue;
-        }
-        let cmd = /^ExecStart\s*=\s*(([^\n]*\\\n)*[^\n]*)/.exec(await fsp.readFile(unit, { encoding: 'utf-8' }));
-        if (!cmd)
-        {
-            console.log('ExecStart= not found in '+unit+', skipping')
-            continue;
-        }
-        let kv = {}, key;
-        cmd = cmd[1].replace(/^bash\s+-c\s+'/, '')
-            .replace(/>>\s*\S+2>\s*&1\s*'$/, '')
-            .replace(/\s*\\\n\s*/g, ' ')
-            .replace(/([^\s']+)|'([^']+)'/g, (m, m1, m2) =>
-            {
-                m1 = m1||m2;
-                if (key == null)
-                {
-                    if (m1.substr(0, 2) != '--')
-                    {
-                        console.log('Strange command line in '+unit+', stopping');
-                        process.exit(1);
-                    }
-                    key = m1.substr(2);
-                }
-                else
-                {
-                    kv[key] = m1;
-                    key = null;
-                }
-            });
-        units.push(kv);
-    }
-    return units;
-}
-
-// Count assigned HDD journals and unallocated space for each SSD
-async function check_journal_count(ssds, osd_units)
-{
-    const units_by_journal = osd_units.reduce((a, c) =>
-    {
-        if (c.journal_device)
-            a[c.journal_device] = c;
-        return a;
-    }, {});
-    for (const dev of ssds)
-    {
-        dev.journals = 0;
-        if (dev.parts)
-        {
-            for (const part of dev.parts.partitions)
-            {
-                if (part.uuid && units_by_journal['/dev/disk/by-partuuid/'+part.uuid.toLowerCase()])
-                {
-                    dev.journals++;
-                }
-            }
-            dev.free = free_from_parttable(dev.parts);
-        }
-        else
-        {
-            dev.free = parseInt(await system_or_die("blockdev --getsize64 "+dev.path));
-        }
-    }
-}
-
-async function create_new_hybrid_osds(hdds, ssds, osd_units)
-{
-    const units_by_disk = osd_units.reduce((a, c) => { a[c.data_device] = c; return a; }, {});
-    for (const dev of hdds)
-    {
-        if (!dev.parts)
-        {
-            // HDD is not partitioned yet, create a single partition
-            // + is the "default value" for sfdisk
-            await system_or_die('sfdisk '+dev.path, 'label: gpt\n\n+ +\n');
-            dev.parts = JSON.parse(await system_or_die('sfdisk --dump '+dev.path+' --json')).partitiontable;
-        }
-        if (dev.parts.partitions.length != 1)
-        {
-            console.log(dev.path+' has more than 1 partition, skipping');
-        }
-        else if ((dev.parts.partitions[0].start + dev.parts.partitions[0].size) != (1 + dev.parts.lastlba))
-        {
-            console.log(dev.path+'1 is not a whole-disk partition, skipping');
-        }
-        else if (!dev.parts.partitions[0].uuid)
-        {
-            console.log(dev.parts.partitions[0].node+' does not have UUID. Please repartition '+dev.path+' with GPT');
-        }
-        else if (!units_by_disk['/dev/disk/by-partuuid/'+dev.parts.partitions[0].uuid.toLowerCase()])
-        {
-            await create_hybrid_osd(dev, ssds);
-        }
-    }
-}
-
-async function create_hybrid_osd(dev, ssds)
-{
-    // Create a new OSD
-    // Calculate metadata size
-    const data_device = '/dev/disk/by-partuuid/'+dev.parts.partitions[0].uuid.toLowerCase();
-    const data_size = dev.parts.partitions[0].size * dev.parts.sectorsize;
-    const meta_entry_size = 24 + 2*options.object_size/options.bitmap_granularity/8;
-    const entries_per_block = Math.floor(options.device_block_size / meta_entry_size);
-    const object_count = Math.floor(data_size / options.object_size);
-    let meta_size = Math.ceil(1 + object_count / entries_per_block) * options.device_block_size;
-    // Leave some extra space for future metadata formats and round metadata area size to multiples of 1 MB
-    meta_size = 2*meta_size;
-    meta_size = Math.ceil(meta_size/1024/1024) * 1024*1024;
-    if (meta_size < options.min_meta_size)
-        meta_size = options.min_meta_size;
-    let journal_size = Math.ceil(options.journal_size/1024/1024) * 1024*1024;
-    // Pick an SSD for journal, balancing the number of journals across SSDs
-    let selected_ssd;
-    for (const ssd of ssds)
-        if (ssd.free >= (meta_size+journal_size) && (!selected_ssd || selected_ssd.journals > ssd.journals))
-            selected_ssd = ssd;
-    if (!selected_ssd)
-    {
-        console.error('Could not find free space for SSD journal and metadata for '+dev.path);
-        process.exit(1);
-    }
-    // Allocate an OSD number
-    const osd_num = (await system_or_die("vitastor-cli alloc-osd")).trim();
-    if (!osd_num)
-    {
-        console.error('Failed to run vitastor-cli alloc-osd');
-        process.exit(1);
-    }
-    console.log('Creating OSD '+osd_num+' on '+dev.path+' (HDD) with journal and metadata on '+selected_ssd.path+' (SSD)');
-    // Add two partitions: journal and metadata
-    const new_parts = await add_partitions(selected_ssd, [ journal_size, meta_size ]);
-    selected_ssd.journals++;
-    const journal_device = '/dev/disk/by-partuuid/'+new_parts[0].uuid.toLowerCase();
-    const meta_device = '/dev/disk/by-partuuid/'+new_parts[1].uuid.toLowerCase();
-    // Wait until the device symlinks appear
-    while (!await file_exists(journal_device))
-    {
-        await new Promise(ok => setTimeout(ok, 100));
-    }
-    while (!await file_exists(meta_device))
-    {
-        await new Promise(ok => setTimeout(ok, 100));
-    }
-    // Zero out metadata and journal
-    await system_or_die("dd if=/dev/zero of="+journal_device+" bs=1M count="+(journal_size/1024/1024)+" oflag=direct");
-    await system_or_die("dd if=/dev/zero of="+meta_device+" bs=1M count="+(meta_size/1024/1024)+" oflag=direct");
-    // Create unit file for the OSD
-    const has_scsi_cache_type = options.disable_ssd_cache &&
-        (await system("ls /sys/block/"+selected_ssd.path.substr(5)+"/device/scsi_disk/*/cache_type"))[0] == 0;
-    const write_through = options.disable_ssd_cache && (
-        has_scsi_cache_type || selected_ssd.path.substr(5, 4) == 'nvme'
-        && (await system_or_die("/sys/block/"+selected_ssd.path.substr(5)+"/queue/write_cache")).trim() == "write through");
-    await fsp.writeFile('/etc/systemd/system/vitastor-osd'+osd_num+'.service',
-`[Unit]
-Description=Vitastor object storage daemon osd.${osd_num}
-After=network-online.target local-fs.target time-sync.target
-Wants=network-online.target local-fs.target time-sync.target
-PartOf=vitastor.target
-
-[Service]
-LimitNOFILE=1048576
-LimitNPROC=1048576
-LimitMEMLOCK=infinity
-ExecStart=bash -c '/usr/bin/vitastor-osd \\
-    --osd_num ${osd_num} ${write_through
-        ? "--disable_meta_fsync 1 --disable_journal_fsync 1 --immediate_commit "+(options.disable_hdd_cache ? "all" : "small")
-        : ""} \\
-    --throttle_small_writes 1 \\
-    --disk_alignment ${options.device_block_size} \\
-    --journal_block_size ${options.device_block_size} \\
-    --meta_block_size ${options.device_block_size} \\
-    --journal_no_same_sector_overwrites true \\
-    --journal_sector_buffer_count 1024 \\
-    --block_size ${options.object_size} \\
-    --data_device ${data_device} \\
-    --journal_device ${journal_device} \\
-    --meta_device ${meta_device} >>/var/log/vitastor/osd${osd_num}.log 2>&1'
-WorkingDirectory=/
-ExecStartPre=+chown vitastor:vitastor ${data_device}
-ExecStartPre=+chown vitastor:vitastor ${journal_device}
-ExecStartPre=+chown vitastor:vitastor ${meta_device}${
-    has_scsi_cache_type
-    ? "\nExecStartPre=+bash -c 'D=$$$(readlink "+journal_device+"); echo write through > $$$(dirname /sys/block/*/$$\${D##*/})/device/scsi_disk/*/cache_type'"
-    : ""}${
-    options.disable_hdd_cache
-    ? "\nExecStartPre=+bash -c 'D=$$$(readlink "+data_device+"); echo write through > $$$(dirname /sys/block/*/$$\${D##*/})/device/scsi_disk/*/cache_type'"
-    : ""}
-User=vitastor
-PrivateTmp=false
-TasksMax=infinity
-Restart=always
-StartLimitInterval=0
-RestartSec=10
-
-[Install]
-WantedBy=vitastor.target
-`);
-    await system_or_die("systemctl enable vitastor-osd"+osd_num);
-}
-
-async function add_partitions(dev, sizes)
-{
-    let script = 'label: gpt\n\n';
-    if (dev.parts)
-    {
-        // Old partitions
-        for (const part of dev.parts.partitions)
-        {
-            script += part.node+': '+Object.keys(part).map(k => k == 'node' ? '' : k+'='+part[k]).filter(k => k).join(', ')+'\n';
-        }
-    }
-    // New partitions
-    for (const size of sizes)
-    {
-        script += '+ '+Math.ceil(size/1024)+'KiB\n';
-    }
-    await system_or_die('sfdisk '+dev.path, script);
-    // Get new partition table and find the new partition
-    const newpt = JSON.parse(await system_or_die('sfdisk --dump '+dev.path+' --json')).partitiontable;
-    const old_nodes = dev.parts ? dev.parts.partitions.reduce((a, c) => { a[c.uuid] = true; return a; }, {}) : {};
-    const new_nodes = newpt.partitions.filter(part => !old_nodes[part.uuid]);
-    if (new_nodes.length != sizes.length)
-    {
-        console.error('Failed to partition '+dev.path+': new partitions not found in table');
-        process.exit(1);
-    }
-    dev.parts = newpt;
-    dev.free = free_from_parttable(newpt);
-    return new_nodes;
-}
-
-function free_from_parttable(pt)
-{
-    let free = pt.lastlba + 1 - pt.firstlba;
-    for (const part of pt.partitions)
-    {
-        free -= part.size;
-    }
-    free *= pt.sectorsize;
-    return free;
-}
-
-async function system_or_die(cmd, input = '')
-{
-    let [ exitcode, stdout, stderr ] = await system(cmd, input);
-    if (exitcode != 0)
-    {
-        console.error(cmd+' failed: '+stderr);
-        process.exit(1);
-    }
-    return stdout;
-}
-
-async function system(cmd, input = '')
-{
-    if (options.debug)
-    {
-        process.stderr.write('+ '+cmd+(input ? " <<EOF\n"+input.replace(/\s*$/, '\n')+"EOF" : '')+'\n');
-    }
-    const cp = child_process.spawn(cmd, { shell: true });
-    let stdout = '', stderr = '', finish_cb;
-    cp.stdout.on('data', buf => stdout += buf.toString());
-    cp.stderr.on('data', buf => stderr += buf.toString());
-    cp.on('exit', () => finish_cb && finish_cb());
-    cp.stdin.write(input);
-    cp.stdin.end();
-    if (cp.exitCode == null)
-    {
-        await new Promise(ok => finish_cb = ok);
-    }
-    return [ cp.exitCode, stdout, stderr ];
-}
-
-async function file_exists(filename)
-{
-    return new Promise((ok, no) => fs.access(filename, fs.constants.R_OK, err => ok(!err)));
-}
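The metadata sizing in the removed `create_hybrid_osd()` can be checked by hand. Below is a minimal sketch that mirrors that arithmetic. `calcHybridMetaSize` is a hypothetical helper name. The example parameter values (128 KiB objects, 4 KiB bitmap granularity, 4 KiB device blocks) are assumptions, not the script's actual `options` defaults, which are not shown in this diff.

```javascript
// Hypothetical helper mirroring the meta_size arithmetic of create_hybrid_osd().
// All sizes are in bytes; parameter values are supplied by the caller (assumptions).
function calcHybridMetaSize(dataSize, { objectSize, bitmapGranularity, deviceBlockSize, minMetaSize })
{
    // Per-object entry: 24 bytes plus two bitmaps of 1 bit per bitmapGranularity bytes
    const metaEntrySize = 24 + 2*objectSize/bitmapGranularity/8;
    const entriesPerBlock = Math.floor(deviceBlockSize / metaEntrySize);
    const objectCount = Math.floor(dataSize / objectSize);
    // One extra block on top of the entry blocks, as in the original formula
    let metaSize = Math.ceil(1 + objectCount / entriesPerBlock) * deviceBlockSize;
    // Double it as headroom for future metadata formats, then round up to whole MiB
    metaSize = 2*metaSize;
    metaSize = Math.ceil(metaSize/1024/1024) * 1024*1024;
    // Never go below the configured minimum
    return Math.max(metaSize, minMetaSize);
}

const exampleOpts = { objectSize: 131072, bitmapGranularity: 4096, deviceBlockSize: 4096, minMetaSize: 4*1024*1024 };
console.log(calcHybridMetaSize(2**40, exampleOpts)); // 1 TiB data partition
```

With these assumed values, each entry is 32 bytes and 128 entries fit in a 4 KiB block. A 1 TiB data partition therefore needs 513 MiB of metadata. On a 1 GiB partition the computed size rounds up to only 1 MiB, so `minMetaSize` takes over.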
--- a/mon/make-osd.sh
+++ b/mon/make-osd.sh
@@ -1,66 +0,0 @@
-#!/bin/bash
-# Very simple systemd unit generator for vitastor-osd services
-# Not the final solution yet, mostly for tests
-# Copyright (c) Vitaliy Filippov, 2019+
-# License: MIT
-
-# USAGE:
-# 1) Put etcd_address and osd_network into /etc/vitastor/vitastor.conf. Example:
-#    {
-#        "etcd_address":["http://10.200.1.10:2379/v3","http://10.200.1.11:2379/v3","http://10.200.1.12:2379/v3"],
-#        "osd_network":"10.200.1.0/24"
-#    }
-# 2) Run ./make-osd.sh /dev/disk/by-partuuid/xxx [ /dev/disk/by-partuuid/yyy]...
-
-set -e -x
-
-# Create OSDs on all passed devices
-for DEV in $*; do
-
-OSD_NUM=$(vitastor-cli alloc-osd)
-
-echo Creating OSD $OSD_NUM on $DEV
-
-OPT=$(vitastor-cli simple-offsets --format options $DEV | tr '\n' ' ')
-META=$(vitastor-cli simple-offsets --format json $DEV | jq .data_offset)
-dd if=/dev/zero of=$DEV bs=1048576 count=$(((META+1048575)/1048576)) oflag=direct
-
-mkdir -p /var/log/vitastor
-id vitastor &>/dev/null || useradd vitastor
-chown vitastor /var/log/vitastor
-
-cat >/etc/systemd/system/vitastor-osd$OSD_NUM.service <<EOF
-[Unit]
-Description=Vitastor object storage daemon osd.$OSD_NUM
-After=network-online.target local-fs.target time-sync.target
-Wants=network-online.target local-fs.target time-sync.target
-PartOf=vitastor.target
-
-[Service]
-LimitNOFILE=1048576
-LimitNPROC=1048576
-LimitMEMLOCK=infinity
-ExecStart=bash -c '/usr/bin/vitastor-osd \\
-    --osd_num $OSD_NUM \\
-    --disable_data_fsync 1 \\
-    --immediate_commit all \\
-    --disk_alignment 4096 --journal_block_size 4096 --meta_block_size 4096 \\
-    --journal_no_same_sector_overwrites true \\
-    --journal_sector_buffer_count 1024 \\
-    $OPT >>/var/log/vitastor/osd$OSD_NUM.log 2>&1'
-WorkingDirectory=/
-ExecStartPre=+chown vitastor:vitastor $DEV
-User=vitastor
-PrivateTmp=false
-TasksMax=infinity
-Restart=always
-StartLimitInterval=0
-RestartSec=10
-
-[Install]
-WantedBy=vitastor.target
-EOF
-
-systemctl enable vitastor-osd$OSD_NUM
-
-done
--- a/mon/make-units.sh
+++ b/mon/make-units.sh
@@ -1,86 +0,0 @@
-#!/bin/bash
-# Very simple systemd unit generator for etcd & vitastor-mon services
-# Not the final solution yet, mostly for tests
-# Copyright (c) Vitaliy Filippov, 2019+
-# License: MIT
-
-# USAGE: ./make-units.sh
-
-IP_SUBSTR="10.200.1."
-ETCD_HOSTS="etcd0=http://10.200.1.10:2380,etcd1=http://10.200.1.11:2380,etcd2=http://10.200.1.12:2380"
-
-# determine IP
-IP=`ip -json a s | jq -r '.[].addr_info[] | select(.local | startswith("'$IP_SUBSTR'")) | .local'`
-[ "$IP" != "" ] || exit 1
-ETCD_NUM=${ETCD_HOSTS/$IP*/}
-[ "$ETCD_NUM" != "$ETCD_HOSTS" ] || exit 1
-ETCD_NUM=$(echo $ETCD_NUM | tr -d -c , | wc -c)
-
-# etcd
-useradd etcd
-
-mkdir -p /var/lib/etcd$ETCD_NUM.etcd
-cat >/etc/systemd/system/etcd.service <<EOF
-[Unit]
-Description=etcd for vitastor
-After=network-online.target local-fs.target time-sync.target
-Wants=network-online.target local-fs.target time-sync.target
-
-[Service]
-Restart=always
-ExecStart=/usr/local/bin/etcd -name etcd$ETCD_NUM --data-dir /var/lib/etcd$ETCD_NUM.etcd \\
-    --advertise-client-urls http://$IP:2379 --listen-client-urls http://$IP:2379 \\
-    --initial-advertise-peer-urls http://$IP:2380 --listen-peer-urls http://$IP:2380 \\
-    --initial-cluster-token vitastor-etcd-1 --initial-cluster $ETCD_HOSTS \\
-    --initial-cluster-state new --max-txn-ops=100000 --max-request-bytes=104857600 \\
-    --auto-compaction-retention=10 --auto-compaction-mode=revision
-WorkingDirectory=/var/lib/etcd$ETCD_NUM.etcd
-ExecStartPre=+chown -R etcd /var/lib/etcd$ETCD_NUM.etcd
-User=etcd
-PrivateTmp=false
-TasksMax=infinity
-Restart=always
-StartLimitInterval=0
-RestartSec=10
-
-[Install]
-WantedBy=local.target
-EOF
-
-systemctl daemon-reload
-systemctl enable etcd
-systemctl start etcd
-
-useradd vitastor
-chmod 755 /root
-
-# Vitastor target
-cat >/etc/systemd/system/vitastor.target <<EOF
-[Unit]
-Description=vitastor target
-[Install]
-WantedBy=multi-user.target
-EOF
-
-# Monitor unit
-ETCD_MON=$(echo $ETCD_HOSTS | perl -pe 's/:2380/:2379/g; s/etcd\d*=//g;')
-cat >/etc/systemd/system/vitastor-mon.service <<EOF
-[Unit]
-Description=Vitastor monitor
-After=network-online.target local-fs.target time-sync.target
-Wants=network-online.target local-fs.target time-sync.target
-
-[Service]
-Restart=always
-ExecStart=node /usr/lib/vitastor/mon/mon-main.js --etcd_url '$ETCD_MON' --etcd_prefix '/vitastor' --etcd_start_timeout 5
-WorkingDirectory=/
-User=vitastor
-PrivateTmp=false
-TasksMax=infinity
-Restart=always
-StartLimitInterval=0
-RestartSec=10
-
-[Install]
-WantedBy=vitastor.target
-EOF
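`make-units.sh` derives the local etcd member number in two steps. It first cuts `ETCD_HOSTS` at the first occurrence of the local IP (`${ETCD_HOSTS/$IP*/}`), then counts the commas left before that point. The JavaScript sketch below implements the same idea. `etcdMemberIndex` is a hypothetical name. Unlike the shell pattern, which can match one address as a prefix of another (for example `10.200.1.1` inside `10.200.1.10`), this sketch anchors the search on `//IP:`.

```javascript
// Hypothetical helper: zero-based position of `ip` in an etcd
// "name=http://ip:port,name=http://ip:port,..." initial-cluster string, or -1 if absent.
function etcdMemberIndex(etcdHosts, ip)
{
    // Anchor on "//ip:" so that one address cannot match as a prefix of another
    const pos = etcdHosts.indexOf('//'+ip+':');
    if (pos < 0)
        return -1;
    // The number of commas before the match equals the member index
    return etcdHosts.slice(0, pos).split(',').length - 1;
}

const hosts = 'etcd0=http://10.200.1.10:2380,etcd1=http://10.200.1.11:2380,etcd2=http://10.200.1.12:2380';
console.log(etcdMemberIndex(hosts, '10.200.1.11'));
```

For the sample `ETCD_HOSTS` from the script, `10.200.1.11` maps to member 1, which would name the unit's data directory `etcd1.etcd`.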
--- a/mon/mon-main.js
+++ b/mon/mon-main.js
@@ -12,7 +12,7 @@ for (let i = 2; i < process.argv.length; i++)
    if (process.argv[i] === '-h' || process.argv[i] === '--help')
    {
        console.error('USAGE: '+process.argv[0]+' '+process.argv[1]+' [--verbose 1]'+
-            ' [--etcd_address "http://127.0.0.1:2379,..."] [--config_file /etc/vitastor/vitastor.conf]'+
+            ' [--etcd_address "http://127.0.0.1:2379,..."] [--config_path /etc/vitastor/vitastor.conf]'+
            ' [--etcd_prefix "/vitastor"] [--etcd_start_timeout 5]');
        process.exit();
    }
--- a/mon/mon.js
+++ b/mon/mon.js
@@ -157,7 +157,12 @@ const etcd_tree = {
                pg_count: 100,
                failure_domain: 'host',
                max_osd_combinations: 10000,
-                pg_stripe_size: 4194304,
+                // block_size, bitmap_granularity, immediate_commit must match all OSDs used in that pool
+                block_size: 131072,
+                bitmap_granularity: 4096,
+                // 'all'/'small'/'none', same as in OSD options
+                immediate_commit: 'none',
+                pg_stripe_size: 0,
                root_node?: 'rack1',
                // restrict pool to OSDs having all of these tags
                osd_tags?: 'nvme' | [ 'nvme', ... ],
@@ -323,6 +328,13 @@ const etcd_tree = {
            misplaced: uint64_t,
            degraded: uint64_t,
            incomplete: uint64_t,
+        },
+        object_bytes: {
+            total: uint64_t,
+            clean: uint64_t,
+            misplaced: uint64_t,
+            degraded: uint64_t,
+            incomplete: uint64_t,
        }, */
    },
    history: {
@@ -1438,8 +1450,24 @@ class Mon
    sum_object_counts()
    {
        const object_counts = { object: 0n, clean: 0n, misplaced: 0n, degraded: 0n, incomplete: 0n };
+        const object_bytes = { object: 0n, clean: 0n, misplaced: 0n, degraded: 0n, incomplete: 0n };
        for (const pool_id in this.state.pg.stats)
        {
+            let object_size = 0;
+            for (const osd_num of this.state.pg.stats[pool_id].write_osd_set||[])
+            {
+                if (osd_num && this.state.osd.stats[osd_num] && this.state.osd.stats[osd_num].block_size)
+                {
+                    object_size = this.state.osd.stats[osd_num].block_size;
+                    break;
+                }
+            }
+            if (!object_size)
+            {
+                object_size = (this.state.config.pools[pool_id]||{}).block_size ||
+                    this.config.block_size || 131072;
+            }
+            object_size = BigInt(object_size);
            for (const pg_num in this.state.pg.stats[pool_id])
            {
                const st = this.state.pg.stats[pool_id][pg_num];
@@ -1450,12 +1478,13 @@ class Mon
                        if (st[k+'_count'])
                        {
                            object_counts[k] += BigInt(st[k+'_count']);
+                            object_bytes[k] += BigInt(st[k+'_count']) * object_size;
                        }
                    }
                }
            }
        }
-        return object_counts;
+        return { object_counts, object_bytes };
    }

    sum_inode_stats(prev_stats, timestamp, prev_timestamp)
@@ -1568,7 +1597,7 @@ class Mon
    {
        const txn = [];
        const timestamp = Date.now();
-        const object_counts = this.sum_object_counts();
+        const { object_counts, object_bytes } = this.sum_object_counts();
        let stats = this.sum_op_stats(timestamp, this.prev_stats);
        let inode_stats = this.sum_inode_stats(
            this.prev_stats ? this.prev_stats.inode_stats : null,
@@ -1576,6 +1605,7 @@ class Mon
        );
        this.prev_stats = { timestamp, ...stats, inode_stats };
        stats.object_counts = object_counts;
+        stats.object_bytes = object_bytes;
        stats = this.serialize_bigints(stats);
        inode_stats = this.serialize_bigints(inode_stats);
        txn.push({ requestPut: { key: b64(this.etcd_prefix+'/stats'), value: b64(JSON.stringify(stats)) } });
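The new `object_bytes` statistic multiplies each PG's object counts by an object size for the pool. The mon looks for that size in four places, in order:

1. The `block_size` reported by an OSD from `write_osd_set`.
2. The pool's `block_size`.
3. The global `block_size`.
4. A fixed 131072.

Below is a simplified sketch of that fallback chain. `poolObjectSize` and `objectBytes` are hypothetical names, and the inputs are reduced to the fields used here. It is an illustration, not the mon's own code.

```javascript
// Hypothetical sketch of the object size fallback used for object_bytes.
// osdStats: { [osd_num]: { block_size } }; poolCfg / globalCfg: { block_size }
function poolObjectSize(writeOsdSet, osdStats, poolCfg, globalCfg)
{
    // Prefer the block_size reported by an OSD of this PG (0 entries are skipped, as in the mon code)
    for (const osdNum of writeOsdSet || [])
    {
        if (osdNum && osdStats[osdNum] && osdStats[osdNum].block_size)
            return BigInt(osdStats[osdNum].block_size);
    }
    // Otherwise fall back to the pool config, then the global config, then 128 KiB
    return BigInt((poolCfg || {}).block_size || (globalCfg || {}).block_size || 131072);
}

// Converts per-state object counts (e.g. { clean_count: 10 }) into byte totals
function objectBytes(pgStats, objectSize)
{
    const bytes = {};
    for (const k of [ 'object', 'clean', 'misplaced', 'degraded', 'incomplete' ])
        bytes[k] = BigInt(pgStats[k+'_count'] || 0) * objectSize;
    return bytes;
}
```

Using BigInt keeps the byte totals exact beyond 2^53, which matches the `0n` accumulators in `sum_object_counts()`.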
--- a/mon/vitastor-mon.service
+++ b/mon/vitastor-mon.service
@@ -0,0 +1,18 @@
+[Unit]
+Description=Vitastor monitor
+After=network-online.target local-fs.target time-sync.target
+Wants=network-online.target local-fs.target time-sync.target
+
+[Service]
+Restart=always
+ExecStart=node /usr/lib/vitastor/mon/mon-main.js
+WorkingDirectory=/
+User=vitastor
+PrivateTmp=false
+TasksMax=infinity
+Restart=always
+StartLimitInterval=0
+RestartSec=10
+
+[Install]
+WantedBy=vitastor.target
--- a/mon/vitastor-osd@.service
+++ b/mon/vitastor-osd@.service
@@ -0,0 +1,22 @@
+[Unit]
+Description=Vitastor object storage daemon osd.%i
+After=network-online.target local-fs.target time-sync.target
+Wants=network-online.target local-fs.target time-sync.target
+PartOf=vitastor.target
+
+[Service]
+LimitNOFILE=1048576
+LimitNPROC=1048576
+LimitMEMLOCK=infinity
+ExecStart=bash -c 'exec vitastor-disk exec-osd /dev/vitastor/osd%i-data >>/var/log/vitastor/osd%i.log 2>&1'
+ExecStartPre=+vitastor-disk pre-exec /dev/vitastor/osd%i-data
+WorkingDirectory=/
+User=vitastor
+PrivateTmp=false
+TasksMax=infinity
+Restart=always
+StartLimitInterval=0
+RestartSec=10
+
+[Install]
+WantedBy=vitastor.target
--- a/mon/vitastor.target
+++ b/mon/vitastor.target
@@ -0,0 +1,4 @@
+[Unit]
+Description=vitastor target
+[Install]
+WantedBy=multi-user.target
--- a/patches/cinder-vitastor.py
+++ b/patches/cinder-vitastor.py
@@ -50,7 +50,7 @@ from cinder.volume import configuration
 from cinder.volume import driver
 from cinder.volume import volume_utils

-VERSION = '0.7.1'
+VERSION = '0.8.1'

 LOG = logging.getLogger(__name__)

@@ -464,7 +464,7 @@ class VitastorDriver(driver.CloneableImageVD,
        vol_name = utils.convert_str(volume.name)
        snap_name = utils.convert_str(snapshot.name)

-        snap = self._get_image(vol_name+'@'+snap_name)
+        snap = self._get_image('volume-'+snapshot.volume_id+'@'+snap_name)
        if not snap:
            raise exception.SnapshotNotFound(snapshot_id = snap_name)
        snap_inode_id = int(resp['responses'][0]['kvs'][0]['value']['id'])
--- a/rpm/build-tarball.sh
+++ b/rpm/build-tarball.sh
@@ -25,4 +25,4 @@ rm fio
 mv fio-copy fio
 FIO=`rpm -qi fio | perl -e 'while(<>) { /^Epoch[\s:]+(\S+)/ && print "$1:"; /^Version[\s:]+(\S+)/ && print $1; /^Release[\s:]+(\S+)/ && print "-$1"; }'`
 perl -i -pe 's/(Requires:\s*fio)([^\n]+)?/$1 = '$FIO'/' $VITASTOR/rpm/vitastor-el$EL.spec
-tar --transform 's#^#vitastor-0.7.1/#' --exclude 'rpm/*.rpm' -czf $VITASTOR/../vitastor-0.7.1$(rpm --eval '%dist').tar.gz *
+tar --transform 's#^#vitastor-0.8.1/#' --exclude 'rpm/*.rpm' -czf $VITASTOR/../vitastor-0.8.1$(rpm --eval '%dist').tar.gz *
--- a/rpm/qemu-kvm-4.2-el7.spec.patch
+++ b/rpm/qemu-kvm-4.2-el7.spec.patch
@@ -58,7 +58,7 @@
 +BuildRequires: gperftools-devel
 +BuildRequires: libusbx-devel >= 1.0.21
 %if %{have_usbredir}
- BuildRequires: usbredir-devel >= 0.7.1
+ BuildRequires: usbredir-devel >= 0.8.1
 %endif
@@ -856,12 +861,13 @@ BuildRequires: virglrenderer-devel
 # For smartcard NSS support
--- a/rpm/vitastor-el7.Dockerfile
+++ b/rpm/vitastor-el7.Dockerfile
@@ -35,7 +35,7 @@ ADD . /root/vitastor
 RUN set -e; \
    cd /root/vitastor/rpm; \
    sh build-tarball.sh; \
-    cp /root/vitastor-0.7.1.el7.tar.gz ~/rpmbuild/SOURCES; \
+    cp /root/vitastor-0.8.1.el7.tar.gz ~/rpmbuild/SOURCES; \
    cp vitastor-el7.spec ~/rpmbuild/SPECS/vitastor.spec; \
    cd ~/rpmbuild/SPECS/; \
    rpmbuild -ba vitastor.spec; \
--- a/rpm/vitastor-el7.spec
+++ b/rpm/vitastor-el7.spec
@@ -1,11 +1,11 @@
 Name:           vitastor
-Version:        0.7.1
+Version:        0.8.1
 Release:        1%{?dist}
 Summary:        Vitastor, a fast software-defined clustered block storage

 License:        Vitastor Network Public License 1.1
 URL:            https://vitastor.io/
-Source0:        vitastor-0.7.1.el7.tar.gz
+Source0:        vitastor-0.8.1.el7.tar.gz

 BuildRequires:  liburing-devel >= 0.6
 BuildRequires:  gperftools-devel
@@ -36,6 +36,8 @@ Requires:       libJerasure2
 Requires:       libisa-l
 Requires:       liburing >= 0.6
 Requires:       vitastor-client = %{version}-%{release}
+Requires:       util-linux
+Requires:       parted


 %description -n vitastor-osd
@@ -102,8 +104,11 @@ cd mon
 npm install
 cd ..
 mkdir -p %buildroot/usr/lib/vitastor
-cp mon/make-osd.sh %buildroot/usr/lib/vitastor
 cp -r mon %buildroot/usr/lib/vitastor
+mkdir -p %buildroot/lib/systemd/system
+cp mon/vitastor.target mon/vitastor-mon.service mon/vitastor-osd@.service %buildroot/lib/systemd/system
+mkdir -p %buildroot/lib/udev/rules.d
+cp mon/90-vitastor.rules %buildroot/lib/udev/rules.d


 %files
@@ -112,11 +117,29 @@ cp -r mon %buildroot/usr/lib/vitastor

 %files -n vitastor-osd
 %_bindir/vitastor-osd
+%_bindir/vitastor-disk
 %_bindir/vitastor-dump-journal
+/lib/systemd/system/vitastor-osd@.service
+/lib/systemd/system/vitastor.target
+/lib/udev/rules.d/90-vitastor.rules
+
+
+%pre -n vitastor-osd
+groupadd -r -f vitastor 2>/dev/null ||:
+useradd -r -g vitastor -s /sbin/nologin -c "Vitastor daemons" -M -d /nonexistent vitastor 2>/dev/null ||:
+install -o vitastor -g vitastor -d /var/log/vitastor
+mkdir -p /etc/vitastor


 %files -n vitastor-mon
 /usr/lib/vitastor/mon
+/lib/systemd/system/vitastor-mon.service
+
+
+%pre -n vitastor-mon
+groupadd -r -f vitastor 2>/dev/null ||:
+useradd -r -g vitastor -s /sbin/nologin -c "Vitastor daemons" -M -d /nonexistent vitastor 2>/dev/null ||:
+mkdir -p /etc/vitastor


 %files -n vitastor-client
@@ -127,7 +150,6 @@ cp -r mon %buildroot/usr/lib/vitastor
 %_bindir/vita
 %_libdir/libvitastor_blk.so*
 %_libdir/libvitastor_client.so*
-/usr/lib/vitastor/make-osd.sh


 %files -n vitastor-client-devel
--- a/rpm/vitastor-el8.Dockerfile
+++ b/rpm/vitastor-el8.Dockerfile
@@ -35,7 +35,7 @@ ADD . /root/vitastor
 RUN set -e; \
    cd /root/vitastor/rpm; \
    sh build-tarball.sh; \
-    cp /root/vitastor-0.7.1.el8.tar.gz ~/rpmbuild/SOURCES; \
+    cp /root/vitastor-0.8.1.el8.tar.gz ~/rpmbuild/SOURCES; \
    cp vitastor-el8.spec ~/rpmbuild/SPECS/vitastor.spec; \
    cd ~/rpmbuild/SPECS/; \
    rpmbuild -ba vitastor.spec; \
--- a/rpm/vitastor-el8.spec
+++ b/rpm/vitastor-el8.spec
@@ -1,11 +1,11 @@
 Name:           vitastor
-Version:        0.7.1
+Version:        0.8.1
 Release:        1%{?dist}
 Summary:        Vitastor, a fast software-defined clustered block storage

 License:        Vitastor Network Public License 1.1
 URL:            https://vitastor.io/
-Source0:        vitastor-0.7.1.el8.tar.gz
+Source0:        vitastor-0.8.1.el8.tar.gz

 BuildRequires:  liburing-devel >= 0.6
 BuildRequires:  gperftools-devel
@@ -35,6 +35,8 @@ Requires:       libJerasure2
 Requires:       libisa-l
 Requires:       liburing >= 0.6
 Requires:       vitastor-client = %{version}-%{release}
+Requires:       util-linux
+Requires:       parted


 %description -n vitastor-osd
@@ -99,8 +101,11 @@ cd mon
 npm install
 cd ..
 mkdir -p %buildroot/usr/lib/vitastor
-cp mon/make-osd.sh %buildroot/usr/lib/vitastor
 cp -r mon %buildroot/usr/lib/vitastor
+mkdir -p %buildroot/lib/systemd/system
+cp mon/vitastor.target mon/vitastor-mon.service mon/vitastor-osd@.service %buildroot/lib/systemd/system
+mkdir -p %buildroot/lib/udev/rules.d
+cp mon/90-vitastor.rules %buildroot/lib/udev/rules.d


 %files
@@ -109,11 +114,29 @@ cp -r mon %buildroot/usr/lib/vitastor

 %files -n vitastor-osd
 %_bindir/vitastor-osd
+%_bindir/vitastor-disk
 %_bindir/vitastor-dump-journal
+/lib/systemd/system/vitastor-osd@.service
+/lib/systemd/system/vitastor.target
+/lib/udev/rules.d/90-vitastor.rules
+
+
+%pre -n vitastor-osd
+groupadd -r -f vitastor 2>/dev/null ||:
+useradd -r -g vitastor -s /sbin/nologin -c "Vitastor daemons" -M -d /nonexistent vitastor 2>/dev/null ||:
+install -o vitastor -g vitastor -d /var/log/vitastor
+mkdir -p /etc/vitastor


 %files -n vitastor-mon
 /usr/lib/vitastor/mon
+/lib/systemd/system/vitastor-mon.service
+
+
+%pre -n vitastor-mon
+groupadd -r -f vitastor 2>/dev/null ||:
+useradd -r -g vitastor -s /sbin/nologin -c "Vitastor daemons" -M -d /nonexistent vitastor 2>/dev/null ||:
+mkdir -p /etc/vitastor


 %files -n vitastor-client
@@ -124,7 +147,6 @@ cp -r mon %buildroot/usr/lib/vitastor
 %_bindir/vita
 %_libdir/libvitastor_blk.so*
 %_libdir/libvitastor_client.so*
-/usr/lib/vitastor/make-osd.sh


 %files -n vitastor-client-devel
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -15,7 +15,7 @@ if("${CMAKE_INSTALL_PREFIX}" MATCHES "^/usr/local/?$")
 	set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}")
 endif()

-add_definitions(-DVERSION="0.7.1")
+add_definitions(-DVERSION="0.8.1")
 add_definitions(-Wall -Wno-sign-compare -Wno-comment -Wno-parentheses -Wno-pointer-arith -fdiagnostics-color=always -I ${CMAKE_SOURCE_DIR}/src)
 if (${WITH_ASAN})
 	add_definitions(-fsanitize=address -fno-omit-frame-pointer)
@@ -64,7 +64,7 @@ include_directories(

 # libvitastor_blk.so
 add_library(vitastor_blk SHARED
-	allocator.cpp blockstore.cpp blockstore_impl.cpp blockstore_init.cpp blockstore_open.cpp blockstore_journal.cpp blockstore_read.cpp
+	allocator.cpp blockstore.cpp blockstore_impl.cpp blockstore_disk.cpp blockstore_init.cpp blockstore_open.cpp blockstore_journal.cpp blockstore_read.cpp
 	blockstore_write.cpp blockstore_sync.cpp blockstore_stable.cpp blockstore_rollback.cpp blockstore_flush.cpp crc32c.c ringloop.cpp
 )
 target_link_libraries(vitastor_blk
@@ -94,7 +94,7 @@ endif (IBVERBS_LIBRARIES)
 add_library(vitastor_common STATIC
 	epoll_manager.cpp etcd_state_client.cpp messenger.cpp addr_util.cpp
 	msgr_stop.cpp msgr_op.cpp msgr_send.cpp msgr_receive.cpp ringloop.cpp ../json11/json11.cpp
-	http_client.cpp osd_ops.cpp pg_states.cpp timerfd_manager.cpp base64.cpp ${MSGR_RDMA}
+	http_client.cpp osd_ops.cpp pg_states.cpp timerfd_manager.cpp str_util.cpp ${MSGR_RDMA}
 )
 target_compile_options(vitastor_common PUBLIC -fPIC)

@@ -131,7 +131,6 @@ add_library(vitastor_client SHARED
 	vitastor_c.cpp
 	cli_common.cpp
 	cli_alloc_osd.cpp
-	cli_simple_offsets.cpp
 	cli_status.cpp
 	cli_df.cpp
 	cli_ls.cpp
@@ -193,9 +192,15 @@ target_link_libraries(vitastor-cli
 )
 configure_file(vitastor.pc.in vitastor.pc @ONLY)

-# vitastor-dump-journal
-add_executable(vitastor-dump-journal
-	dump_journal.cpp crc32c.c
+# vitastor-disk
+add_executable(vitastor-disk
+	disk_tool.cpp disk_simple_offsets.cpp
+	disk_tool_journal.cpp disk_tool_meta.cpp disk_tool_prepare.cpp disk_tool_resize.cpp disk_tool_udev.cpp disk_tool_utils.cpp disk_tool_upgrade.cpp
+	crc32c.c str_util.cpp ../json11/json11.cpp rw_blocking.cpp allocator.cpp ringloop.cpp blockstore_disk.cpp
+)
+target_link_libraries(vitastor-disk
+	tcmalloc_minimal
+	${LIBURING_LIBRARIES}
 )

 if (${WITH_QEMU})
@@ -258,6 +263,14 @@ target_link_libraries(test_cas
 	vitastor_client
 )

+# test_crc32
+add_executable(test_crc32
+	test_crc32.cpp
+)
+target_link_libraries(test_crc32
+	vitastor_blk
+)
+
 # test_cluster_client
 add_executable(test_cluster_client
 	test_cluster_client.cpp
@@ -275,7 +288,8 @@ target_include_directories(test_cluster_client PUBLIC ${CMAKE_SOURCE_DIR}/src/mo

 ### Install

-install(TARGETS vitastor-osd vitastor-dump-journal vitastor-nbd vitastor-nfs vitastor-cli RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+install(TARGETS vitastor-osd vitastor-disk vitastor-nbd vitastor-nfs vitastor-cli RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+install_symlink(vitastor-disk ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/vitastor-dump-journal)
 install_symlink(vitastor-cli ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/vitastor-rm)
 install_symlink(vitastor-cli ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/vita)
 install(
--- a/src/base64.cpp
+++ b/src/base64.cpp
@@ -1,55 +0,0 @@
-// Copyright (c) Vitaliy Filippov, 2019+
-// License: VNPL-1.1 (see README.md for details)
-
-#include "base64.h"
-
-std::string base64_encode(const std::string &in)
-{
-    std::string out;
-    unsigned val = 0;
-    int valb = -6;
-    for (unsigned char c: in)
-    {
-        val = (val << 8) + c;
-        valb += 8;
-        while (valb >= 0)
-        {
-            out.push_back("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[(val>>valb) & 0x3F]);
-            valb -= 6;
-        }
-    }
-    if (valb > -6)
-        out.push_back("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[((val<<8)>>(valb+8)) & 0x3F]);
-    while (out.size() % 4)
-        out.push_back('=');
-    return out;
-}
-
-static char T[256] = { 0 };
-
-std::string base64_decode(const std::string &in)
-{
-    std::string out;
-    if (T[0] == 0)
-    {
-        for (int i = 0; i < 256; i++)
-            T[i] = -1;
-        for (int i = 0; i < 64; i++)
-            T[(unsigned char)("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[i])] = i;
-    }
-    unsigned val = 0;
-    int valb = -8;
-    for (unsigned char c: in)
-    {
-        if (T[c] == -1)
-            break;
-        val = (val<<6) + T[c];
-        valb += 6;
-        if (valb >= 0)
-        {
-            out.push_back(char((val >> valb) & 0xFF));
-            valb -= 8;
-        }
-    }
-    return out;
-}
--- a/src/base64.h
+++ b/src/base64.h
@@ -1,8 +0,0 @@
-// Copyright (c) Vitaliy Filippov, 2019+
-// License: VNPL-1.1 (see README.md for details)
-
-#pragma once
-#include <string>
-
-std::string base64_encode(const std::string &in);
-std::string base64_decode(const std::string &in);
--- a/src/blockstore.h
+++ b/src/blockstore.h
@@ -11,7 +11,6 @@

 #include <string>
 #include <map>
-#include <unordered_map>
 #include <functional>

 #include "object_id.h"
@@ -19,15 +18,19 @@
 #include "timerfd_manager.h"

 // Memory alignment for direct I/O (usually 512 bytes)
-// All other alignments must be a multiple of this one
+#ifndef DIRECT_IO_ALIGNMENT
+#define DIRECT_IO_ALIGNMENT 512
+#endif
+
+// Memory allocation alignment (page size is usually optimal)
 #ifndef MEM_ALIGNMENT
 #define MEM_ALIGNMENT 4096
 #endif

 // Default block size is 128 KB, current allowed range is 4K - 128M
-#define DEFAULT_ORDER 17
-#define MIN_BLOCK_SIZE 4*1024
-#define MAX_BLOCK_SIZE 128*1024*1024
+#define DEFAULT_DATA_BLOCK_ORDER 17
+#define MIN_DATA_BLOCK_SIZE 4*1024
+#define MAX_DATA_BLOCK_SIZE 128*1024*1024
 #define DEFAULT_BITMAP_GRANULARITY 4096

 #define BS_OP_MIN 1
@@ -151,7 +154,7 @@ struct blockstore_op_t
    uint8_t private_data[BS_OP_PRIVATE_DATA_SIZE];
 };

-typedef std::unordered_map<std::string, std::string> blockstore_config_t;
+typedef std::map<std::string, std::string> blockstore_config_t;

 class blockstore_impl_t;

@@ -189,7 +192,6 @@ public:
    // Print diagnostics to stdout
    void dump_diagnostics();

-    // FIXME rename to object_size
    uint32_t get_block_size();
    uint64_t get_block_count();
    uint64_t get_free_block_count();
--- a/src/blockstore_disk.cpp
+++ b/src/blockstore_disk.cpp
@@ -0,0 +1,323 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include <sys/file.h>
+
+#include <stdexcept>
+
+#include "blockstore_impl.h"
+#include "blockstore_disk.h"
+#include "str_util.h"
+
+static uint32_t is_power_of_two(uint64_t value)
+{
+    uint32_t l = 0;
+    while (value > 1)
+    {
+        if (value & 1)
+        {
+            return 64;
+        }
+        value = value >> 1;
+        l++;
+    }
+    return l;
+}
+
+void blockstore_disk_t::parse_config(std::map<std::string, std::string> & config)
+{
+    // Parse
+    if (config["disable_device_lock"] == "true" || config["disable_device_lock"] == "1" || config["disable_device_lock"] == "yes")
+    {
+        disable_flock = true;
+    }
+    cfg_journal_size = parse_size(config["journal_size"]);
+    data_device = config["data_device"];
+    data_offset = parse_size(config["data_offset"]);
+    cfg_data_size = parse_size(config["data_size"]);
+    meta_device = config["meta_device"];
+    meta_offset = parse_size(config["meta_offset"]);
+    data_block_size = parse_size(config["block_size"]);
+    journal_device = config["journal_device"];
+    journal_offset = parse_size(config["journal_offset"]);
+    disk_alignment = strtoull(config["disk_alignment"].c_str(), NULL, 10);
+    journal_block_size = strtoull(config["journal_block_size"].c_str(), NULL, 10);
+    meta_block_size = strtoull(config["meta_block_size"].c_str(), NULL, 10);
+    bitmap_granularity = strtoull(config["bitmap_granularity"].c_str(), NULL, 10);
+    // Validate
+    if (!data_block_size)
+    {
+        data_block_size = (1 << DEFAULT_DATA_BLOCK_ORDER);
+    }
+    if ((block_order = is_power_of_two(data_block_size)) >= 64 || data_block_size < MIN_DATA_BLOCK_SIZE || data_block_size >= MAX_DATA_BLOCK_SIZE)
+    {
+        throw std::runtime_error("Bad block size");
+    }
+    if (!disk_alignment)
+    {
+        disk_alignment = 4096;
+    }
+    else if (disk_alignment % DIRECT_IO_ALIGNMENT)
+    {
+        throw std::runtime_error("disk_alignment must be a multiple of "+std::to_string(DIRECT_IO_ALIGNMENT));
+    }
+    if (!journal_block_size)
+    {
+        journal_block_size = 4096;
+    }
+    else if (journal_block_size % DIRECT_IO_ALIGNMENT)
+    {
+        throw std::runtime_error("journal_block_size must be a multiple of "+std::to_string(DIRECT_IO_ALIGNMENT));
+    }
+    if (!meta_block_size)
+    {
+        meta_block_size = 4096;
+    }
+    else if (meta_block_size % DIRECT_IO_ALIGNMENT)
+    {
+        throw std::runtime_error("meta_block_size must be a multiple of "+std::to_string(DIRECT_IO_ALIGNMENT));
+    }
+    if (data_offset % disk_alignment)
+    {
+        throw std::runtime_error("data_offset must be a multiple of disk_alignment = "+std::to_string(disk_alignment));
+    }
+    if (!bitmap_granularity)
+    {
+        bitmap_granularity = DEFAULT_BITMAP_GRANULARITY;
+    }
+    else if (bitmap_granularity % disk_alignment)
+    {
+        throw std::runtime_error("Sparse write tracking granularity must be a multiple of disk_alignment = "+std::to_string(disk_alignment));
+    }
+    if (data_block_size % bitmap_granularity)
+    {
+        throw std::runtime_error("Block size must be a multiple of sparse write tracking granularity");
+    }
+    if (meta_device == "")
+    {
+        meta_device = data_device;
+    }
+    if (journal_device == "")
+    {
+        journal_device = meta_device;
+    }
+    if (meta_offset % meta_block_size)
+    {
+        throw std::runtime_error("meta_offset must be a multiple of meta_block_size = "+std::to_string(meta_block_size));
+    }
+    if (journal_offset % journal_block_size)
+    {
+        throw std::runtime_error("journal_offset must be a multiple of journal_block_size = "+std::to_string(journal_block_size));
+    }
+    clean_entry_bitmap_size = data_block_size / bitmap_granularity / 8;
+    clean_entry_size = sizeof(clean_disk_entry) + 2*clean_entry_bitmap_size;
+}
+
+void blockstore_disk_t::calc_lengths(bool skip_meta_check)
+{
+    // data
+    data_len = data_device_size - data_offset;
+    if (data_fd == meta_fd && data_offset < meta_offset)
+    {
+        data_len = meta_offset - data_offset;
+    }
+    if (data_fd == journal_fd && data_offset < journal_offset)
+    {
+        data_len = data_len < journal_offset-data_offset
+            ? data_len : journal_offset-data_offset;
+    }
+    if (cfg_data_size != 0)
+    {
+        if (data_len < cfg_data_size)
+        {
+            throw std::runtime_error("Data area ("+std::to_string(data_len)+
+                " bytes) is smaller than configured size ("+std::to_string(cfg_data_size)+" bytes)");
+        }
+        data_len = cfg_data_size;
+    }
+    // meta
+    uint64_t meta_area_size = (meta_fd == data_fd ? data_device_size : meta_device_size) - meta_offset;
+    if (meta_fd == data_fd && meta_offset <= data_offset)
+    {
+        meta_area_size = data_offset - meta_offset;
+    }
+    if (meta_fd == journal_fd && meta_offset <= journal_offset)
+    {
+        meta_area_size = meta_area_size < journal_offset-meta_offset
+            ? meta_area_size : journal_offset-meta_offset;
+    }
+    // journal
+    journal_len = (journal_fd == data_fd ? data_device_size : (journal_fd == meta_fd ? meta_device_size : journal_device_size)) - journal_offset;
+    if (journal_fd == data_fd && journal_offset <= data_offset)
+    {
+        journal_len = data_offset - journal_offset;
+    }
+    if (journal_fd == meta_fd && journal_offset <= meta_offset)
+    {
+        journal_len = journal_len < meta_offset-journal_offset
+            ? journal_len : meta_offset-journal_offset;
+    }
+    // required metadata size
+    block_count = data_len / data_block_size;
+    meta_len = (1 + (block_count - 1 + meta_block_size / clean_entry_size) / (meta_block_size / clean_entry_size)) * meta_block_size;
+    if (!skip_meta_check && meta_area_size < meta_len)
+    {
+        throw std::runtime_error("Metadata area is too small, need at least "+std::to_string(meta_len)+" bytes");
+    }
+    // requested journal size
+    if (!skip_meta_check && cfg_journal_size > journal_len)
+    {
+        throw std::runtime_error("Requested journal_size is too large");
+    }
+    else if (cfg_journal_size > 0)
+    {
+        journal_len = cfg_journal_size;
+    }
+    if (journal_len < MIN_JOURNAL_SIZE)
+    {
+        throw std::runtime_error("Journal is too small, need at least "+std::to_string(MIN_JOURNAL_SIZE)+" bytes");
+    }
+}
+
+// FIXME: Move to utils
+static void check_size(int fd, uint64_t *size, uint64_t *sectsize, std::string name)
+{
+    int sect;
+    struct stat st;
+    if (fstat(fd, &st) < 0)
+    {
+        throw std::runtime_error("Failed to stat "+name);
+    }
+    if (S_ISREG(st.st_mode))
+    {
+        *size = st.st_size;
+        if (sectsize)
+        {
+            *sectsize = st.st_blksize;
+        }
+    }
+    else if (S_ISBLK(st.st_mode))
+    {
+        if (ioctl(fd, BLKGETSIZE64, size) < 0 ||
+            ioctl(fd, BLKSSZGET, &sect) < 0)
+        {
+            throw std::runtime_error("Failed to get "+name+" size or block size: "+strerror(errno));
+        }
+        if (sectsize)
+        {
+            *sectsize = sect;
+        }
+    }
+    else
+    {
+        throw std::runtime_error(name+" is neither a file nor a block device");
+    }
+}
+
+void blockstore_disk_t::open_data()
+{
+    data_fd = open(data_device.c_str(), O_DIRECT|O_RDWR);
+    if (data_fd == -1)
+    {
+        throw std::runtime_error("Failed to open data device "+data_device+": "+std::string(strerror(errno)));
+    }
+    check_size(data_fd, &data_device_size, &data_device_sect, "data device");
+    if (disk_alignment % data_device_sect)
+    {
+        throw std::runtime_error(
+            "disk_alignment ("+std::to_string(disk_alignment)+
+            ") is not a multiple of data device sector size ("+std::to_string(data_device_sect)+")"
+        );
+    }
+    if (data_offset >= data_device_size)
+    {
+        throw std::runtime_error("data_offset exceeds device size = "+std::to_string(data_device_size));
+    }
+    if (!disable_flock && flock(data_fd, LOCK_EX|LOCK_NB) != 0)
+    {
+        throw std::runtime_error(std::string("Failed to lock data device: ") + strerror(errno));
+    }
+}
+
+void blockstore_disk_t::open_meta()
+{
+    if (meta_device != data_device)
+    {
+        meta_fd = open(meta_device.c_str(), O_DIRECT|O_RDWR);
+        if (meta_fd == -1)
+        {
+            throw std::runtime_error("Failed to open metadata device "+meta_device+": "+std::string(strerror(errno)));
+        }
+        check_size(meta_fd, &meta_device_size, &meta_device_sect, "metadata device");
+        if (meta_offset >= meta_device_size)
+        {
+            throw std::runtime_error("meta_offset exceeds device size = "+std::to_string(meta_device_size));
+        }
+        if (!disable_flock && flock(meta_fd, LOCK_EX|LOCK_NB) != 0)
+        {
+            throw std::runtime_error(std::string("Failed to lock metadata device: ") + strerror(errno));
+        }
+    }
+    else
+    {
+        meta_fd = data_fd;
+        meta_device_sect = data_device_sect;
+        meta_device_size = 0;
+        if (meta_offset >= data_device_size)
+        {
+            throw std::runtime_error("meta_offset exceeds device size = "+std::to_string(data_device_size));
+        }
+    }
+    if (meta_block_size % meta_device_sect)
+    {
+        throw std::runtime_error(
+            "meta_block_size ("+std::to_string(meta_block_size)+
+            ") is not a multiple of metadata device sector size ("+std::to_string(meta_device_sect)+")"
+        );
+    }
+}
+
+void blockstore_disk_t::open_journal()
+{
+    if (journal_device != meta_device)
+    {
+        journal_fd = open(journal_device.c_str(), O_DIRECT|O_RDWR);
+        if (journal_fd == -1)
+        {
+            throw std::runtime_error("Failed to open journal device "+journal_device+": "+std::string(strerror(errno)));
+        }
+        check_size(journal_fd, &journal_device_size, &journal_device_sect, "journal device");
+        if (!disable_flock && flock(journal_fd, LOCK_EX|LOCK_NB) != 0)
+        {
+            throw std::runtime_error(std::string("Failed to lock journal device: ") + strerror(errno));
+        }
+    }
+    else
+    {
+        journal_fd = meta_fd;
+        journal_device_sect = meta_device_sect;
+        journal_device_size = 0;
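+        if (journal_offset >= (meta_fd == data_fd ? data_device_size : meta_device_size))
+        {
+            throw std::runtime_error("journal_offset exceeds device size = "+std::to_string(meta_fd == data_fd ? data_device_size : meta_device_size));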
+        }
+    }
+    if (journal_block_size % journal_device_sect)
+    {
+        throw std::runtime_error(
+            "journal_block_size ("+std::to_string(journal_block_size)+
+            ") is not a multiple of journal device sector size ("+std::to_string(journal_device_sect)+")"
+        );
+    }
+}
+
+void blockstore_disk_t::close_all()
+{
+    if (data_fd >= 0)
+        close(data_fd);
+    if (meta_fd >= 0 && meta_fd != data_fd)
+        close(meta_fd);
+    if (journal_fd >= 0 && journal_fd != meta_fd)
+        close(journal_fd);
+    data_fd = meta_fd = journal_fd = -1;
+}
--- a/src/blockstore_disk.h
+++ b/src/blockstore_disk.h
@@ -0,0 +1,42 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#pragma once
+
+#include <stdint.h>
+
+#include <string>
+#include <map>
+
+struct blockstore_disk_t
+{
+    std::string data_device, meta_device, journal_device;
+    uint32_t data_block_size;
+    uint64_t cfg_journal_size, cfg_data_size;
+    // Required write alignment and journal/metadata/data areas' location alignment
+    uint32_t disk_alignment = 4096;
+    // Journal block size - minimum_io_size of the journal device is the best choice
+    uint64_t journal_block_size = 4096;
+    // Metadata block size - minimum_io_size of the metadata device is the best choice
+    uint64_t meta_block_size = 4096;
+    // Sparse write tracking granularity. 4 KB is a good choice. Must be a multiple of disk_alignment
+    uint64_t bitmap_granularity = 4096;
+    // By default, Blockstore locks all opened devices exclusively. This option can be used to disable locking
+    bool disable_flock = false;
+
+    int meta_fd = -1, data_fd = -1, journal_fd = -1;
+    uint64_t meta_offset, meta_device_sect, meta_device_size, meta_len;
+    uint64_t data_offset, data_device_sect, data_device_size, data_len;
+    uint64_t journal_offset, journal_device_sect, journal_device_size, journal_len;
+
+    uint32_t block_order;
+    uint64_t block_count;
+    uint32_t clean_entry_bitmap_size = 0, clean_entry_size = 0;
+
+    void parse_config(std::map<std::string, std::string> & config);
+    void open_data();
+    void open_meta();
+    void open_journal();
+    void calc_lengths(bool skip_meta_check = false);
+    void close_all();
+};
--- a/src/blockstore_flush.cpp
+++ b/src/blockstore_flush.cpp
@@ -15,11 +15,11 @@ journal_flusher_t::journal_flusher_t(blockstore_impl_t *bs)
    active_flushers = 0;
    syncing_flushers = 0;
    // FIXME: allow to configure flusher_start_threshold and journal_trim_interval
-    flusher_start_threshold = bs->journal_block_size / sizeof(journal_entry_stable);
+    flusher_start_threshold = bs->dsk.journal_block_size / sizeof(journal_entry_stable);
    journal_trim_interval = 512;
    journal_trim_counter = bs->journal.flush_journal ? 1 : 0;
    trim_wanted = bs->journal.flush_journal ? 1 : 0;
-    journal_superblock = bs->journal.inmemory ? bs->journal.buffer : memalign_or_die(MEM_ALIGNMENT, bs->journal_block_size);
+    journal_superblock = bs->journal.inmemory ? bs->journal.buffer : memalign_or_die(MEM_ALIGNMENT, bs->dsk.journal_block_size);
    co = new journal_flusher_co[max_flusher_count];
    for (int i = 0; i < max_flusher_count; i++)
    {
@@ -486,28 +486,28 @@ resume_1:
            bs->ringloop->wakeup();
        }
        // Reads completed, submit writes and set bitmap bits
-        if (bs->clean_entry_bitmap_size)
+        if (bs->dsk.clean_entry_bitmap_size)
        {
            new_clean_bitmap = (bs->inmemory_meta
-                ? (uint8_t*)meta_new.buf + meta_new.pos*bs->clean_entry_size + sizeof(clean_disk_entry)
-                : (uint8_t*)bs->clean_bitmap + (clean_loc >> bs->block_order)*(2*bs->clean_entry_bitmap_size));
+                ? (uint8_t*)meta_new.buf + meta_new.pos*bs->dsk.clean_entry_size + sizeof(clean_disk_entry)
+                : (uint8_t*)bs->clean_bitmap + (clean_loc >> bs->dsk.block_order)*(2*bs->dsk.clean_entry_bitmap_size));
            if (clean_init_bitmap)
            {
-                memset(new_clean_bitmap, 0, bs->clean_entry_bitmap_size);
-                bitmap_set(new_clean_bitmap, clean_bitmap_offset, clean_bitmap_len, bs->bitmap_granularity);
+                memset(new_clean_bitmap, 0, bs->dsk.clean_entry_bitmap_size);
+                bitmap_set(new_clean_bitmap, clean_bitmap_offset, clean_bitmap_len, bs->dsk.bitmap_granularity);
            }
        }
        for (it = v.begin(); it != v.end(); it++)
        {
            if (new_clean_bitmap)
            {
-                bitmap_set(new_clean_bitmap, it->offset, it->len, bs->bitmap_granularity);
+                bitmap_set(new_clean_bitmap, it->offset, it->len, bs->dsk.bitmap_granularity);
            }
            await_sqe(4);
            data->iov = (struct iovec){ it->buf, (size_t)it->len };
            data->callback = simple_callback_w;
            my_uring_prep_writev(
-                sqe, bs->data_fd, &data->iov, 1, bs->data_offset + clean_loc + it->offset
+                sqe, bs->dsk.data_fd, &data->iov, 1, bs->dsk.data_offset + clean_loc + it->offset
            );
            wait_count++;
        }
@@ -536,35 +536,35 @@ resume_1:
                return false;
            }
            // zero out old metadata entry
-            memset((uint8_t*)meta_old.buf + meta_old.pos*bs->clean_entry_size, 0, bs->clean_entry_size);
+            memset((uint8_t*)meta_old.buf + meta_old.pos*bs->dsk.clean_entry_size, 0, bs->dsk.clean_entry_size);
            await_sqe(15);
-            data->iov = (struct iovec){ meta_old.buf, bs->meta_block_size };
+            data->iov = (struct iovec){ meta_old.buf, bs->dsk.meta_block_size };
            data->callback = simple_callback_w;
            my_uring_prep_writev(
-                sqe, bs->meta_fd, &data->iov, 1, bs->meta_offset + meta_old.sector
+                sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset + bs->dsk.meta_block_size + meta_old.sector
            );
            wait_count++;
        }
        if (has_delete)
        {
-            clean_disk_entry *new_entry = (clean_disk_entry*)((uint8_t*)meta_new.buf + meta_new.pos*bs->clean_entry_size);
+            clean_disk_entry *new_entry = (clean_disk_entry*)((uint8_t*)meta_new.buf + meta_new.pos*bs->dsk.clean_entry_size);
            if (new_entry->oid.inode != 0 && new_entry->oid != cur.oid)
            {
                printf("Fatal error (metadata corruption or bug): tried to delete metadata entry %lu (%lx:%lx v%lu) while deleting %lx:%lx\n",
-                    clean_loc >> bs->block_order, new_entry->oid.inode, new_entry->oid.stripe,
+                    clean_loc >> bs->dsk.block_order, new_entry->oid.inode, new_entry->oid.stripe,
                    new_entry->version, cur.oid.inode, cur.oid.stripe);
                exit(1);
            }
            // zero out new metadata entry
-            memset((uint8_t*)meta_new.buf + meta_new.pos*bs->clean_entry_size, 0, bs->clean_entry_size);
+            memset((uint8_t*)meta_new.buf + meta_new.pos*bs->dsk.clean_entry_size, 0, bs->dsk.clean_entry_size);
        }
        else
        {
-            clean_disk_entry *new_entry = (clean_disk_entry*)((uint8_t*)meta_new.buf + meta_new.pos*bs->clean_entry_size);
+            clean_disk_entry *new_entry = (clean_disk_entry*)((uint8_t*)meta_new.buf + meta_new.pos*bs->dsk.clean_entry_size);
            if (new_entry->oid.inode != 0 && new_entry->oid != cur.oid)
            {
                printf("Fatal error (metadata corruption or bug): tried to overwrite non-zero metadata entry %lu (%lx:%lx v%lu) with %lx:%lx v%lu\n",
-                    clean_loc >> bs->block_order, new_entry->oid.inode, new_entry->oid.stripe, new_entry->version,
+                    clean_loc >> bs->dsk.block_order, new_entry->oid.inode, new_entry->oid.stripe, new_entry->version,
                    cur.oid.inode, cur.oid.stripe, cur.version);
                exit(1);
            }
@@ -572,20 +572,20 @@ resume_1:
            new_entry->version = cur.version;
            if (!bs->inmemory_meta)
            {
-                memcpy(&new_entry->bitmap, new_clean_bitmap, bs->clean_entry_bitmap_size);
+                memcpy(&new_entry->bitmap, new_clean_bitmap, bs->dsk.clean_entry_bitmap_size);
            }
            // copy latest external bitmap/attributes
-            if (bs->clean_entry_bitmap_size)
+            if (bs->dsk.clean_entry_bitmap_size)
            {
-                void *bmp_ptr = bs->clean_entry_bitmap_size > sizeof(void*) ? dirty_end->second.bitmap : &dirty_end->second.bitmap;
-                memcpy((uint8_t*)(new_entry+1) + bs->clean_entry_bitmap_size, bmp_ptr, bs->clean_entry_bitmap_size);
+                void *bmp_ptr = bs->dsk.clean_entry_bitmap_size > sizeof(void*) ? dirty_end->second.bitmap : &dirty_end->second.bitmap;
+                memcpy((uint8_t*)(new_entry+1) + bs->dsk.clean_entry_bitmap_size, bmp_ptr, bs->dsk.clean_entry_bitmap_size);
            }
        }
        await_sqe(6);
-        data->iov = (struct iovec){ meta_new.buf, bs->meta_block_size };
+        data->iov = (struct iovec){ meta_new.buf, bs->dsk.meta_block_size };
        data->callback = simple_callback_w;
        my_uring_prep_writev(
-            sqe, bs->meta_fd, &data->iov, 1, bs->meta_offset + meta_new.sector
+            sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset + bs->dsk.meta_block_size + meta_new.sector
        );
        wait_count++;
    resume_7:
@@ -615,7 +615,11 @@ resume_1:
        }
        for (it = v.begin(); it != v.end(); it++)
        {
-            free(it->buf);
+            // Free it if it's not taken from the journal
+            if (it->buf && (!bs->journal.inmemory || it->buf < bs->journal.buffer || it->buf >= bs->journal.buffer + bs->journal.len))
+            {
+                free(it->buf);
+            }
        }
        v.clear();
        // And sync metadata (in batches - not per each operation!)
@@ -669,9 +673,9 @@ resume_1:
                    .version = JOURNAL_VERSION,
                };
                ((journal_entry_start*)flusher->journal_superblock)->crc32 = je_crc32((journal_entry*)flusher->journal_superblock);
-                data->iov = (struct iovec){ flusher->journal_superblock, bs->journal_block_size };
+                data->iov = (struct iovec){ flusher->journal_superblock, bs->dsk.journal_block_size };
                data->callback = simple_callback_w;
-                my_uring_prep_writev(sqe, bs->journal.fd, &data->iov, 1, bs->journal.offset);
+                my_uring_prep_writev(sqe, bs->dsk.journal_fd, &data->iov, 1, bs->journal.offset);
                wait_count++;
            resume_13:
                if (wait_count > 0)
@@ -682,7 +686,7 @@ resume_1:
                if (!bs->disable_journal_fsync)
                {
                    await_sqe(20);
-                    my_uring_prep_fsync(sqe, bs->journal.fd, IORING_FSYNC_DATASYNC);
+                    my_uring_prep_fsync(sqe, bs->dsk.journal_fd, IORING_FSYNC_DATASYNC);
                    data->iov = { 0 };
                    data->callback = simple_callback_w;
                resume_21:
@@ -760,21 +764,22 @@ bool journal_flusher_co::scan_dirty(int wait_base)
                    {
                        submit_offset = dirty_it->second.location + offset - dirty_it->second.offset;
                        submit_len = it == v.end() || it->offset >= end_offset ? end_offset-offset : it->offset-offset;
-                        it = v.insert(it, (copy_buffer_t){ .offset = offset, .len = submit_len, .buf = memalign_or_die(MEM_ALIGNMENT, submit_len) });
+                        it = v.insert(it, (copy_buffer_t){ .offset = offset, .len = submit_len });
                        copy_count++;
                        if (bs->journal.inmemory)
                        {
-                            // Take it from memory
-                            memcpy(it->buf, (uint8_t*)bs->journal.buffer + submit_offset, submit_len);
+                            // Take it from memory, don't copy it
+                            it->buf = (uint8_t*)bs->journal.buffer + submit_offset;
                        }
                        else
                        {
                            // Read it from disk
+                            it->buf = memalign_or_die(MEM_ALIGNMENT, submit_len);
                            await_sqe(0);
                            data->iov = (struct iovec){ it->buf, (size_t)submit_len };
                            data->callback = simple_callback_r;
                            my_uring_prep_readv(
-                                sqe, bs->journal.fd, &data->iov, 1, bs->journal.offset + submit_offset
+                                sqe, bs->dsk.journal_fd, &data->iov, 1, bs->journal.offset + submit_offset
                            );
                            wait_count++;
                        }
@@ -825,8 +830,8 @@ bool journal_flusher_co::modify_meta_read(uint64_t meta_loc, flusher_meta_write_
    // And yet another option is to use LSM trees for metadata, but it sophisticates everything a lot,
    // so I'll avoid it as long as I can.
    wr.submitted = false;
-    wr.sector = ((meta_loc >> bs->block_order) / (bs->meta_block_size / bs->clean_entry_size)) * bs->meta_block_size;
-    wr.pos = ((meta_loc >> bs->block_order) % (bs->meta_block_size / bs->clean_entry_size));
+    wr.sector = ((meta_loc >> bs->dsk.block_order) / (bs->dsk.meta_block_size / bs->dsk.clean_entry_size)) * bs->dsk.meta_block_size;
+    wr.pos = ((meta_loc >> bs->dsk.block_order) % (bs->dsk.meta_block_size / bs->dsk.clean_entry_size));
    if (bs->inmemory_meta)
    {
        wr.buf = (uint8_t*)bs->metadata_buffer + wr.sector;
@@ -836,20 +841,20 @@ bool journal_flusher_co::modify_meta_read(uint64_t meta_loc, flusher_meta_write_
    if (wr.it == flusher->meta_sectors.end())
    {
        // Not in memory yet, read it
-        wr.buf = memalign_or_die(MEM_ALIGNMENT, bs->meta_block_size);
+        wr.buf = memalign_or_die(MEM_ALIGNMENT, bs->dsk.meta_block_size);
        wr.it = flusher->meta_sectors.emplace(wr.sector, (meta_sector_t){
            .offset = wr.sector,
-            .len = bs->meta_block_size,
+            .len = bs->dsk.meta_block_size,
            .state = 0, // 0 = not read yet
            .buf = wr.buf,
            .usage_count = 1,
        }).first;
        await_sqe(0);
-        data->iov = (struct iovec){ wr.it->second.buf, bs->meta_block_size };
+        data->iov = (struct iovec){ wr.it->second.buf, bs->dsk.meta_block_size };
        data->callback = simple_callback_r;
        wr.submitted = true;
        my_uring_prep_readv(
-            sqe, bs->meta_fd, &data->iov, 1, bs->meta_offset + wr.sector
+            sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset + bs->dsk.meta_block_size + wr.sector
        );
        wait_count++;
    }
@@ -867,11 +872,11 @@ void journal_flusher_co::update_clean_db()
    {
 #ifdef BLOCKSTORE_DEBUG
        printf("Free block %lu from %lx:%lx v%lu (new location is %lu)\n",
-            old_clean_loc >> bs->block_order,
+            old_clean_loc >> bs->dsk.block_order,
            cur.oid.inode, cur.oid.stripe, cur.version,
-            clean_loc >> bs->block_order);
+            clean_loc >> bs->dsk.block_order);
 #endif
-        bs->data_alloc->set(old_clean_loc >> bs->block_order, false);
+        bs->data_alloc->set(old_clean_loc >> bs->dsk.block_order, false);
    }
    auto & clean_db = bs->clean_db_shard(cur.oid);
    if (has_delete)
@@ -880,10 +885,10 @@ void journal_flusher_co::update_clean_db()
        clean_db.erase(clean_it);
 #ifdef BLOCKSTORE_DEBUG
        printf("Free block %lu from %lx:%lx v%lu (delete)\n",
-            clean_loc >> bs->block_order,
+            clean_loc >> bs->dsk.block_order,
            cur.oid.inode, cur.oid.stripe, cur.version);
 #endif
-        bs->data_alloc->set(clean_loc >> bs->block_order, false);
+        bs->data_alloc->set(clean_loc >> bs->dsk.block_order, false);
        clean_loc = UINT64_MAX;
    }
    else
@@ -932,7 +937,7 @@ bool journal_flusher_co::fsync_batch(bool fsync_meta, int wait_base)
                await_sqe(0);
                data->iov = { 0 };
                data->callback = simple_callback_w;
-                my_uring_prep_fsync(sqe, fsync_meta ? bs->meta_fd : bs->data_fd, IORING_FSYNC_DATASYNC);
+                my_uring_prep_fsync(sqe, fsync_meta ? bs->dsk.meta_fd : bs->dsk.data_fd, IORING_FSYNC_DATASYNC);
                cur_sync->state = 1;
                wait_count++;
            resume_2:
--- a/src/blockstore_impl.cpp
+++ b/src/blockstore_impl.cpp
@@ -11,25 +11,19 @@ blockstore_impl_t::blockstore_impl_t(blockstore_config_t & config, ring_loop_t *
    ring_consumer.loop = [this]() { loop(); };
    ringloop->register_consumer(&ring_consumer);
    initialized = 0;
-    data_fd = meta_fd = journal.fd = -1;
    parse_config(config);
-    zero_object = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, block_size);
+    zero_object = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, dsk.data_block_size);
    try
    {
-        open_data();
-        open_meta();
-        open_journal();
+        dsk.open_data();
+        dsk.open_meta();
+        dsk.open_journal();
        calc_lengths();
-        data_alloc = new allocator(block_count);
+        data_alloc = new allocator(dsk.block_count);
    }
    catch (std::exception & e)
    {
-        if (data_fd >= 0)
-            close(data_fd);
-        if (meta_fd >= 0 && meta_fd != data_fd)
-            close(meta_fd);
-        if (journal.fd >= 0 && journal.fd != meta_fd)
-            close(journal.fd);
+        dsk.close_all();
        throw;
    }
    flusher = new journal_flusher_t(this);
@@ -41,12 +35,7 @@ blockstore_impl_t::~blockstore_impl_t()
    delete flusher;
    free(zero_object);
    ringloop->unregister_consumer(&ring_consumer);
-    if (data_fd >= 0)
-        close(data_fd);
-    if (meta_fd >= 0 && meta_fd != data_fd)
-        close(meta_fd);
-    if (journal.fd >= 0 && journal.fd != meta_fd)
-        close(journal.fd);
+    dsk.close_all();
    if (metadata_buffer)
        free(metadata_buffer);
    if (clean_bitmap)
@@ -343,9 +332,9 @@ void blockstore_impl_t::enqueue_op(blockstore_op_t *op)
 {
    if (op->opcode < BS_OP_MIN || op->opcode > BS_OP_MAX ||
        ((op->opcode == BS_OP_READ || op->opcode == BS_OP_WRITE || op->opcode == BS_OP_WRITE_STABLE) && (
-            op->offset >= block_size ||
-            op->len > block_size-op->offset ||
-            (op->len % disk_alignment)
+            op->offset >= dsk.data_block_size ||
+            op->len > dsk.data_block_size-op->offset ||
+            (op->len % dsk.disk_alignment)
        )) ||
        readonly && op->opcode != BS_OP_READ && op->opcode != BS_OP_LIST)
    {
@@ -477,7 +466,7 @@ void blockstore_impl_t::process_list(blockstore_op_t *op)
    uint64_t min_inode = op->oid.inode;
    uint64_t max_inode = op->version;
    // Check PG
-    if (pg_count != 0 && (pg_stripe_size < MIN_BLOCK_SIZE || list_pg > pg_count))
+    if (pg_count != 0 && (pg_stripe_size < MIN_DATA_BLOCK_SIZE || list_pg > pg_count))
    {
        op->retval = -EINVAL;
        FINISH_OP(op);
--- a/src/blockstore_impl.h
+++ b/src/blockstore_impl.h
@@ -4,6 +4,7 @@
 #pragma once

 #include "blockstore.h"
+#include "blockstore_disk.h"

 #include <sys/types.h>
 #include <sys/ioctl.h>
@@ -17,6 +18,7 @@
 #include <list>
 #include <deque>
 #include <new>
+#include <unordered_map>

 #include "cpp-btree/btree_map.h"

@@ -90,13 +92,13 @@
 #include "blockstore_journal.h"

 // "VITAstor"
-#define BLOCKSTORE_META_MAGIC 0x726F747341544956l
-#define BLOCKSTORE_META_VERSION 1
+#define BLOCKSTORE_META_MAGIC_V1 0x726F747341544956l
+#define BLOCKSTORE_META_VERSION_V1 1

 // metadata header (superblock)
 // FIXME: After adding the OSD superblock, add a key to metadata
 // and journal headers to check if they belong to the same OSD
-struct __attribute__((__packed__)) blockstore_meta_header_t
+struct __attribute__((__packed__)) blockstore_meta_header_v1_t
 {
    uint64_t zero;
    uint64_t magic;
@@ -164,6 +166,7 @@ struct __attribute__((__packed__)) dirty_entry
 struct fulfill_read_t
 {
    uint64_t offset, len;
+    uint64_t journal_sector; // sector+1 if used and !journal.inmemory, otherwise 0
 };

 #define PRIV(op) ((blockstore_op_private_t*)(op)->private_data)
@@ -217,23 +220,10 @@ struct pool_shard_settings_t

 class blockstore_impl_t
 {
+    blockstore_disk_t dsk;
+
    /******* OPTIONS *******/
-    std::string data_device, meta_device, journal_device;
-    uint32_t block_size;
-    uint64_t meta_offset;
-    uint64_t data_offset;
-    uint64_t cfg_journal_size, cfg_data_size;
-    // Required write alignment and journal/metadata/data areas' location alignment
-    uint32_t disk_alignment = 4096;
-    // Journal block size - minimum_io_size of the journal device is the best choice
-    uint64_t journal_block_size = 4096;
-    // Metadata block size - minimum_io_size of the metadata device is the best choice
-    uint64_t meta_block_size = 4096;
-    // Sparse write tracking granularity. 4 KB is a good choice. Must be a multiple of disk_alignment
-    uint64_t bitmap_granularity = 4096;
    bool readonly = false;
-    // By default, Blockstore locks all opened devices exclusively. This option can be used to disable locking
-    bool disable_flock = false;
    // It is safe to disable fsync() if drive write cache is writethrough
    bool disable_data_fsync = false, disable_meta_fsync = false, disable_journal_fsync = false;
    // Enable if you want every operation to be executed with an "implicit fsync"
@@ -268,16 +258,6 @@ class blockstore_impl_t
    allocator *data_alloc = NULL;
    uint8_t *zero_object;

-    uint32_t block_order;
-    uint64_t block_count;
-    uint32_t clean_entry_bitmap_size = 0, clean_entry_size = 0;
-
-    int meta_fd;
-    int data_fd;
-    uint64_t meta_size, meta_area, meta_len;
-    uint64_t data_size, data_len;
-    uint64_t data_device_sect, meta_device_sect, journal_device_sect;
-
    void *metadata_buffer = NULL;

    struct journal_t journal;
@@ -326,7 +306,7 @@ class blockstore_impl_t
    // Read
    int dequeue_read(blockstore_op_t *read_op);
    int fulfill_read(blockstore_op_t *read_op, uint64_t &fulfilled, uint32_t item_start, uint32_t item_end,
-        uint32_t item_state, uint64_t item_version, uint64_t item_location);
+        uint32_t item_state, uint64_t item_version, uint64_t item_location, uint64_t journal_sector);
    int fulfill_read_push(blockstore_op_t *op, void *buf, uint64_t offset, uint64_t len,
        uint32_t item_state, uint64_t item_version);
    void handle_read_event(ring_data_t *data, blockstore_op_t *op);
@@ -394,9 +374,9 @@ public:
    // Print diagnostics to stdout
    void dump_diagnostics();

-    inline uint32_t get_block_size() { return block_size; }
-    inline uint64_t get_block_count() { return block_count; }
+    inline uint32_t get_block_size() { return dsk.data_block_size; }
+    inline uint64_t get_block_count() { return dsk.block_count; }
    inline uint64_t get_free_block_count() { return data_alloc->get_free_count(); }
-    inline uint32_t get_bitmap_granularity() { return disk_alignment; }
-    inline uint64_t get_journal_size() { return journal.len; }
+    inline uint32_t get_bitmap_granularity() { return dsk.disk_alignment; }
+    inline uint64_t get_journal_size() { return dsk.journal_len; }
 };
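The new `fulfill_read_t::journal_sector` field stores `sector+1`, so 0 can mean "this part was not read from the on-disk journal". This presumably lets the journal trimmer wait for such reads when `inmemory_journal` is false. The following is a hypothetical model of that bookkeeping, not Vitastor code. `journal_read_pins`, `start_read()`, `finish_read()` and `can_trim()` are illustrative names, and sectors are treated as plain integer keys.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model (not Vitastor code): reads served from the on-disk journal
// pin the journal sector they read from, and the trimmer must not release a
// sector while any in-flight read still references it.
struct journal_read_pins
{
    std::map<uint64_t, int> readers; // journal sector -> number of in-flight reads

    // Same convention as fulfill_read_t::journal_sector: sector+1 if used, otherwise 0
    static uint64_t encode(bool from_journal, uint64_t sector)
    {
        return from_journal ? sector + 1 : 0;
    }

    void start_read(uint64_t journal_sector_enc)
    {
        if (journal_sector_enc)
            readers[journal_sector_enc - 1]++;
    }

    void finish_read(uint64_t journal_sector_enc)
    {
        if (!journal_sector_enc)
            return;
        auto it = readers.find(journal_sector_enc - 1);
        if (it != readers.end() && --it->second == 0)
            readers.erase(it);
    }

    // The trimmer may only free [from, to) if no read pins a sector in that range
    bool can_trim(uint64_t from, uint64_t to) const
    {
        auto it = readers.lower_bound(from);
        return it == readers.end() || it->first >= to;
    }
};
```

The `+1` shift keeps the structure zero-initializable: a `fulfill_read_t` built with `journal_sector = 0` is automatically "not pinned", which is also the only value used when the journal is kept in memory.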
--- a/src/blockstore_init.cpp
+++ b/src/blockstore_init.cpp
@@ -3,6 +3,11 @@

 #include "blockstore_impl.h"

+#define INIT_META_EMPTY 0
+#define INIT_META_READING 1
+#define INIT_META_READ_DONE 2
+#define INIT_META_WRITING 3
+
 #define GET_SQE() \
    sqe = bs->get_sqe();\
    if (!sqe)\
@@ -22,20 +27,23 @@ blockstore_init_meta::blockstore_init_meta(blockstore_impl_t *bs)
    this->bs = bs;
 }

-void blockstore_init_meta::handle_event(ring_data_t *data)
+void blockstore_init_meta::handle_event(ring_data_t *data, int buf_num)
 {
    if (data->res < 0)
    {
        throw std::runtime_error(
-            std::string("read metadata failed at offset ") + std::to_string(metadata_read) +
+            std::string("read metadata failed at offset ") + std::to_string(buf_num >= 0 ? bufs[buf_num].offset : 0) +
            std::string(": ") + strerror(-data->res)
        );
    }
-    prev_done = data->res > 0 ? submitted : 0;
-    done_len = data->res;
-    done_pos = metadata_read;
-    metadata_read += data->res;
-    submitted = 0;
+    if (buf_num >= 0)
+    {
+        bufs[buf_num].state = (bufs[buf_num].state == INIT_META_READING
+            ? INIT_META_READ_DONE
+            : INIT_META_EMPTY);
+    }
+    submitted--;
+    bs->ringloop->wakeup();
 }

 int blockstore_init_meta::loop()
@@ -57,27 +65,27 @@ int blockstore_init_meta::loop()
        throw std::runtime_error("Failed to allocate metadata read buffer");
    // Read superblock
    GET_SQE();
-    data->iov = { metadata_buffer, bs->meta_block_size };
-    data->callback = [this](ring_data_t *data) { handle_event(data); };
-    my_uring_prep_readv(sqe, bs->meta_fd, &data->iov, 1, bs->meta_offset);
+    data->iov = { metadata_buffer, bs->dsk.meta_block_size };
+    data->callback = [this](ring_data_t *data) { handle_event(data, -1); };
+    my_uring_prep_readv(sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset);
    bs->ringloop->submit();
-    submitted = 1;
+    submitted++;
 resume_1:
-    if (submitted)
+    if (submitted > 0)
    {
        wait_state = 1;
        return 1;
    }
-    if (iszero((uint64_t*)metadata_buffer, bs->meta_block_size / sizeof(uint64_t)))
+    if (iszero((uint64_t*)metadata_buffer, bs->dsk.meta_block_size / sizeof(uint64_t)))
    {
        {
-            blockstore_meta_header_t *hdr = (blockstore_meta_header_t *)metadata_buffer;
+            blockstore_meta_header_v1_t *hdr = (blockstore_meta_header_v1_t *)metadata_buffer;
            hdr->zero = 0;
-            hdr->magic = BLOCKSTORE_META_MAGIC;
-            hdr->version = BLOCKSTORE_META_VERSION;
-            hdr->meta_block_size = bs->meta_block_size;
-            hdr->data_block_size = bs->block_size;
-            hdr->bitmap_granularity = bs->bitmap_granularity;
+            hdr->magic = BLOCKSTORE_META_MAGIC_V1;
+            hdr->version = BLOCKSTORE_META_VERSION_V1;
+            hdr->meta_block_size = bs->dsk.meta_block_size;
+            hdr->data_block_size = bs->dsk.data_block_size;
+            hdr->bitmap_granularity = bs->dsk.bitmap_granularity;
        }
        if (bs->readonly)
        {
@@ -87,11 +95,11 @@ resume_1:
        {
            printf("Initializing metadata area\n");
            GET_SQE();
-            data->iov = (struct iovec){ metadata_buffer, bs->meta_block_size };
-            data->callback = [this](ring_data_t *data) { handle_event(data); };
-            my_uring_prep_writev(sqe, bs->meta_fd, &data->iov, 1, bs->meta_offset);
+            data->iov = (struct iovec){ metadata_buffer, bs->dsk.meta_block_size };
+            data->callback = [this](ring_data_t *data) { handle_event(data, -1); };
+            my_uring_prep_writev(sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset);
            bs->ringloop->submit();
-            submitted = 1;
+            submitted++;
        resume_3:
            if (submitted > 0)
            {
@@ -103,10 +111,10 @@ resume_1:
    }
    else
    {
-        blockstore_meta_header_t *hdr = (blockstore_meta_header_t *)metadata_buffer;
+        blockstore_meta_header_v1_t *hdr = (blockstore_meta_header_v1_t *)metadata_buffer;
        if (hdr->zero != 0 ||
-            hdr->magic != BLOCKSTORE_META_MAGIC ||
-            hdr->version != BLOCKSTORE_META_VERSION)
+            hdr->magic != BLOCKSTORE_META_MAGIC_V1 ||
+            hdr->version != BLOCKSTORE_META_VERSION_V1)
        {
            printf(
                "Metadata is corrupt or old version.\n"
@@ -115,80 +123,96 @@ resume_1:
            );
            exit(1);
        }
-        if (hdr->meta_block_size != bs->meta_block_size ||
-            hdr->data_block_size != bs->block_size ||
-            hdr->bitmap_granularity != bs->bitmap_granularity)
+        if (hdr->meta_block_size != bs->dsk.meta_block_size ||
+            hdr->data_block_size != bs->dsk.data_block_size ||
+            hdr->bitmap_granularity != bs->dsk.bitmap_granularity)
        {
            printf(
                "Configuration stored in metadata superblock"
                " (meta_block_size=%u, data_block_size=%u, bitmap_granularity=%u)"
                " differs from OSD configuration (%lu/%u/%lu).\n",
                hdr->meta_block_size, hdr->data_block_size, hdr->bitmap_granularity,
-                bs->meta_block_size, bs->block_size, bs->bitmap_granularity
+                bs->dsk.meta_block_size, bs->dsk.data_block_size, bs->dsk.bitmap_granularity
            );
            exit(1);
        }
    }
    // Skip superblock
-    bs->meta_offset += bs->meta_block_size;
-    bs->meta_len -= bs->meta_block_size;
-    prev_done = 0;
-    done_len = 0;
-    done_pos = 0;
-    metadata_read = 0;
+    md_offset = bs->dsk.meta_block_size;
+    next_offset = md_offset;
    // Read the rest of the metadata
-    while (1)
+resume_2:
+    if (next_offset < bs->dsk.meta_len && submitted == 0)
    {
-    resume_2:
-        if (submitted)
+        // Submit one read
+        for (int i = 0; i < 2; i++)
        {
-            wait_state = 2;
-            return 1;
-        }
-        if (metadata_read < bs->meta_len)
-        {
-            GET_SQE();
-            data->iov = {
-                (uint8_t*)metadata_buffer + (bs->inmemory_meta
-                    ? metadata_read
-                    : (prev == 1 ? bs->metadata_buf_size : 0)),
-                bs->meta_len - metadata_read > bs->metadata_buf_size ? bs->metadata_buf_size : bs->meta_len - metadata_read,
-            };
-            data->callback = [this](ring_data_t *data) { handle_event(data); };
-            if (!zero_on_init)
-                my_uring_prep_readv(sqe, bs->meta_fd, &data->iov, 1, bs->meta_offset + metadata_read);
-            else
+            if (bufs[i].state == INIT_META_EMPTY)
            {
-                // Fill metadata with zeroes
-                memset(data->iov.iov_base, 0, data->iov.iov_len);
-                my_uring_prep_writev(sqe, bs->meta_fd, &data->iov, 1, bs->meta_offset + metadata_read);
+                bufs[i].buf = (uint8_t*)metadata_buffer + (bs->inmemory_meta
+                    ? next_offset-md_offset
+                    : i*bs->metadata_buf_size);
+                bufs[i].offset = next_offset;
+                bufs[i].size = bs->dsk.meta_len-next_offset > bs->metadata_buf_size
+                    ? bs->metadata_buf_size : bs->dsk.meta_len-next_offset;
+                bufs[i].state = INIT_META_READING;
+                submitted++;
+                next_offset += bufs[i].size;
+                GET_SQE();
+                data->iov = { bufs[i].buf, bufs[i].size };
+                data->callback = [this, i](ring_data_t *data) { handle_event(data, i); };
+                if (!zero_on_init)
+                    my_uring_prep_readv(sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset + bufs[i].offset);
+                else
+                {
+                    // Fill metadata with zeroes
+                    memset(data->iov.iov_base, 0, data->iov.iov_len);
+                    my_uring_prep_writev(sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset + bufs[i].offset);
+                }
+                bs->ringloop->submit();
+                break;
            }
-            bs->ringloop->submit();
-            submitted = (prev == 1 ? 2 : 1);
-            prev = submitted;
-        }
-        if (prev_done)
-        {
-            void *done_buf = bs->inmemory_meta
-                ? ((uint8_t*)metadata_buffer + done_pos)
-                : ((uint8_t*)metadata_buffer + (prev_done == 2 ? bs->metadata_buf_size : 0));
-            unsigned count = bs->meta_block_size / bs->clean_entry_size;
-            for (int sector = 0; sector < done_len; sector += bs->meta_block_size)
-            {
-                // handle <count> entries
-                handle_entries((uint8_t*)done_buf + sector, count, bs->block_order);
-                done_cnt += count;
-            }
-            prev_done = 0;
-            done_len = 0;
-        }
-        if (!submitted)
-        {
-            break;
        }
    }
+    for (int i = 0; i < 2; i++)
+    {
+        if (bufs[i].state == INIT_META_READ_DONE)
+        {
+            // Handle result
+            unsigned entries_per_block = bs->dsk.meta_block_size / bs->dsk.clean_entry_size;
+            bool changed = false;
+            for (uint64_t sector = 0; sector < bufs[i].size; sector += bs->dsk.meta_block_size)
+            {
+                // handle <entries_per_block> entries; the call must come before "||" to run for every sector
+                changed = handle_entries(
+                    bufs[i].buf + sector, entries_per_block,
+                    ((bufs[i].offset + sector - md_offset) / bs->dsk.meta_block_size) * entries_per_block
+                ) || changed;
+            }
+            if (changed && !bs->inmemory_meta)
+            {
+                // write the modified buffer back
+                GET_SQE();
+                data->iov = { bufs[i].buf, bufs[i].size };
+                data->callback = [this, i](ring_data_t *data) { handle_event(data, i); };
+                my_uring_prep_writev(sqe, bs->dsk.meta_fd, &data->iov, 1, bs->dsk.meta_offset + bufs[i].offset);
+                bufs[i].state = INIT_META_WRITING;
+                submitted++;
+            }
+            else
+            {
+                bufs[i].state = INIT_META_EMPTY;
+            }
+            bs->ringloop->wakeup();
+        }
+    }
+    if (submitted > 0)
+    {
+        wait_state = 2;
+        return 1;
+    }
    // metadata read finished
-    printf("Metadata entries loaded: %lu, free blocks: %lu / %lu\n", entries_loaded, bs->data_alloc->get_free_count(), bs->block_count);
+    printf("Metadata entries loaded: %lu, free blocks: %lu / %lu\n", entries_loaded, bs->data_alloc->get_free_count(), bs->dsk.block_count);
    if (!bs->inmemory_meta)
    {
        free(metadata_buffer);
@@ -197,10 +221,10 @@ resume_1:
    if (zero_on_init && !bs->disable_meta_fsync)
    {
        GET_SQE();
-        my_uring_prep_fsync(sqe, bs->meta_fd, IORING_FSYNC_DATASYNC);
+        my_uring_prep_fsync(sqe, bs->dsk.meta_fd, IORING_FSYNC_DATASYNC);
        data->iov = { 0 };
-        data->callback = [this](ring_data_t *data) { handle_event(data); };
-        submitted = 1;
+        data->callback = [this](ring_data_t *data) { handle_event(data, -1); };
+        submitted++;
        bs->ringloop->submit();
    resume_4:
        if (submitted > 0)
@@ -212,14 +236,15 @@ resume_1:
    return 0;
 }

-void blockstore_init_meta::handle_entries(void* entries, unsigned count, int block_order)
+bool blockstore_init_meta::handle_entries(uint8_t *buf, uint64_t count, uint64_t done_cnt)
 {
-    for (unsigned i = 0; i < count; i++)
+    bool updated = false;
+    for (uint64_t i = 0; i < count; i++)
    {
-        clean_disk_entry *entry = (clean_disk_entry*)((uint8_t*)entries + i*bs->clean_entry_size);
-        if (!bs->inmemory_meta && bs->clean_entry_bitmap_size)
+        clean_disk_entry *entry = (clean_disk_entry*)(buf + i*bs->dsk.clean_entry_size);
+        if (!bs->inmemory_meta && bs->dsk.clean_entry_bitmap_size)
        {
-            memcpy(bs->clean_bitmap + (done_cnt+i)*2*bs->clean_entry_bitmap_size, &entry->bitmap, 2*bs->clean_entry_bitmap_size);
+            memcpy(bs->clean_bitmap + (done_cnt+i)*2*bs->dsk.clean_entry_bitmap_size, &entry->bitmap, 2*bs->dsk.clean_entry_bitmap_size);
        }
        if (entry->oid.inode > 0)
        {
@@ -230,17 +255,21 @@ void blockstore_init_meta::handle_entries(void* entries, unsigned count, int blo
                if (clean_it != clean_db.end())
                {
                    // free the previous block
+                    // here we have to zero out the entry because otherwise we'll hit
+                    // "tried to overwrite non-zero metadata entry" later
+                    updated = true;
+                    memset(entry, 0, bs->dsk.clean_entry_size);
 #ifdef BLOCKSTORE_DEBUG
                    printf("Free block %lu from %lx:%lx v%lu (new location is %lu)\n",
-                        clean_it->second.location >> block_order,
+                        clean_it->second.location >> bs->dsk.block_order,
                        clean_it->first.inode, clean_it->first.stripe, clean_it->second.version,
                        done_cnt+i);
 #endif
-                    bs->data_alloc->set(clean_it->second.location >> block_order, false);
+                    bs->data_alloc->set(clean_it->second.location >> bs->dsk.block_order, false);
                }
                else
                {
-                    bs->inode_space_stats[entry->oid.inode] += bs->block_size;
+                    bs->inode_space_stats[entry->oid.inode] += bs->dsk.data_block_size;
                }
                entries_loaded++;
 #ifdef BLOCKSTORE_DEBUG
@@ -249,17 +278,21 @@ void blockstore_init_meta::handle_entries(void* entries, unsigned count, int blo
                bs->data_alloc->set(done_cnt+i, true);
                clean_db[entry->oid] = (struct clean_entry){
                    .version = entry->version,
-                    .location = (done_cnt+i) << block_order,
+                    .location = (done_cnt+i) << bs->dsk.block_order,
                };
            }
            else
            {
+                // here we also have to zero out the entry
+                updated = true;
+                memset(entry, 0, bs->dsk.clean_entry_size);
 #ifdef BLOCKSTORE_DEBUG
                printf("Old clean entry %lu: %lx:%lx v%lu\n", done_cnt+i, entry->oid.inode, entry->oid.stripe, entry->version);
 #endif
            }
        }
    }
+    return updated;
 }

 blockstore_init_journal::blockstore_init_journal(blockstore_impl_t *bs)
@@ -328,7 +361,7 @@ int blockstore_init_journal::loop()
    data = ((ring_data_t*)sqe->user_data);
    data->iov = { submitted_buf, bs->journal.block_size };
    data->callback = simple_callback;
-    my_uring_prep_readv(sqe, bs->journal.fd, &data->iov, 1, bs->journal.offset);
+    my_uring_prep_readv(sqe, bs->dsk.journal_fd, &data->iov, 1, bs->journal.offset);
    bs->ringloop->submit();
    wait_count = 1;
 resume_1:
@@ -367,7 +400,7 @@ resume_1:
            GET_SQE();
            data->iov = (struct iovec){ submitted_buf, 2*bs->journal.block_size };
            data->callback = simple_callback;
-            my_uring_prep_writev(sqe, bs->journal.fd, &data->iov, 1, bs->journal.offset);
+            my_uring_prep_writev(sqe, bs->dsk.journal_fd, &data->iov, 1, bs->journal.offset);
            wait_count++;
            bs->ringloop->submit();
        resume_6:
@@ -379,7 +412,7 @@ resume_1:
            if (!bs->disable_journal_fsync)
            {
                GET_SQE();
-                my_uring_prep_fsync(sqe, bs->journal.fd, IORING_FSYNC_DATASYNC);
+                my_uring_prep_fsync(sqe, bs->dsk.journal_fd, IORING_FSYNC_DATASYNC);
                data->iov = { 0 };
                data->callback = simple_callback;
                wait_count++;
@@ -448,7 +481,7 @@ resume_1:
                    end - journal_pos < JOURNAL_BUFFER_SIZE ? end - journal_pos : JOURNAL_BUFFER_SIZE,
                };
                data->callback = [this](ring_data_t *data1) { handle_event(data1); };
-                my_uring_prep_readv(sqe, bs->journal.fd, &data->iov, 1, bs->journal.offset + journal_pos);
+                my_uring_prep_readv(sqe, bs->dsk.journal_fd, &data->iov, 1, bs->journal.offset + journal_pos);
                bs->ringloop->submit();
            }
            while (done.size() > 0)
@@ -463,7 +496,7 @@ resume_1:
                        GET_SQE();
                        data->iov = { init_write_buf, bs->journal.block_size };
                        data->callback = simple_callback;
-                        my_uring_prep_writev(sqe, bs->journal.fd, &data->iov, 1, bs->journal.offset + init_write_sector);
+                        my_uring_prep_writev(sqe, bs->dsk.journal_fd, &data->iov, 1, bs->journal.offset + init_write_sector);
                        wait_count++;
                        bs->ringloop->submit();
                    resume_7:
@@ -477,7 +510,7 @@ resume_1:
                            GET_SQE();
                            data->iov = { 0 };
                            data->callback = simple_callback;
-                            my_uring_prep_fsync(sqe, bs->journal.fd, IORING_FSYNC_DATASYNC);
+                            my_uring_prep_fsync(sqe, bs->dsk.journal_fd, IORING_FSYNC_DATASYNC);
                            wait_count++;
                            bs->ringloop->submit();
                        }
@@ -544,7 +577,7 @@ resume_1:
            ? bs->journal.len-bs->journal.block_size - (bs->journal.next_free-bs->journal.used_start)
            : bs->journal.used_start - bs->journal.next_free),
        bs->journal.used_start, bs->journal.next_free,
-        bs->data_alloc->get_free_count(), bs->block_count
+        bs->data_alloc->get_free_count(), bs->dsk.block_count
    );
    bs->journal.crc32_last = crc32_last;
    return 0;
@@ -669,9 +702,9 @@ int blockstore_init_journal::handle_journal_part(void *buf, uint64_t done_pos, u
                    };
                    void *bmp = NULL;
                    void *bmp_from = (uint8_t*)je + sizeof(journal_entry_small_write);
-                    if (bs->clean_entry_bitmap_size <= sizeof(void*))
+                    if (bs->dsk.clean_entry_bitmap_size <= sizeof(void*))
                    {
-                        memcpy(&bmp, bmp_from, bs->clean_entry_bitmap_size);
+                        memcpy(&bmp, bmp_from, bs->dsk.clean_entry_bitmap_size);
                    }
                    else
                    {
@@ -679,8 +712,8 @@ int blockstore_init_journal::handle_journal_part(void *buf, uint64_t done_pos, u
                        // allocations for entry bitmaps. This can only be fixed by using
                        // a patched map with dynamic entry size, but not the btree_map,
                        // because it doesn't keep iterators valid all the time.
-                        bmp = malloc_or_die(bs->clean_entry_bitmap_size);
-                        memcpy(bmp, bmp_from, bs->clean_entry_bitmap_size);
+                        bmp = malloc_or_die(bs->dsk.clean_entry_bitmap_size);
+                        memcpy(bmp, bmp_from, bs->dsk.clean_entry_bitmap_size);
                    }
                    bs->dirty_db.emplace(ov, (dirty_entry){
                        .state = (BS_ST_SMALL_WRITE | BS_ST_SYNCED),
@@ -712,7 +745,7 @@ int blockstore_init_journal::handle_journal_part(void *buf, uint64_t done_pos, u
                printf(
                    "je_big_write%s oid=%lx:%lx ver=%lu loc=%lu\n",
                    je->type == JE_BIG_WRITE_INSTANT ? "_instant" : "",
-                    je->big_write.oid.inode, je->big_write.oid.stripe, je->big_write.version, je->big_write.location >> bs->block_order
+                    je->big_write.oid.inode, je->big_write.oid.stripe, je->big_write.version, je->big_write.location >> bs->dsk.block_order
                );
 #endif
                auto dirty_it = bs->dirty_db.upper_bound((obj_ver_id){
@@ -750,9 +783,9 @@ int blockstore_init_journal::handle_journal_part(void *buf, uint64_t done_pos, u
                    };
                    void *bmp = NULL;
                    void *bmp_from = (uint8_t*)je + sizeof(journal_entry_big_write);
-                    if (bs->clean_entry_bitmap_size <= sizeof(void*))
+                    if (bs->dsk.clean_entry_bitmap_size <= sizeof(void*))
                    {
-                        memcpy(&bmp, bmp_from, bs->clean_entry_bitmap_size);
+                        memcpy(&bmp, bmp_from, bs->dsk.clean_entry_bitmap_size);
                    }
                    else
                    {
@@ -760,8 +793,8 @@ int blockstore_init_journal::handle_journal_part(void *buf, uint64_t done_pos, u
                        // allocations for entry bitmaps. This can only be fixed by using
                        // a patched map with dynamic entry size, but not the btree_map,
                        // because it doesn't keep iterators valid all the time.
-                        bmp = malloc_or_die(bs->clean_entry_bitmap_size);
-                        memcpy(bmp, bmp_from, bs->clean_entry_bitmap_size);
+                        bmp = malloc_or_die(bs->dsk.clean_entry_bitmap_size);
+                        memcpy(bmp, bmp_from, bs->dsk.clean_entry_bitmap_size);
                    }
                    auto dirty_it = bs->dirty_db.emplace(ov, (dirty_entry){
                        .state = (BS_ST_BIG_WRITE | BS_ST_SYNCED),
@@ -772,7 +805,7 @@ int blockstore_init_journal::handle_journal_part(void *buf, uint64_t done_pos, u
                        .journal_sector = proc_pos,
                        .bitmap = bmp,
                    }).first;
-                    if (bs->data_alloc->get(je->big_write.location >> bs->block_order))
+                    if (bs->data_alloc->get(je->big_write.location >> bs->dsk.block_order))
                    {
                        // This is probably a big_write that's already flushed and freed, but it may
                        // also indicate a bug. So we remember such entries and recheck them afterwards.
@@ -785,11 +818,11 @@ int blockstore_init_journal::handle_journal_part(void *buf, uint64_t done_pos, u
 #ifdef BLOCKSTORE_DEBUG
                        printf(
                            "Allocate block (journal) %lu: %lx:%lx v%lu\n",
-                            je->big_write.location >> bs->block_order,
+                            je->big_write.location >> bs->dsk.block_order,
                            ov.oid.inode, ov.oid.stripe, ov.version
                        );
 #endif
-                        bs->data_alloc->set(je->big_write.location >> bs->block_order, true);
+                        bs->data_alloc->set(je->big_write.location >> bs->dsk.block_order, true);
                    }
                    bs->journal.used_sectors[proc_pos]++;
 #ifdef BLOCKSTORE_DEBUG
@@ -913,8 +946,8 @@ void blockstore_init_journal::erase_dirty_object(blockstore_dirty_db_t::iterator
    if (exists && clean_loc == UINT64_MAX)
    {
        auto & sp = bs->inode_space_stats[oid.inode];
-        if (sp > bs->block_size)
-            sp -= bs->block_size;
+        if (sp > bs->dsk.data_block_size)
+            sp -= bs->dsk.data_block_size;
        else
            bs->inode_space_stats.erase(oid.inode);
    }
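handle_entries() now reports whether it modified the metadata buffer, so stale entries can be zeroed on disk instead of triggering "tried to overwrite non-zero metadata entry" later. The sketch below is a simplified, hypothetical model of that rule; the names `disk_entry`, `clean_index_t` and `load_entries()` are illustrative. It assumes every metadata slot is held in memory, as with `inmemory_metadata`, and that slot `i` describes data block `i`. Whenever two slots describe the same object, the slot with the lower version is freed and zeroed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified metadata slot: inode == 0 marks an empty slot
struct disk_entry
{
    uint64_t inode, stripe, version;
};

// (inode, stripe) -> index of the slot holding the newest known version
typedef std::map<std::pair<uint64_t, uint64_t>, uint64_t> clean_index_t;

// Processes slots [from, from+count); returns true if any slot was zeroed
bool load_entries(std::vector<disk_entry> & slots, uint64_t from, uint64_t count,
    clean_index_t & clean_db, std::vector<bool> & used)
{
    bool updated = false;
    for (uint64_t i = from; i < from + count; i++)
    {
        if (slots[i].inode == 0)
            continue;
        auto key = std::make_pair(slots[i].inode, slots[i].stripe);
        auto it = clean_db.find(key);
        if (it == clean_db.end())
        {
            // First copy of this object seen so far
            clean_db[key] = i;
            used[i] = true;
            continue;
        }
        // Two slots describe the same object: the lower version loses
        uint64_t loser = slots[it->second].version < slots[i].version ? it->second : i;
        if (loser == it->second)
        {
            it->second = i;
            used[i] = true;
        }
        used[loser] = false;
        slots[loser] = disk_entry{}; // zero the stale slot so it can be reused later
        updated = true;
    }
    return updated;
}
```

Because the function returns whether anything changed, a caller that processes several metadata blocks should accumulate the flag as `changed = load_entries(...) || changed;`. Writing `changed = changed || load_entries(...)` would skip every block after the first change, since `||` short-circuits.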
--- a/src/blockstore_init.h
+++ b/src/blockstore_init.h
@@ -3,20 +3,29 @@

 #pragma once

+struct blockstore_init_meta_buf
+{
+    uint8_t *buf = NULL;
+    uint64_t size = 0;
+    uint64_t offset = 0;
+    int state = 0;
+};
+
 class blockstore_init_meta
 {
    blockstore_impl_t *bs;
    int wait_state = 0;
    bool zero_on_init = false;
    void *metadata_buffer = NULL;
-    uint64_t metadata_read = 0;
-    int prev = 0, prev_done = 0, done_len = 0, submitted = 0;
-    uint64_t done_cnt = 0, done_pos = 0;
-    uint64_t entries_loaded = 0;
+    blockstore_init_meta_buf bufs[2] = {};
+    int submitted = 0;
    struct io_uring_sqe *sqe;
    struct ring_data_t *data;
-    void handle_entries(void *entries, unsigned count, int block_order);
-    void handle_event(ring_data_t *data);
+    uint64_t md_offset = 0;
+    uint64_t next_offset = 0;
+    uint64_t entries_loaded = 0;
+    bool handle_entries(uint8_t *buf, uint64_t count, uint64_t done_cnt);
+    void handle_event(ring_data_t *data, int buf_num);
 public:
    blockstore_init_meta(blockstore_impl_t *bs);
    int loop();
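The metadata loader now walks the area in chunks of at most `metadata_buf_size` bytes, starting right after the superblock (`md_offset`). It converts each block's byte offset into the index of its first clean entry, which is the `done_cnt` argument of handle_entries(). Below is a hypothetical sketch of that arithmetic. `plan_meta_reads()` and `first_entry_index()` are illustrative names, and offsets are relative to `meta_offset`, as in the loader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One read request over the metadata area (offset is relative to meta_offset)
struct meta_chunk
{
    uint64_t offset;
    uint64_t size;
};

// Split [meta_block_size, meta_len) into chunks of at most buf_size bytes;
// the first meta_block_size bytes hold the superblock and are skipped
std::vector<meta_chunk> plan_meta_reads(uint64_t meta_len, uint64_t meta_block_size, uint64_t buf_size)
{
    std::vector<meta_chunk> chunks;
    uint64_t next_offset = meta_block_size; // md_offset
    while (next_offset < meta_len)
    {
        uint64_t size = std::min(buf_size, meta_len - next_offset);
        chunks.push_back({ next_offset, size });
        next_offset += size;
    }
    return chunks;
}

// Index of the first clean entry stored in the metadata block at chunk_offset + sector
uint64_t first_entry_index(uint64_t chunk_offset, uint64_t sector, uint64_t md_offset,
    uint64_t meta_block_size, uint64_t entries_per_block)
{
    return ((chunk_offset + sector - md_offset) / meta_block_size) * entries_per_block;
}
```

For example, with 4096-byte metadata blocks and 32-byte clean entries (128 entries per block, an assumption for illustration), the second block of the chunk at offset 20480 covers entries starting at index ((20480 + 4096 - 4096) / 4096) * 128 = 640.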
--- a/src/blockstore_journal.cpp
+++ b/src/blockstore_journal.cpp
@@ -175,7 +175,7 @@ void blockstore_impl_t::prepare_journal_sector_write(int cur_sector, blockstore_
        };
        data->callback = [this, flush_id = journal.submit_id](ring_data_t *data) { handle_journal_write(data, flush_id); };
        my_uring_prep_writev(
-            sqe, journal.fd, &data->iov, 1, journal.offset + journal.sector_info[cur_sector].offset
+            sqe, dsk.journal_fd, &data->iov, 1, journal.offset + journal.sector_info[cur_sector].offset
        );
    }
    journal.sector_info[cur_sector].dirty = false;
--- a/src/blockstore_journal.h
+++ b/src/blockstore_journal.h
@@ -10,6 +10,7 @@
 #define JOURNAL_MAGIC 0x4A33
 #define JOURNAL_VERSION 1
 #define JOURNAL_BUFFER_SIZE 4*1024*1024
+#define JOURNAL_ENTRY_HEADER_SIZE 16

 // We reserve some extra space for future stabilize requests during writes
 // FIXME: This value should be dynamic i.e. Blockstore ideally shouldn't allow
@@ -164,7 +165,6 @@ inline bool operator < (const pending_journaling_t & a, const pending_journaling
 struct journal_t
 {
    int fd;
-    uint64_t device_size;
    bool inmemory = false;
    bool flush_journal = false;
    void *buffer = NULL;
--- a/src/blockstore_open.cpp
+++ b/src/blockstore_open.cpp
@@ -4,23 +4,10 @@
 #include <sys/file.h>
 #include "blockstore_impl.h"

-static uint32_t is_power_of_two(uint64_t value)
-{
-    uint32_t l = 0;
-    while (value > 1)
-    {
-        if (value & 1)
-        {
-            return 64;
-        }
-        value = value >> 1;
-        l++;
-    }
-    return l;
-}
-
 void blockstore_impl_t::parse_config(blockstore_config_t & config)
 {
+    // Common disk options
+    dsk.parse_config(config);
    // Parse
    if (config["readonly"] == "true" || config["readonly"] == "1" || config["readonly"] == "yes")
    {
@@ -38,10 +25,6 @@ void blockstore_impl_t::parse_config(blockstore_config_t & config)
    {
        disable_journal_fsync = true;
    }
-    if (config["disable_device_lock"] == "true" || config["disable_device_lock"] == "1" || config["disable_device_lock"] == "yes")
-    {
-        disable_flock = true;
-    }
    if (config["flush_journal"] == "true" || config["flush_journal"] == "1" || config["flush_journal"] == "yes")
    {
        // Only flush journal and exit
@@ -56,24 +39,11 @@ void blockstore_impl_t::parse_config(blockstore_config_t & config)
        immediate_commit = IMMEDIATE_SMALL;
    }
    metadata_buf_size = strtoull(config["meta_buf_size"].c_str(), NULL, 10);
-    cfg_journal_size = strtoull(config["journal_size"].c_str(), NULL, 10);
-    data_device = config["data_device"];
-    data_offset = strtoull(config["data_offset"].c_str(), NULL, 10);
-    cfg_data_size = strtoull(config["data_size"].c_str(), NULL, 10);
-    meta_device = config["meta_device"];
-    meta_offset = strtoull(config["meta_offset"].c_str(), NULL, 10);
-    block_size = strtoull(config["block_size"].c_str(), NULL, 10);
    inmemory_meta = config["inmemory_metadata"] != "false";
-    journal_device = config["journal_device"];
-    journal.offset = strtoull(config["journal_offset"].c_str(), NULL, 10);
    journal.sector_count = strtoull(config["journal_sector_buffer_count"].c_str(), NULL, 10);
    journal.no_same_sector_overwrites = config["journal_no_same_sector_overwrites"] == "true" ||
        config["journal_no_same_sector_overwrites"] == "1" || config["journal_no_same_sector_overwrites"] == "yes";
    journal.inmemory = config["inmemory_journal"] != "false";
-    disk_alignment = strtoull(config["disk_alignment"].c_str(), NULL, 10);
-    journal_block_size = strtoull(config["journal_block_size"].c_str(), NULL, 10);
-    meta_block_size = strtoull(config["meta_block_size"].c_str(), NULL, 10);
-    bitmap_granularity = strtoull(config["bitmap_granularity"].c_str(), NULL, 10);
    max_flusher_count = strtoull(config["max_flusher_count"].c_str(), NULL, 10);
    if (!max_flusher_count)
        max_flusher_count = strtoull(config["flusher_count"].c_str(), NULL, 10);
@@ -85,14 +55,6 @@ void blockstore_impl_t::parse_config(blockstore_config_t & config)
    throttle_target_parallelism = strtoull(config["throttle_target_parallelism"].c_str(), NULL, 10);
    throttle_threshold_us = strtoull(config["throttle_threshold_us"].c_str(), NULL, 10);
    // Validate
-    if (!block_size)
-    {
-        block_size = (1 << DEFAULT_ORDER);
-    }
-    if ((block_order = is_power_of_two(block_size)) >= 64 || block_size < MIN_BLOCK_SIZE || block_size >= MAX_BLOCK_SIZE)
-    {
-        throw std::runtime_error("Bad block size");
-    }
    if (!max_flusher_count)
    {
        max_flusher_count = 256;
@@ -105,62 +67,6 @@ void blockstore_impl_t::parse_config(blockstore_config_t & config)
    {
        max_write_iodepth = 128;
    }
-    if (!disk_alignment)
-    {
-        disk_alignment = 4096;
-    }
-    else if (disk_alignment % MEM_ALIGNMENT)
-    {
-        throw std::runtime_error("disk_alignment must be a multiple of "+std::to_string(MEM_ALIGNMENT));
-    }
-    if (!journal_block_size)
-    {
-        journal_block_size = 4096;
-    }
-    else if (journal_block_size % MEM_ALIGNMENT)
-    {
-        throw std::runtime_error("journal_block_size must be a multiple of "+std::to_string(MEM_ALIGNMENT));
-    }
-    if (!meta_block_size)
-    {
-        meta_block_size = 4096;
-    }
-    else if (meta_block_size % MEM_ALIGNMENT)
-    {
-        throw std::runtime_error("meta_block_size must be a multiple of "+std::to_string(MEM_ALIGNMENT));
-    }
-    if (data_offset % disk_alignment)
-    {
-        throw std::runtime_error("data_offset must be a multiple of disk_alignment = "+std::to_string(disk_alignment));
-    }
-    if (!bitmap_granularity)
-    {
-        bitmap_granularity = DEFAULT_BITMAP_GRANULARITY;
-    }
-    else if (bitmap_granularity % disk_alignment)
-    {
-        throw std::runtime_error("Sparse write tracking granularity must be a multiple of disk_alignment = "+std::to_string(disk_alignment));
-    }
-    if (block_size % bitmap_granularity)
-    {
-        throw std::runtime_error("Block size must be a multiple of sparse write tracking granularity");
-    }
-    if (journal_device == meta_device || meta_device == "" && journal_device == data_device)
-    {
-        journal_device = "";
-    }
-    if (meta_device == data_device)
-    {
-        meta_device = "";
-    }
-    if (meta_offset % meta_block_size)
-    {
-        throw std::runtime_error("meta_offset must be a multiple of meta_block_size = "+std::to_string(meta_block_size));
-    }
-    if (journal.offset % journal_block_size)
-    {
-        throw std::runtime_error("journal_offset must be a multiple of journal_block_size = "+std::to_string(journal_block_size));
-    }
    if (journal.sector_count < 2)
    {
        journal.sector_count = 32;
@@ -169,11 +75,11 @@ void blockstore_impl_t::parse_config(blockstore_config_t & config)
    {
        metadata_buf_size = 4*1024*1024;
    }
-    if (meta_device == "")
+    if (dsk.meta_device == dsk.data_device)
    {
        disable_meta_fsync = disable_data_fsync;
    }
-    if (journal_device == "")
+    if (dsk.journal_device == dsk.meta_device)
    {
        disable_journal_fsync = disable_meta_fsync;
    }
@@ -202,238 +108,46 @@ void blockstore_impl_t::parse_config(blockstore_config_t & config)
        throttle_threshold_us = 50;
    }
    // init some fields
-    clean_entry_bitmap_size = block_size / bitmap_granularity / 8;
-    clean_entry_size = sizeof(clean_disk_entry) + 2*clean_entry_bitmap_size;
-    journal.block_size = journal_block_size;
-    journal.next_free = journal_block_size;
-    journal.used_start = journal_block_size;
+    journal.block_size = dsk.journal_block_size;
+    journal.next_free = dsk.journal_block_size;
+    journal.used_start = dsk.journal_block_size;
    // no free space because sector is initially unmapped
-    journal.in_sector_pos = journal_block_size;
+    journal.in_sector_pos = dsk.journal_block_size;
 }

 void blockstore_impl_t::calc_lengths()
 {
-    // data
-    data_len = data_size - data_offset;
-    if (data_fd == meta_fd && data_offset < meta_offset)
-    {
-        data_len = meta_offset - data_offset;
-    }
-    if (data_fd == journal.fd && data_offset < journal.offset)
-    {
-        data_len = data_len < journal.offset-data_offset
-            ? data_len : journal.offset-data_offset;
-    }
-    if (cfg_data_size != 0)
-    {
-        if (data_len < cfg_data_size)
-        {
-            throw std::runtime_error("Data area ("+std::to_string(data_len)+
-                " bytes) is less than configured size ("+std::to_string(cfg_data_size)+" bytes)");
-        }
-        data_len = cfg_data_size;
-    }
-    // meta
-    meta_area = (meta_fd == data_fd ? data_size : meta_size) - meta_offset;
-    if (meta_fd == data_fd && meta_offset <= data_offset)
-    {
-        meta_area = data_offset - meta_offset;
-    }
-    if (meta_fd == journal.fd && meta_offset <= journal.offset)
-    {
-        meta_area = meta_area < journal.offset-meta_offset
-            ? meta_area : journal.offset-meta_offset;
-    }
-    // journal
-    journal.len = (journal.fd == data_fd ? data_size : (journal.fd == meta_fd ? meta_size : journal.device_size)) - journal.offset;
-    if (journal.fd == data_fd && journal.offset <= data_offset)
-    {
-        journal.len = data_offset - journal.offset;
-    }
-    if (journal.fd == meta_fd && journal.offset <= meta_offset)
-    {
-        journal.len = journal.len < meta_offset-journal.offset
-            ? journal.len : meta_offset-journal.offset;
-    }
-    // required metadata size
-    block_count = data_len / block_size;
-    meta_len = (1 + (block_count - 1 + meta_block_size / clean_entry_size) / (meta_block_size / clean_entry_size)) * meta_block_size;
-    if (meta_area < meta_len)
-    {
-        throw std::runtime_error("Metadata area is too small, need at least "+std::to_string(meta_len)+" bytes");
-    }
+    dsk.calc_lengths();
+    journal.len = dsk.journal_len;
+    journal.block_size = dsk.journal_block_size;
+    journal.offset = dsk.journal_offset;
    if (inmemory_meta)
    {
-        metadata_buffer = memalign(MEM_ALIGNMENT, meta_len);
+        metadata_buffer = memalign(MEM_ALIGNMENT, dsk.meta_len);
        if (!metadata_buffer)
            throw std::runtime_error("Failed to allocate memory for the metadata");
    }
-    else if (clean_entry_bitmap_size)
+    else if (dsk.clean_entry_bitmap_size)
    {
-        clean_bitmap = (uint8_t*)malloc(block_count * 2*clean_entry_bitmap_size);
+        clean_bitmap = (uint8_t*)malloc(dsk.block_count * 2*dsk.clean_entry_bitmap_size);
        if (!clean_bitmap)
            throw std::runtime_error("Failed to allocate memory for the metadata sparse write bitmap");
    }
-    // requested journal size
-    if (cfg_journal_size > journal.len)
-    {
-        throw std::runtime_error("Requested journal_size is too large");
-    }
-    else if (cfg_journal_size > 0)
-    {
-        journal.len = cfg_journal_size;
-    }
-    if (journal.len < MIN_JOURNAL_SIZE)
-    {
-        throw std::runtime_error("Journal is too small, need at least "+std::to_string(MIN_JOURNAL_SIZE)+" bytes");
-    }
    if (journal.inmemory)
    {
        journal.buffer = memalign(MEM_ALIGNMENT, journal.len);
        if (!journal.buffer)
            throw std::runtime_error("Failed to allocate memory for journal");
    }
-}
-
-static void check_size(int fd, uint64_t *size, uint64_t *sectsize, std::string name)
-{
-    int sect;
-    struct stat st;
-    if (fstat(fd, &st) < 0)
-    {
-        throw std::runtime_error("Failed to stat "+name);
-    }
-    if (S_ISREG(st.st_mode))
-    {
-        *size = st.st_size;
-        if (sectsize)
-        {
-            *sectsize = st.st_blksize;
-        }
-    }
-    else if (S_ISBLK(st.st_mode))
-    {
-        if (ioctl(fd, BLKGETSIZE64, size) < 0 ||
-            ioctl(fd, BLKSSZGET, &sect) < 0)
-        {
-            throw std::runtime_error("failed to get "+name+" size or block size: "+strerror(errno));
-        }
-        if (sectsize)
-        {
-            *sectsize = sect;
-        }
-    }
    else
    {
-        throw std::runtime_error(name+" is neither a file nor a block device");
-    }
-}
-
-void blockstore_impl_t::open_data()
-{
-    data_fd = open(data_device.c_str(), O_DIRECT|O_RDWR);
-    if (data_fd == -1)
-    {
-        throw std::runtime_error("Failed to open data device");
-    }
-    check_size(data_fd, &data_size, &data_device_sect, "data device");
-    if (disk_alignment % data_device_sect)
-    {
-        throw std::runtime_error(
-            "disk_alignment ("+std::to_string(disk_alignment)+
-            ") is not a multiple of data device sector size ("+std::to_string(data_device_sect)+")"
-        );
-    }
-    if (data_offset >= data_size)
-    {
-        throw std::runtime_error("data_offset exceeds device size = "+std::to_string(data_size));
-    }
-    if (!disable_flock && flock(data_fd, LOCK_EX|LOCK_NB) != 0)
-    {
-        throw std::runtime_error(std::string("Failed to lock data device: ") + strerror(errno));
-    }
-}
-
-void blockstore_impl_t::open_meta()
-{
-    if (meta_device != "")
-    {
-        meta_offset = 0;
-        meta_fd = open(meta_device.c_str(), O_DIRECT|O_RDWR);
-        if (meta_fd == -1)
-        {
-            throw std::runtime_error("Failed to open metadata device");
-        }
-        check_size(meta_fd, &meta_size, &meta_device_sect, "metadata device");
-        if (meta_offset >= meta_size)
-        {
-            throw std::runtime_error("meta_offset exceeds device size = "+std::to_string(meta_size));
-        }
-        if (!disable_flock && flock(meta_fd, LOCK_EX|LOCK_NB) != 0)
-        {
-            throw std::runtime_error(std::string("Failed to lock metadata device: ") + strerror(errno));
-        }
-    }
-    else
-    {
-        meta_fd = data_fd;
-        meta_device_sect = data_device_sect;
-        meta_size = 0;
-        if (meta_offset >= data_size)
-        {
-            throw std::runtime_error("meta_offset exceeds device size = "+std::to_string(data_size));
-        }
-    }
-    if (meta_block_size % meta_device_sect)
-    {
-        throw std::runtime_error(
-            "meta_block_size ("+std::to_string(meta_block_size)+
-            ") is not a multiple of data device sector size ("+std::to_string(meta_device_sect)+")"
-        );
-    }
-}
-
-void blockstore_impl_t::open_journal()
-{
-    if (journal_device != "")
-    {
-        journal.fd = open(journal_device.c_str(), O_DIRECT|O_RDWR);
-        if (journal.fd == -1)
-        {
-            throw std::runtime_error("Failed to open journal device");
-        }
-        check_size(journal.fd, &journal.device_size, &journal_device_sect, "journal device");
-        if (!disable_flock && flock(journal.fd, LOCK_EX|LOCK_NB) != 0)
-        {
-            throw std::runtime_error(std::string("Failed to lock journal device: ") + strerror(errno));
-        }
-    }
-    else
-    {
-        journal.fd = meta_fd;
-        journal_device_sect = meta_device_sect;
-        journal.device_size = 0;
-        if (journal.offset >= data_size)
-        {
-            throw std::runtime_error("journal_offset exceeds device size");
-        }
+        journal.sector_buf = (uint8_t*)memalign(MEM_ALIGNMENT, journal.sector_count * dsk.journal_block_size);
+        if (!journal.sector_buf)
+            throw std::bad_alloc();
    }
    journal.sector_info = (journal_sector_info_t*)calloc(journal.sector_count, sizeof(journal_sector_info_t));
    if (!journal.sector_info)
    {
        throw std::bad_alloc();
    }
-    if (!journal.inmemory)
-    {
-        journal.sector_buf = (uint8_t*)memalign(MEM_ALIGNMENT, journal.sector_count * journal_block_size);
-        if (!journal.sector_buf)
-            throw std::bad_alloc();
-    }
-    if (journal_block_size % journal_device_sect)
-    {
-        throw std::runtime_error(
-            "journal_block_size ("+std::to_string(journal_block_size)+
-            ") is not a multiple of journal device sector size ("+std::to_string(journal_device_sect)+")"
-        );
-    }
 }
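The area-length and metadata-size logic removed above is now delegated to `dsk.calc_lengths()`. It clips each area (data, metadata, journal) at the start of the next area on the same device, and it sizes metadata as one extra leading block (the `1 +` term of the removed formula) plus enough blocks to hold one clean entry per data block. The sketch below generalizes that arithmetic under stated assumptions. `area_t`, `area_len()`, `meta_len_for()` and `clean_entry_offset()` are hypothetical names, not part of this codebase. The last helper mirrors the sector/pos computation in `get_clean_entry_bitmap()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical descriptor of one on-disk area (data, metadata or journal)
struct area_t
{
    int device;       // areas with the same id live on the same device
    uint64_t offset;  // start of the area on that device, in bytes
};

// Usable length of `self`: up to the end of its device, or up to the start of
// the nearest other area placed after it on the same device
uint64_t area_len(const area_t & self, uint64_t dev_size, const area_t *others, int n_others)
{
    assert(self.offset < dev_size);
    uint64_t len = dev_size - self.offset;
    for (int i = 0; i < n_others; i++)
    {
        const area_t & o = others[i];
        if (o.device == self.device && o.offset > self.offset && o.offset - self.offset < len)
            len = o.offset - self.offset;
    }
    return len;
}

// Required metadata size: one leading block plus ceil(block_count / entries_per_block)
// blocks of clean entries; an entry never straddles a metadata block boundary
uint64_t meta_len_for(uint64_t block_count, uint64_t meta_block_size, uint64_t clean_entry_size)
{
    uint64_t per_block = meta_block_size / clean_entry_size;
    return (1 + (block_count + per_block - 1) / per_block) * meta_block_size;
}

// Byte offset of the clean entry for data block `meta_loc` within the entry area,
// using the same sector/pos split as get_clean_entry_bitmap()
uint64_t clean_entry_offset(uint64_t meta_loc, uint64_t meta_block_size, uint64_t clean_entry_size)
{
    uint64_t per_block = meta_block_size / clean_entry_size;
    return (meta_loc / per_block) * meta_block_size + (meta_loc % per_block) * clean_entry_size;
}
```

For a single 1 GiB device with the journal at 0, metadata at 16 MiB and data at 32 MiB, the journal and metadata areas each get 16 MiB and data gets the remaining 992 MiB. An area on a different device is simply ignored when clipping.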
--- a/src/blockstore_read.cpp
+++ b/src/blockstore_read.cpp
@@ -32,9 +32,9 @@ int blockstore_impl_t::fulfill_read_push(blockstore_op_t *op, void *buf, uint64_
    PRIV(op)->pending_ops++;
    my_uring_prep_readv(
        sqe,
-        IS_JOURNAL(item_state) ? journal.fd : data_fd,
+        IS_JOURNAL(item_state) ? dsk.journal_fd : dsk.data_fd,
        &data->iov, 1,
-        (IS_JOURNAL(item_state) ? journal.offset : data_offset) + offset
+        (IS_JOURNAL(item_state) ? dsk.journal_offset : dsk.data_offset) + offset
    );
    data->callback = [this, op](ring_data_t *data) { handle_read_event(data, op); };
    return 1;
@@ -42,7 +42,7 @@ int blockstore_impl_t::fulfill_read_push(blockstore_op_t *op, void *buf, uint64_

 // FIXME I've seen a bug here so I want some tests
 int blockstore_impl_t::fulfill_read(blockstore_op_t *read_op, uint64_t &fulfilled, uint32_t item_start, uint32_t item_end,
-    uint32_t item_state, uint64_t item_version, uint64_t item_location)
+    uint32_t item_state, uint64_t item_version, uint64_t item_location, uint64_t journal_sector)
 {
    uint32_t cur_start = item_start;
    if (cur_start < read_op->offset + read_op->len && item_end > read_op->offset)
@@ -72,6 +72,7 @@ int blockstore_impl_t::fulfill_read(blockstore_op_t *read_op, uint64_t &fulfille
                fulfill_read_t el = {
                    .offset = cur_start,
                    .len = it == PRIV(read_op)->read_vec.end() || it->offset >= item_end ? item_end-cur_start : it->offset-cur_start,
+                    .journal_sector = journal_sector,
                };
                it = PRIV(read_op)->read_vec.insert(it, el);
                if (!fulfill_read_push(read_op,
@@ -97,15 +98,15 @@ endwhile:
 uint8_t* blockstore_impl_t::get_clean_entry_bitmap(uint64_t block_loc, int offset)
 {
    uint8_t *clean_entry_bitmap;
-    uint64_t meta_loc = block_loc >> block_order;
+    uint64_t meta_loc = block_loc >> dsk.block_order;
    if (inmemory_meta)
    {
-        uint64_t sector = (meta_loc / (meta_block_size / clean_entry_size)) * meta_block_size;
-        uint64_t pos = (meta_loc % (meta_block_size / clean_entry_size));
-        clean_entry_bitmap = ((uint8_t*)metadata_buffer + sector + pos*clean_entry_size + sizeof(clean_disk_entry) + offset);
+        uint64_t sector = (meta_loc / (dsk.meta_block_size / dsk.clean_entry_size)) * dsk.meta_block_size;
+        uint64_t pos = (meta_loc % (dsk.meta_block_size / dsk.clean_entry_size));
+        clean_entry_bitmap = ((uint8_t*)metadata_buffer + sector + pos*dsk.clean_entry_size + sizeof(clean_disk_entry) + offset);
    }
    else
-        clean_entry_bitmap = (uint8_t*)(clean_bitmap + meta_loc*2*clean_entry_bitmap_size + offset);
+        clean_entry_bitmap = (uint8_t*)(clean_bitmap + meta_loc*2*dsk.clean_entry_bitmap_size + offset);
    return clean_entry_bitmap;
 }

@@ -152,12 +153,14 @@ int blockstore_impl_t::dequeue_read(blockstore_op_t *read_op)
                    result_version = dirty_it->first.version;
                    if (read_op->bitmap)
                    {
-                        void *bmp_ptr = (clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap);
-                        memcpy(read_op->bitmap, bmp_ptr, clean_entry_bitmap_size);
+                        void *bmp_ptr = (dsk.clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap);
+                        memcpy(read_op->bitmap, bmp_ptr, dsk.clean_entry_bitmap_size);
                    }
                }
+                // If inmemory_journal is false, journal trim will have to wait until the read is completed
                if (!fulfill_read(read_op, fulfilled, dirty.offset, dirty.offset + dirty.len,
-                    dirty.state, dirty_it->first.version, dirty.location + (IS_JOURNAL(dirty.state) ? 0 : dirty.offset)))
+                    dirty.state, dirty_it->first.version, dirty.location + (IS_JOURNAL(dirty.state) ? 0 : dirty.offset),
+                    (IS_JOURNAL(dirty.state) ? dirty.journal_sector+1 : 0)))
                {
                    // need to wait. undo added requests, don't dequeue op
                    PRIV(read_op)->read_vec.clear();
@@ -178,15 +181,16 @@ int blockstore_impl_t::dequeue_read(blockstore_op_t *read_op)
            result_version = clean_it->second.version;
            if (read_op->bitmap)
            {
-                void *bmp_ptr = get_clean_entry_bitmap(clean_it->second.location, clean_entry_bitmap_size);
-                memcpy(read_op->bitmap, bmp_ptr, clean_entry_bitmap_size);
+                void *bmp_ptr = get_clean_entry_bitmap(clean_it->second.location, dsk.clean_entry_bitmap_size);
+                memcpy(read_op->bitmap, bmp_ptr, dsk.clean_entry_bitmap_size);
            }
        }
        if (fulfilled < read_op->len)
        {
-            if (!clean_entry_bitmap_size)
+            if (!dsk.clean_entry_bitmap_size)
            {
-                if (!fulfill_read(read_op, fulfilled, 0, block_size, (BS_ST_BIG_WRITE | BS_ST_STABLE), 0, clean_it->second.location))
+                if (!fulfill_read(read_op, fulfilled, 0, dsk.data_block_size,
+                    (BS_ST_BIG_WRITE | BS_ST_STABLE), 0, clean_it->second.location, 0))
                {
                    // need to wait. undo added requests, don't dequeue op
                    PRIV(read_op)->read_vec.clear();
@@ -196,7 +200,7 @@ int blockstore_impl_t::dequeue_read(blockstore_op_t *read_op)
            else
            {
                uint8_t *clean_entry_bitmap = get_clean_entry_bitmap(clean_it->second.location, 0);
-                uint64_t bmp_start = 0, bmp_end = 0, bmp_size = block_size/bitmap_granularity;
+                uint64_t bmp_start = 0, bmp_end = 0, bmp_size = dsk.data_block_size/dsk.bitmap_granularity;
                while (bmp_start < bmp_size)
                {
                    while (!(clean_entry_bitmap[bmp_end >> 3] & (1 << (bmp_end & 0x7))) && bmp_end < bmp_size)
@@ -206,8 +210,8 @@ int blockstore_impl_t::dequeue_read(blockstore_op_t *read_op)
                    if (bmp_end > bmp_start)
                    {
                        // fill with zeroes
-                        assert(fulfill_read(read_op, fulfilled, bmp_start * bitmap_granularity,
-                            bmp_end * bitmap_granularity, (BS_ST_DELETE | BS_ST_STABLE), 0, 0));
+                        assert(fulfill_read(read_op, fulfilled, bmp_start * dsk.bitmap_granularity,
+                            bmp_end * dsk.bitmap_granularity, (BS_ST_DELETE | BS_ST_STABLE), 0, 0, 0));
                    }
                    bmp_start = bmp_end;
                    while (clean_entry_bitmap[bmp_end >> 3] & (1 << (bmp_end & 0x7)) && bmp_end < bmp_size)
@@ -216,9 +220,9 @@ int blockstore_impl_t::dequeue_read(blockstore_op_t *read_op)
                    }
                    if (bmp_end > bmp_start)
                    {
-                        if (!fulfill_read(read_op, fulfilled, bmp_start * bitmap_granularity,
-                            bmp_end * bitmap_granularity, (BS_ST_BIG_WRITE | BS_ST_STABLE), 0,
-                            clean_it->second.location + bmp_start * bitmap_granularity))
+                        if (!fulfill_read(read_op, fulfilled, bmp_start * dsk.bitmap_granularity,
+                            bmp_end * dsk.bitmap_granularity, (BS_ST_BIG_WRITE | BS_ST_STABLE), 0,
+                            clean_it->second.location + bmp_start * dsk.bitmap_granularity, 0))
                        {
                            // need to wait. undo added requests, don't dequeue op
                            PRIV(read_op)->read_vec.clear();
@@ -233,7 +237,7 @@ int blockstore_impl_t::dequeue_read(blockstore_op_t *read_op)
    else if (fulfilled < read_op->len)
    {
        // fill remaining parts with zeroes
-        assert(fulfill_read(read_op, fulfilled, 0, block_size, (BS_ST_DELETE | BS_ST_STABLE), 0, 0));
+        assert(fulfill_read(read_op, fulfilled, 0, dsk.data_block_size, (BS_ST_DELETE | BS_ST_STABLE), 0, 0, 0));
    }
    assert(fulfilled == read_op->len);
    read_op->version = result_version;
@@ -249,6 +253,15 @@ int blockstore_impl_t::dequeue_read(blockstore_op_t *read_op)
        FINISH_OP(read_op);
        return 2;
    }
+    if (!journal.inmemory)
+    {
+        // Journal trim has to wait until the read is completed - record journal sector usage
+        for (auto & rv: PRIV(read_op)->read_vec)
+        {
+            if (rv.journal_sector)
+                journal.used_sectors[rv.journal_sector-1]++;
+        }
+    }
    read_op->retval = 0;
    return 2;
 }
@@ -264,6 +277,19 @@ void blockstore_impl_t::handle_read_event(ring_data_t *data, blockstore_op_t *op
    }
    if (PRIV(op)->pending_ops == 0)
    {
+        if (!journal.inmemory)
+        {
+            // Release journal sector usage
+            for (auto & rv: PRIV(op)->read_vec)
+            {
+                if (rv.journal_sector)
+                {
+                    auto used = --journal.used_sectors[rv.journal_sector-1];
+                    if (used == 0)
+                        journal.used_sectors.erase(rv.journal_sector-1);
+                }
+            }
+        }
        if (op->retval == 0)
            op->retval = op->len;
        FINISH_OP(op);
@@ -288,8 +314,8 @@ int blockstore_impl_t::read_bitmap(object_id oid, uint64_t target_version, void
                    *result_version = dirty_it->first.version;
                if (bitmap)
                {
-                    void *bmp_ptr = (clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap);
-                    memcpy(bitmap, bmp_ptr, clean_entry_bitmap_size);
+                    void *bmp_ptr = (dsk.clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap);
+                    memcpy(bitmap, bmp_ptr, dsk.clean_entry_bitmap_size);
                }
                return 0;
            }
@@ -306,14 +332,14 @@ int blockstore_impl_t::read_bitmap(object_id oid, uint64_t target_version, void
            *result_version = clean_it->second.version;
        if (bitmap)
        {
-            void *bmp_ptr = get_clean_entry_bitmap(clean_it->second.location, clean_entry_bitmap_size);
-            memcpy(bitmap, bmp_ptr, clean_entry_bitmap_size);
+            void *bmp_ptr = get_clean_entry_bitmap(clean_it->second.location, dsk.clean_entry_bitmap_size);
+            memcpy(bitmap, bmp_ptr, dsk.clean_entry_bitmap_size);
        }
        return 0;
    }
    if (result_version)
        *result_version = 0;
    if (bitmap)
-        memset(bitmap, 0, clean_entry_bitmap_size);
+        memset(bitmap, 0, dsk.clean_entry_bitmap_size);
    return -ENOENT;
 }
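With `inmemory_journal=false`, the read path above tags each piece of a read with the journal sector it comes from. The sector is stored as offset + 1, so 0 means "not read from the journal". Once the read's pieces are submitted, the code increments the same `journal.used_sectors` map that `erase_dirty()` decrements for dirty entries. It releases those references when the last piece completes. This keeps the journal trimmer from reusing space a read is still reading. The sketch below isolates only this reference counting. `read_piece_t`, `sector_pins_t` and `is_pinned()` are hypothetical names. How the trimmer itself consults the map is not part of this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One piece of a read, modeled on fulfill_read_t: journal_sector is the journal
// sector offset plus one, so that 0 means "this piece is not read from the journal"
struct read_piece_t
{
    uint64_t offset, len;
    uint64_t journal_sector;
};

// Hypothetical reference counter: journal sector offset -> number of users
struct sector_pins_t
{
    std::map<uint64_t, uint64_t> used_sectors;

    // Called after all pieces of a read have been submitted
    void pin(const std::vector<read_piece_t> & pieces)
    {
        for (auto & rv: pieces)
            if (rv.journal_sector)
                used_sectors[rv.journal_sector-1]++;
    }

    // Called when the last piece of the read has completed
    void unpin(const std::vector<read_piece_t> & pieces)
    {
        for (auto & rv: pieces)
        {
            if (!rv.journal_sector)
                continue;
            auto it = used_sectors.find(rv.journal_sector-1);
            // Drop the entry at zero so that an empty map means "nothing pinned"
            if (it != used_sectors.end() && --it->second == 0)
                used_sectors.erase(it);
        }
    }

    bool is_pinned(uint64_t sector_offset) const
    {
        return used_sectors.count(sector_offset) > 0;
    }
};
```

The +1 encoding matters because offset 0 is a valid journal sector. Without it, a read from the first journal sector could not be told apart from a read served by the data area.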
--- a/src/blockstore_rollback.cpp
+++ b/src/blockstore_rollback.cpp
@@ -112,7 +112,7 @@ resume_2:
    if (!disable_journal_fsync)
    {
        BS_SUBMIT_GET_SQE(sqe, data);
-        my_uring_prep_fsync(sqe, journal.fd, IORING_FSYNC_DATASYNC);
+        my_uring_prep_fsync(sqe, dsk.journal_fd, IORING_FSYNC_DATASYNC);
        data->iov = { 0 };
        data->callback = [this, op](ring_data_t *data) { handle_write_event(data, op); };
        PRIV(op)->min_flushed_journal_sector = PRIV(op)->max_flushed_journal_sector = 0;
@@ -217,12 +217,12 @@ void blockstore_impl_t::erase_dirty(blockstore_dirty_db_t::iterator dirty_start,
            dirty_it->second.location != UINT64_MAX)
        {
 #ifdef BLOCKSTORE_DEBUG
-            printf("Free block %lu from %lx:%lx v%lu\n", dirty_it->second.location >> block_order,
+            printf("Free block %lu from %lx:%lx v%lu\n", dirty_it->second.location >> dsk.block_order,
                dirty_it->first.oid.inode, dirty_it->first.oid.stripe, dirty_it->first.version);
 #endif
-            data_alloc->set(dirty_it->second.location >> block_order, false);
+            data_alloc->set(dirty_it->second.location >> dsk.block_order, false);
        }
-        int used = --journal.used_sectors[dirty_it->second.journal_sector];
+        auto used = --journal.used_sectors[dirty_it->second.journal_sector];
 #ifdef BLOCKSTORE_DEBUG
        printf(
            "remove usage of journal offset %08lx by %lx:%lx v%lu (%d refs)\n", dirty_it->second.journal_sector,
@@ -233,7 +233,7 @@ void blockstore_impl_t::erase_dirty(blockstore_dirty_db_t::iterator dirty_start,
        {
            journal.used_sectors.erase(dirty_it->second.journal_sector);
        }
-        if (clean_entry_bitmap_size > sizeof(void*))
+        if (dsk.clean_entry_bitmap_size > sizeof(void*))
        {
            free(dirty_it->second.bitmap);
            dirty_it->second.bitmap = NULL;
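`erase_dirty()` above converts a byte location in the data area into a block index with `location >> dsk.block_order`. That only works because the data block size is a power of two. The removed validation obtained `block_order` from `is_power_of_two(block_size)` and rejected results `>= 64`. The helper below reproduces that contract, assuming it returns log2 for powers of two and 64 otherwise. `power_of_two_order()` and `block_index()` are illustrative names, not functions from this codebase.

```cpp
#include <cassert>
#include <cstdint>

// Assumed contract (inferred from the ">= 64" check in the removed validation):
// log2(value) if value is a power of two, otherwise 64
uint64_t power_of_two_order(uint64_t value)
{
    if (value == 0)
        return 64;
    uint64_t order = 0;
    while (value > 1)
    {
        if (value & 1)
            return 64; // a low bit is set while higher bits remain: not a power of two
        value >>= 1;
        order++;
    }
    return order;
}

// Index of the data block that starts at byte `location` of the data area,
// as passed to data_alloc->set() when a block is freed
uint64_t block_index(uint64_t location, uint64_t block_order)
{
    return location >> block_order;
}
```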
--- a/src/blockstore_stable.cpp
+++ b/src/blockstore_stable.cpp
@@ -137,7 +137,7 @@ resume_2:
    if (!disable_journal_fsync)
    {
        BS_SUBMIT_GET_SQE(sqe, data);
-        my_uring_prep_fsync(sqe, journal.fd, IORING_FSYNC_DATASYNC);
+        my_uring_prep_fsync(sqe, dsk.journal_fd, IORING_FSYNC_DATASYNC);
        data->iov = { 0 };
        data->callback = [this, op](ring_data_t *data) { handle_write_event(data, op); };
        PRIV(op)->min_flushed_journal_sector = PRIV(op)->max_flushed_journal_sector = 0;
@@ -195,14 +195,14 @@ void blockstore_impl_t::mark_stable(const obj_ver_id & v, bool forget_dirty)
                    }
                    if (!exists)
                    {
-                        inode_space_stats[dirty_it->first.oid.inode] += block_size;
+                        inode_space_stats[dirty_it->first.oid.inode] += dsk.data_block_size;
                    }
                }
                else if (IS_DELETE(dirty_it->second.state))
                {
                    auto & sp = inode_space_stats[dirty_it->first.oid.inode];
-                    if (sp > block_size)
-                        sp -= block_size;
+                    if (sp > dsk.data_block_size)
+                        sp -= dsk.data_block_size;
                    else
                        inode_space_stats.erase(dirty_it->first.oid.inode);
                }
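`mark_stable()` keeps a per-inode estimate of used space counted in whole data blocks. When the `exists` check in the hunk is false, a stabilized write adds `dsk.data_block_size`. A stabilized delete subtracts it, and the map entry is erased instead of letting the unsigned counter underflow. The sketch below models that bookkeeping on its own. `space_stats_t` is a hypothetical stand-in for `inode_space_stats`, and `object_existed` stands in for the `exists` flag.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical standalone model of the per-inode usage accounting in mark_stable()
struct space_stats_t
{
    std::map<uint64_t, uint64_t> by_inode; // inode -> bytes, in whole data blocks
    uint64_t data_block_size = 0;

    void on_stable_write(uint64_t inode, bool object_existed)
    {
        // Only a newly appearing object grows the inode's usage
        if (!object_existed)
            by_inode[inode] += data_block_size;
    }

    void on_stable_delete(uint64_t inode)
    {
        auto & sp = by_inode[inode];
        if (sp > data_block_size)
            sp -= data_block_size;
        else
            by_inode.erase(inode); // would reach zero or below: drop the entry
    }
};
```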
--- a/src/blockstore_sync.cpp
+++ b/src/blockstore_sync.cpp
@@ -60,7 +60,7 @@ int blockstore_impl_t::continue_sync(blockstore_op_t *op, bool queue_has_in_prog
        if (!disable_data_fsync)
        {
            BS_SUBMIT_GET_SQE(sqe, data);
-            my_uring_prep_fsync(sqe, data_fd, IORING_FSYNC_DATASYNC);
+            my_uring_prep_fsync(sqe, dsk.data_fd, IORING_FSYNC_DATASYNC);
            data->iov = { 0 };
            data->callback = [this, op](ring_data_t *data) { handle_write_event(data, op); };
            PRIV(op)->min_flushed_journal_sector = PRIV(op)->max_flushed_journal_sector = 0;
@@ -79,7 +79,7 @@ int blockstore_impl_t::continue_sync(blockstore_op_t *op, bool queue_has_in_prog
        // Check space in the journal and journal memory buffers
        blockstore_journal_check_t space_check(this);
        if (!space_check.check_available(op, PRIV(op)->sync_big_writes.size(),
-            sizeof(journal_entry_big_write) + clean_entry_bitmap_size, JOURNAL_STABILIZE_RESERVATION))
+            sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size, JOURNAL_STABILIZE_RESERVATION))
        {
            return 0;
        }
@@ -90,7 +90,7 @@ int blockstore_impl_t::continue_sync(blockstore_op_t *op, bool queue_has_in_prog
        int s = 0;
        while (it != PRIV(op)->sync_big_writes.end())
        {
-            if (!journal.entry_fits(sizeof(journal_entry_big_write) + clean_entry_bitmap_size) &&
+            if (!journal.entry_fits(sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size) &&
                journal.sector_info[journal.cur_sector].dirty)
            {
                prepare_journal_sector_write(journal.cur_sector, op);
@@ -99,7 +99,7 @@ int blockstore_impl_t::continue_sync(blockstore_op_t *op, bool queue_has_in_prog
            auto & dirty_entry = dirty_db.at(*it);
            journal_entry_big_write *je = (journal_entry_big_write*)prefill_single_journal_entry(
                journal, (dirty_entry.state & BS_ST_INSTANT) ? JE_BIG_WRITE_INSTANT : JE_BIG_WRITE,
-                sizeof(journal_entry_big_write) + clean_entry_bitmap_size
+                sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size
            );
            dirty_entry.journal_sector = journal.sector_info[journal.cur_sector].offset;
            journal.used_sectors[journal.sector_info[journal.cur_sector].offset]++;
@@ -115,8 +115,8 @@ int blockstore_impl_t::continue_sync(blockstore_op_t *op, bool queue_has_in_prog
            je->offset = dirty_entry.offset;
            je->len = dirty_entry.len;
            je->location = dirty_entry.location;
-            memcpy((void*)(je+1), (clean_entry_bitmap_size > sizeof(void*)
-                ? dirty_entry.bitmap : &dirty_entry.bitmap), clean_entry_bitmap_size);
+            memcpy((void*)(je+1), (dsk.clean_entry_bitmap_size > sizeof(void*)
+                ? dirty_entry.bitmap : &dirty_entry.bitmap), dsk.clean_entry_bitmap_size);
            je->crc32 = je_crc32((journal_entry*)je);
            journal.crc32_last = je->crc32;
            it++;
@@ -132,7 +132,7 @@ int blockstore_impl_t::continue_sync(blockstore_op_t *op, bool queue_has_in_prog
        if (!disable_journal_fsync)
        {
            BS_SUBMIT_GET_SQE(sqe, data);
-            my_uring_prep_fsync(sqe, journal.fd, IORING_FSYNC_DATASYNC);
+            my_uring_prep_fsync(sqe, dsk.journal_fd, IORING_FSYNC_DATASYNC);
            data->iov = { 0 };
            data->callback = [this, op](ring_data_t *data) { handle_write_event(data, op); };
            PRIV(op)->min_flushed_journal_sector = PRIV(op)->max_flushed_journal_sector = 0;
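For each unsynced big write, `continue_sync()` appends one `journal_entry_big_write` followed by the object bitmap (`dsk.clean_entry_bitmap_size` bytes). When `journal.entry_fits()` reports that the entry will not fit in the current sector, it first submits that sector. The model below illustrates only the packing rule. It assumes entries never cross a sector boundary and ignores any per-sector overhead. As in `parse_config()`, the in-sector position starts at the block size, so the first entry opens a fresh sector. `journal_packer_t` is hypothetical and is not taken from the real `entry_fits()` or `check_available()`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of packing journal entries into sectors
struct journal_packer_t
{
    uint64_t block_size;    // journal sector size
    uint64_t in_sector_pos; // fill position inside the current sector
    uint64_t sectors_used;

    // Start "full" so the first entry opens a new sector
    explicit journal_packer_t(uint64_t bs): block_size(bs), in_sector_pos(bs), sectors_used(0) {}

    bool entry_fits(uint64_t size) const
    {
        return in_sector_pos + size <= block_size;
    }

    void add_entry(uint64_t size)
    {
        assert(size <= block_size);
        if (!entry_fits(size))
        {
            // The real code writes out the filled sector before moving on
            sectors_used++;
            in_sector_pos = 0;
        }
        in_sector_pos += size;
    }
};
```

With 100-byte entries and 4096-byte sectors, each sector holds 40 entries, so 80 entries use 2 sectors and the 81st opens a third.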
--- a/src/blockstore_write.cpp
+++ b/src/blockstore_write.cpp
@@ -10,9 +10,9 @@ bool blockstore_impl_t::enqueue_write(blockstore_op_t *op)
    bool wait_big = false, wait_del = false;
    void *bmp = NULL;
    uint64_t version = 1;
-    if (!is_del && clean_entry_bitmap_size > sizeof(void*))
+    if (!is_del && dsk.clean_entry_bitmap_size > sizeof(void*))
    {
-        bmp = calloc_or_die(1, clean_entry_bitmap_size);
+        bmp = calloc_or_die(1, dsk.clean_entry_bitmap_size);
    }
    if (dirty_db.size() > 0)
    {
@@ -32,8 +32,8 @@ bool blockstore_impl_t::enqueue_write(blockstore_op_t *op)
                : ((dirty_it->second.state & BS_ST_WORKFLOW_MASK) == BS_ST_WAIT_BIG);
            if (!is_del && !deleted)
            {
-                if (clean_entry_bitmap_size > sizeof(void*))
-                    memcpy(bmp, dirty_it->second.bitmap, clean_entry_bitmap_size);
+                if (dsk.clean_entry_bitmap_size > sizeof(void*))
+                    memcpy(bmp, dirty_it->second.bitmap, dsk.clean_entry_bitmap_size);
                else
                    bmp = dirty_it->second.bitmap;
            }
@@ -48,8 +48,8 @@ bool blockstore_impl_t::enqueue_write(blockstore_op_t *op)
            version = clean_it->second.version + 1;
            if (!is_del)
            {
-                void *bmp_ptr = get_clean_entry_bitmap(clean_it->second.location, clean_entry_bitmap_size);
-                memcpy((clean_entry_bitmap_size > sizeof(void*) ? bmp : &bmp), bmp_ptr, clean_entry_bitmap_size);
+                void *bmp_ptr = get_clean_entry_bitmap(clean_it->second.location, dsk.clean_entry_bitmap_size);
+                memcpy((dsk.clean_entry_bitmap_size > sizeof(void*) ? bmp : &bmp), bmp_ptr, dsk.clean_entry_bitmap_size);
            }
        }
        else
@@ -90,14 +90,14 @@ bool blockstore_impl_t::enqueue_write(blockstore_op_t *op)
        {
            // Invalid version requested
            op->retval = -EEXIST;
-            if (!is_del && clean_entry_bitmap_size > sizeof(void*))
+            if (!is_del && dsk.clean_entry_bitmap_size > sizeof(void*))
            {
                free(bmp);
            }
            return false;
        }
    }
-    if (wait_big && !is_del && !deleted && op->len < block_size &&
+    if (wait_big && !is_del && !deleted && op->len < dsk.data_block_size &&
        immediate_commit != IMMEDIATE_ALL)
    {
        // Issue an additional sync so that the previous big write can reach the journal
@@ -122,7 +122,7 @@ bool blockstore_impl_t::enqueue_write(blockstore_op_t *op)
        state = BS_ST_DELETE | BS_ST_IN_FLIGHT;
    else
    {
-        state = (op->len == block_size || deleted ? BS_ST_BIG_WRITE : BS_ST_SMALL_WRITE);
+        state = (op->len == dsk.data_block_size || deleted ? BS_ST_BIG_WRITE : BS_ST_SMALL_WRITE);
        if (state == BS_ST_SMALL_WRITE && throttle_small_writes)
            clock_gettime(CLOCK_REALTIME, &PRIV(op)->tv_begin);
        if (wait_del)
@@ -136,9 +136,9 @@ bool blockstore_impl_t::enqueue_write(blockstore_op_t *op)
        if (op->bitmap)
        {
            // Only allow to overwrite part of the object bitmap respective to the write's offset/len
-            uint8_t *bmp_ptr = (uint8_t*)(clean_entry_bitmap_size > sizeof(void*) ? bmp : &bmp);
-            uint32_t bit = op->offset/bitmap_granularity;
-            uint32_t bits_left = op->len/bitmap_granularity;
+            uint8_t *bmp_ptr = (uint8_t*)(dsk.clean_entry_bitmap_size > sizeof(void*) ? bmp : &bmp);
+            uint32_t bit = op->offset/dsk.bitmap_granularity;
+            uint32_t bits_left = op->len/dsk.bitmap_granularity;
            while (!(bit % 8) && bits_left > 8)
            {
                // Copy bytes
@@ -175,14 +175,15 @@ void blockstore_impl_t::cancel_all_writes(blockstore_op_t *op, blockstore_dirty_
 {
    while (dirty_it != dirty_db.end() && dirty_it->first.oid == op->oid)
    {
-        if (clean_entry_bitmap_size > sizeof(void*))
+        if (dsk.clean_entry_bitmap_size > sizeof(void*))
            free(dirty_it->second.bitmap);
        dirty_db.erase(dirty_it++);
    }
    bool found = false;
    for (auto other_op: submit_queue)
    {
-        if (!found && other_op == op)
+        // <op> may be present in queue multiple times due to moving operations in submit_queue
+        if (other_op == op)
            found = true;
        else if (found && other_op->oid == op->oid &&
            (other_op->opcode == BS_OP_WRITE || other_op->opcode == BS_OP_WRITE_STABLE))
@@ -251,7 +252,7 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
    {
        blockstore_journal_check_t space_check(this);
        if (!space_check.check_available(op, unsynced_big_write_count + 1,
-            sizeof(journal_entry_big_write) + clean_entry_bitmap_size, JOURNAL_STABILIZE_RESERVATION))
+            sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size, JOURNAL_STABILIZE_RESERVATION))
        {
            return 0;
        }
@@ -269,9 +270,25 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
            cancel_all_writes(op, dirty_it, -ENOSPC);
            return 2;
        }
+        if (inmemory_meta)
+        {
+            // Check once more that metadata entry is zeroed (the reverse means a bug or corruption)
+            uint64_t sector = (loc / (dsk.meta_block_size / dsk.clean_entry_size)) * dsk.meta_block_size;
+            uint64_t pos = (loc % (dsk.meta_block_size / dsk.clean_entry_size));
+            clean_disk_entry *entry = (clean_disk_entry*)((uint8_t*)metadata_buffer + sector + pos*dsk.clean_entry_size);
+            if (entry->oid.inode || entry->oid.stripe || entry->version)
+            {
+                printf(
+                    "Fatal error (metadata corruption or bug): tried to write object %lx:%lx v%lu"
+                    " over a non-zero metadata entry %lu with %lx:%lx v%lu\n", op->oid.inode,
+                    op->oid.stripe, op->version, loc, entry->oid.inode, entry->oid.stripe, entry->version
+                );
+                exit(1);
+            }
+        }
        BS_SUBMIT_GET_SQE(sqe, data);
        write_iodepth++;
-        dirty_it->second.location = loc << block_order;
+        dirty_it->second.location = loc << dsk.block_order;
        dirty_it->second.state = (dirty_it->second.state & ~BS_ST_WORKFLOW_MASK) | BS_ST_SUBMITTED;
 #ifdef BLOCKSTORE_DEBUG
        printf(
@@ -280,9 +297,9 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        );
 #endif
        data_alloc->set(loc, true);
-        uint64_t stripe_offset = (op->offset % bitmap_granularity);
-        uint64_t stripe_end = (op->offset + op->len) % bitmap_granularity;
-        // Zero fill up to bitmap_granularity
+        uint64_t stripe_offset = (op->offset % dsk.bitmap_granularity);
+        uint64_t stripe_end = (op->offset + op->len) % dsk.bitmap_granularity;
+        // Zero fill up to dsk.bitmap_granularity
        int vcnt = 0;
        if (stripe_offset)
        {
@@ -291,13 +308,13 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        PRIV(op)->iov_zerofill[vcnt++] = (struct iovec){ op->buf, op->len };
        if (stripe_end)
        {
-            stripe_end = bitmap_granularity - stripe_end;
+            stripe_end = dsk.bitmap_granularity - stripe_end;
            PRIV(op)->iov_zerofill[vcnt++] = (struct iovec){ zero_object, stripe_end };
        }
        data->iov.iov_len = op->len + stripe_offset + stripe_end; // to check it in the callback
        data->callback = [this, op](ring_data_t *data) { handle_write_event(data, op); };
        my_uring_prep_writev(
-            sqe, data_fd, PRIV(op)->iov_zerofill, vcnt, data_offset + (loc << block_order) + op->offset - stripe_offset
+            sqe, dsk.data_fd, PRIV(op)->iov_zerofill, vcnt, dsk.data_offset + (loc << dsk.block_order) + op->offset - stripe_offset
        );
        PRIV(op)->pending_ops = 1;
        PRIV(op)->min_flushed_journal_sector = PRIV(op)->max_flushed_journal_sector = 0;
@@ -319,9 +336,9 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        blockstore_journal_check_t space_check(this);
        if (unsynced_big_write_count &&
            !space_check.check_available(op, unsynced_big_write_count,
-                sizeof(journal_entry_big_write) + clean_entry_bitmap_size, 0)
+                sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size, 0)
            || !space_check.check_available(op, 1,
-                sizeof(journal_entry_small_write) + clean_entry_bitmap_size, op->len + JOURNAL_STABILIZE_RESERVATION))
+                sizeof(journal_entry_small_write) + dsk.clean_entry_bitmap_size, op->len + JOURNAL_STABILIZE_RESERVATION))
        {
            return 0;
        }
@@ -329,7 +346,7 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        BS_SUBMIT_CHECK_SQES(
            // Write current journal sector only if it's dirty and full, or in the immediate_commit mode
            (immediate_commit != IMMEDIATE_NONE ||
-                !journal.entry_fits(sizeof(journal_entry_small_write) + clean_entry_bitmap_size) ? 1 : 0) +
+                !journal.entry_fits(sizeof(journal_entry_small_write) + dsk.clean_entry_bitmap_size) ? 1 : 0) +
            (op->len > 0 ? 1 : 0)
        );
        write_iodepth++;
@@ -337,7 +354,7 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        auto cb = [this, op](ring_data_t *data) { handle_write_event(data, op); };
        if (immediate_commit == IMMEDIATE_NONE)
        {
-            if (!journal.entry_fits(sizeof(journal_entry_small_write) + clean_entry_bitmap_size))
+            if (!journal.entry_fits(sizeof(journal_entry_small_write) + dsk.clean_entry_bitmap_size))
            {
                prepare_journal_sector_write(journal.cur_sector, op);
            }
@@ -349,7 +366,7 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        // Then pre-fill journal entry
        journal_entry_small_write *je = (journal_entry_small_write*)prefill_single_journal_entry(
            journal, op->opcode == BS_OP_WRITE_STABLE ? JE_SMALL_WRITE_INSTANT : JE_SMALL_WRITE,
-            sizeof(journal_entry_small_write) + clean_entry_bitmap_size
+            sizeof(journal_entry_small_write) + dsk.clean_entry_bitmap_size
        );
        dirty_it->second.journal_sector = journal.sector_info[journal.cur_sector].offset;
        journal.used_sectors[journal.sector_info[journal.cur_sector].offset]++;
@@ -361,14 +378,14 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        );
 #endif
        // Figure out where data will be
-        journal.next_free = (journal.next_free + op->len) <= journal.len ? journal.next_free : journal_block_size;
+        journal.next_free = (journal.next_free + op->len) <= journal.len ? journal.next_free : dsk.journal_block_size;
        je->oid = op->oid;
        je->version = op->version;
        je->offset = op->offset;
        je->len = op->len;
        je->data_offset = journal.next_free;
        je->crc32_data = crc32c(0, op->buf, op->len);
-        memcpy((void*)(je+1), (clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap), clean_entry_bitmap_size);
+        memcpy((void*)(je+1), (dsk.clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap), dsk.clean_entry_bitmap_size);
        je->crc32 = je_crc32((journal_entry*)je);
        journal.crc32_last = je->crc32;
        if (immediate_commit != IMMEDIATE_NONE)
@@ -387,7 +404,7 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
            data2->iov = (struct iovec){ op->buf, op->len };
            data2->callback = cb;
            my_uring_prep_writev(
-                sqe2, journal.fd, &data2->iov, 1, journal.offset + journal.next_free
+                sqe2, dsk.journal_fd, &data2->iov, 1, journal.offset + journal.next_free
            );
            PRIV(op)->pending_ops++;
        }
@@ -400,7 +417,7 @@ int blockstore_impl_t::dequeue_write(blockstore_op_t *op)
        journal.next_free += op->len;
        if (journal.next_free >= journal.len)
        {
-            journal.next_free = journal_block_size;
+            journal.next_free = dsk.journal_block_size;
        }
        if (!PRIV(op)->pending_ops)
        {
@@ -432,6 +449,12 @@ int blockstore_impl_t::continue_write(blockstore_op_t *op)
 resume_2:
    // Only for the immediate_commit mode: prepare and submit big_write journal entry
    {
+        blockstore_journal_check_t space_check(this);
+        if (!space_check.check_available(op, 1,
+            sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size, JOURNAL_STABILIZE_RESERVATION))
+        {
+            return 0;
+        }
        BS_SUBMIT_CHECK_SQES(1);
        auto dirty_it = dirty_db.find((obj_ver_id){
            .oid = op->oid,
@@ -440,7 +463,7 @@ resume_2:
        assert(dirty_it != dirty_db.end());
        journal_entry_big_write *je = (journal_entry_big_write*)prefill_single_journal_entry(
            journal, op->opcode == BS_OP_WRITE_STABLE ? JE_BIG_WRITE_INSTANT : JE_BIG_WRITE,
-            sizeof(journal_entry_big_write) + clean_entry_bitmap_size
+            sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size
        );
        dirty_it->second.journal_sector = journal.sector_info[journal.cur_sector].offset;
        journal.used_sectors[journal.sector_info[journal.cur_sector].offset]++;
@@ -456,7 +479,7 @@ resume_2:
        je->offset = op->offset;
        je->len = op->len;
        je->location = dirty_it->second.location;
-        memcpy((void*)(je+1), (clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap), clean_entry_bitmap_size);
+        memcpy((void*)(je+1), (dsk.clean_entry_bitmap_size > sizeof(void*) ? dirty_it->second.bitmap : &dirty_it->second.bitmap), dsk.clean_entry_bitmap_size);
        je->crc32 = je_crc32((journal_entry*)je);
        journal.crc32_last = je->crc32;
        prepare_journal_sector_write(journal.cur_sector, op);
@@ -634,7 +657,7 @@ int blockstore_impl_t::dequeue_del(blockstore_op_t *op)
    // Write current journal sector only if it's dirty and full, or in the immediate_commit mode
    BS_SUBMIT_CHECK_SQES(
        (immediate_commit != IMMEDIATE_NONE ||
-            (journal_block_size - journal.in_sector_pos) < sizeof(journal_entry_del) &&
+            (dsk.journal_block_size - journal.in_sector_pos) < sizeof(journal_entry_del) &&
            journal.sector_info[journal.cur_sector].dirty) ? 1 : 0
    );
    if (write_iodepth >= max_write_iodepth)
@@ -645,7 +668,7 @@ int blockstore_impl_t::dequeue_del(blockstore_op_t *op)
    // Prepare journal sector write
    if (immediate_commit == IMMEDIATE_NONE)
    {
-        if ((journal_block_size - journal.in_sector_pos) < sizeof(journal_entry_del) &&
+        if ((dsk.journal_block_size - journal.in_sector_pos) < sizeof(journal_entry_del) &&
            journal.sector_info[journal.cur_sector].dirty)
        {
            prepare_journal_sector_write(journal.cur_sector, op);
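Note on the `blockstore_write.cpp` changes above. The new corruption check in `dequeue_write()` finds the in-memory metadata entry for data block `loc` by treating each `meta_block_size` metadata block as holding `meta_block_size / clean_entry_size` whole entries. The existing zero-fill logic pads a write out to `bitmap_granularity` on both sides. Below is a minimal standalone sketch of that arithmetic. The helper names and `padding_t` are hypothetical and are not part of Vitastor.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of the metadata entry for data block `loc`, mirroring the
// check in dequeue_write(): entries never straddle a metadata block, so any
// tail of a block too small for a whole entry stays unused.
uint64_t meta_entry_offset(uint64_t loc, uint64_t meta_block_size, uint64_t clean_entry_size)
{
    uint64_t entries_per_block = meta_block_size / clean_entry_size;
    uint64_t sector = (loc / entries_per_block) * meta_block_size; // start of the metadata block
    uint64_t pos = loc % entries_per_block;                         // slot inside that block
    return sector + pos * clean_entry_size;
}

// Zero padding needed before and after [offset, offset+len) so that the
// submitted write starts and ends on a bitmap_granularity boundary.
struct padding_t
{
    uint64_t before;
    uint64_t after;
};

padding_t zero_fill_padding(uint64_t offset, uint64_t len, uint64_t bitmap_granularity)
{
    uint64_t stripe_offset = offset % bitmap_granularity;
    uint64_t stripe_end = (offset + len) % bitmap_granularity;
    if (stripe_end)
        stripe_end = bitmap_granularity - stripe_end;
    return padding_t{ stripe_offset, stripe_end };
}
```

For example, take hypothetical 4096-byte metadata blocks and 48-byte entries. That gives 85 entries per block, with 16 unused bytes at each block's tail, so block 100 maps to offset 4096 + 15*48 = 4816. Next, take a 5000-byte write at offset 1000 with 4096-byte granularity. It gets 1000 zero bytes in front and 2192 behind, covering exactly 0..8192.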
--- a/src/cli.cpp
+++ b/src/cli.cpp
@@ -12,11 +12,81 @@
 #include "epoll_manager.h"
 #include "cluster_client.h"
 #include "pg_states.h"
-#include "base64.h"
+#include "str_util.h"

 static const char *exe_name = NULL;

-static void help();
+static const char* help_text =
+    "Vitastor command-line tool\n"
+    "(c) Vitaliy Filippov, 2019+ (VNPL-1.1)\n"
+    "\n"
+    "COMMANDS:\n"
+    "\n"
+    "vitastor-cli status\n"
+    "  Show cluster status\n"
+    "\n"
+    "vitastor-cli df\n"
+    "  Show pool space statistics\n"
+    "\n"
+    "vitastor-cli ls [-l] [-p POOL] [--sort FIELD] [-r] [-n N] [<glob> ...]\n"
+    "  List images (only matching <glob> patterns if passed).\n"
+    "  -p|--pool POOL  Filter images by pool ID or name\n"
+    "  -l|--long       Also report allocated size and I/O statistics\n"
+    "  --del           Also include delete operation statistics\n"
+    "  --sort FIELD    Sort by specified field (name, size, used_size, <read|write|delete>_<iops|bps|lat|queue>)\n"
+    "  -r|--reverse    Sort in descending order\n"
+    "  -n|--count N    Only list first N items\n"
+    "\n"
+    "vitastor-cli create -s|--size <size> [-p|--pool <id|name>] [--parent <parent_name>[@<snapshot>]] <name>\n"
+    "  Create an image. You may use K/M/G/T suffixes for <size>. If --parent is specified,\n"
+    "  a copy-on-write image clone is created. Parent must be a snapshot (readonly image).\n"
+    "  Pool must be specified if there is more than one pool.\n"
+    "\n"
+    "vitastor-cli create --snapshot <snapshot> [-p|--pool <id|name>] <image>\n"
+    "vitastor-cli snap-create [-p|--pool <id|name>] <image>@<snapshot>\n"
+    "  Create a snapshot of image <name>. May be used live if only a single writer is active.\n"
+    "\n"
+    "vitastor-cli modify <name> [--rename <new-name>] [--resize <size>] [--readonly | --readwrite] [-f|--force]\n"
+    "  Rename, resize image or change its readonly status. Images with children can't be made read-write.\n"
+    "  If the new size is smaller than the old size, extra data will be purged.\n"
+    "  You should resize file system in the image, if present, before shrinking it.\n"
+    "  -f|--force  Proceed with shrinking or setting readwrite flag even if the image has children.\n"
+    "\n"
+    "vitastor-cli rm <from> [<to>] [--writers-stopped]\n"
+    "  Remove <from> or all layers between <from> and <to> (<to> must be a child of <from>),\n"
+    "  rebasing all their children accordingly. --writers-stopped allows merging to be a bit\n"
+    "  more effective in case of a single 'slim' read-write child and 'fat' removed parent:\n"
+    "  the child is merged into parent and parent is renamed to child in that case.\n"
+    "  In other cases parent layers are always merged into children.\n"
+    "\n"
+    "vitastor-cli flatten <layer>\n"
+    "  Flatten a layer, i.e. merge data and detach it from parents.\n"
+    "\n"
+    "vitastor-cli rm-data --pool <pool> --inode <inode> [--wait-list] [--min-offset <offset>]\n"
+    "  Remove inode data without changing metadata.\n"
+    "  --wait-list   Retrieve full objects listings before starting to remove objects.\n"
+    "                Requires more memory, but allows to show correct removal progress.\n"
+    "  --min-offset  Purge only data starting with specified offset.\n"
+    "\n"
+    "vitastor-cli merge-data <from> <to> [--target <target>]\n"
+    "  Merge layer data without changing metadata. Merge <from>..<to> to <target>.\n"
+    "  <to> must be a child of <from> and <target> may be one of the layers between\n"
+    "  <from> and <to>, including <from> and <to>.\n"
+    "\n"
+    "vitastor-cli alloc-osd\n"
+    "  Allocate a new OSD number and reserve it by creating empty /osd/stats/<n> key.\n"
+    "\n"
+    "Use vitastor-cli --help <command> for command details or vitastor-cli --help --all for all details.\n"
+    "\n"
+    "GLOBAL OPTIONS:\n"
+    "  --etcd_address <etcd_address>\n"
+    "  --iodepth N         Send N operations in parallel to each OSD when possible (default 32)\n"
+    "  --parallel_osds M   Work with M osds in parallel when possible (default 4)\n"
+    "  --progress 1|0      Report progress (default 1)\n"
+    "  --cas 1|0           Use CAS writes for flatten, merge, rm (default is decide automatically)\n"
+    "  --no-color          Disable colored output\n"
+    "  --json              JSON output\n"
+;

 static json11::Json::object parse_args(int narg, const char *args[])
 {
@@ -25,9 +95,9 @@ static json11::Json::object parse_args(int narg, const char *args[])
    cfg["progress"] = "1";
    for (int i = 1; i < narg; i++)
    {
-        if (!strcmp(args[i], "-h") || !strcmp(args[i], "--help"))
+        if (args[i][0] == '-' && args[i][1] == 'h')
        {
-            help();
+            cfg["help"] = "1";
        }
        else if (args[i][0] == '-' && args[i][1] == 'l')
        {
@@ -60,6 +130,7 @@ static json11::Json::object parse_args(int narg, const char *args[])
                !strcmp(opt, "long") || !strcmp(opt, "del") || !strcmp(opt, "no-color") ||
                !strcmp(opt, "readonly") || !strcmp(opt, "readwrite") ||
                !strcmp(opt, "force") || !strcmp(opt, "reverse") ||
+                !strcmp(opt, "help") || !strcmp(opt, "all") ||
                !strcmp(opt, "writers-stopped") && strcmp("1", args[i+1]) != 0
                ? "1" : args[++i];
        }
@@ -68,6 +139,10 @@ static json11::Json::object parse_args(int narg, const char *args[])
            cmd.push_back(std::string(args[i]));
        }
    }
+    if (cfg["help"].bool_value())
+    {
+        print_help(help_text, "vitastor-cli", cmd.size() ? cmd[0].string_value() : "", cfg["all"].bool_value());
+    }
    if (!cmd.size())
    {
        std::string exe(exe_name);
@@ -80,94 +155,9 @@ static json11::Json::object parse_args(int narg, const char *args[])
    return cfg;
 }

-static void help()
-{
-    printf(
-        "Vitastor command-line tool\n"
-        "(c) Vitaliy Filippov, 2019+ (VNPL-1.1)\n"
-        "\n"
-        "USAGE:\n"
-        "%s status\n"
-        "  Show cluster status\n"
-        "\n"
-        "%s df\n"
-        "  Show pool space statistics\n"
-        "\n"
-        "%s ls [-l] [-p POOL] [--sort FIELD] [-r] [-n N] [<glob> ...]\n"
-        "  List images (only matching <glob> patterns if passed).\n"
-        "  -p|--pool POOL  Filter images by pool ID or name\n"
-        "  -l|--long       Also report allocated size and I/O statistics\n"
-        "  --del           Also include delete operation statistics\n"
-        "  --sort FIELD    Sort by specified field (name, size, used_size, <read|write|delete>_<iops|bps|lat|queue>)\n"
-        "  -r|--reverse    Sort in descending order\n"
-        "  -n|--count N    Only list first N items\n"
-        "\n"
-        "%s create -s|--size <size> [-p|--pool <id|name>] [--parent <parent_name>[@<snapshot>]] <name>\n"
-        "  Create an image. You may use K/M/G/T suffixes for <size>. If --parent is specified,\n"
-        "  a copy-on-write image clone is created. Parent must be a snapshot (readonly image).\n"
-        "  Pool must be specified if there is more than one pool.\n"
-        "\n"
-        "%s create --snapshot <snapshot> [-p|--pool <id|name>] <image>\n"
-        "%s snap-create [-p|--pool <id|name>] <image>@<snapshot>\n"
-        "  Create a snapshot of image <name>. May be used live if only a single writer is active.\n"
-        "\n"
-        "%s modify <name> [--rename <new-name>] [--resize <size>] [--readonly | --readwrite] [-f|--force]\n"
-        "  Rename, resize image or change its readonly status. Images with children can't be made read-write.\n"
-        "  If the new size is smaller than the old size, extra data will be purged.\n"
-        "  You should resize file system in the image, if present, before shrinking it.\n"
-        "  -f|--force  Proceed with shrinking or setting readwrite flag even if the image has children.\n"
-        "\n"
-        "%s rm <from> [<to>] [--writers-stopped]\n"
-        "  Remove <from> or all layers between <from> and <to> (<to> must be a child of <from>),\n"
-        "  rebasing all their children accordingly. --writers-stopped allows merging to be a bit\n"
-        "  more effective in case of a single 'slim' read-write child and 'fat' removed parent:\n"
-        "  the child is merged into parent and parent is renamed to child in that case.\n"
-        "  In other cases parent layers are always merged into children.\n"
-        "\n"
-        "%s flatten <layer>\n"
-        "  Flatten a layer, i.e. merge data and detach it from parents.\n"
-        "\n"
-        "%s rm-data --pool <pool> --inode <inode> [--wait-list] [--min-offset <offset>]\n"
-        "  Remove inode data without changing metadata.\n"
-        "  --wait-list   Retrieve full objects listings before starting to remove objects.\n"
-        "                Requires more memory, but allows to show correct removal progress.\n"
-        "  --min-offset  Purge only data starting with specified offset.\n"
-        "\n"
-        "%s merge-data <from> <to> [--target <target>]\n"
-        "  Merge layer data without changing metadata. Merge <from>..<to> to <target>.\n"
-        "  <to> must be a child of <from> and <target> may be one of the layers between\n"
-        "  <from> and <to>, including <from> and <to>.\n"
-        "\n"
-        "%s alloc-osd\n"
-        "  Allocate a new OSD number and reserve it by creating empty /osd/stats/<n> key.\n"
-        "%s simple-offsets <device>\n"
-        "  Calculate offsets for simple&stupid (no superblock) OSD deployment. Options:\n"
-        "  --object_size 128k       Set blockstore block size\n"
-        "  --bitmap_granularity 4k  Set bitmap granularity\n"
-        "  --journal_size 16M       Set journal size\n"
-        "  --device_block_size 4k   Set device block size\n"
-        "  --journal_offset 0       Set journal offset\n"
-        "  --device_size 0          Set device size\n"
-        "  --format text            Result format: json, options, env, or text\n"
-        "\n"
-        "GLOBAL OPTIONS:\n"
-        "  --etcd_address <etcd_address>\n"
-        "  --iodepth N         Send N operations in parallel to each OSD when possible (default 32)\n"
-        "  --parallel_osds M   Work with M osds in parallel when possible (default 4)\n"
-        "  --progress 1|0      Report progress (default 1)\n"
-        "  --cas 1|0           Use CAS writes for flatten, merge, rm (default is decide automatically)\n"
-        "  --no-color          Disable colored output\n"
-        "  --json              JSON output\n"
-        ,
-        exe_name, exe_name, exe_name, exe_name, exe_name, exe_name, exe_name,
-        exe_name, exe_name, exe_name, exe_name, exe_name, exe_name
-    );
-    exit(0);
-}
-
 static int run(cli_tool_t *p, json11::Json::object cfg)
 {
-    cli_result_t result;
+    cli_result_t result = {};
    p->parse_config(cfg);
    json11::Json::array cmd = cfg["command"].array_items();
    cfg.erase("command");
@@ -271,15 +261,6 @@ static int run(cli_tool_t *p, json11::Json::object cfg)
        // Allocate a new OSD number
        action_cb = p->start_alloc_osd(cfg);
    }
-    else if (cmd[0] == "simple-offsets")
-    {
-        // Calculate offsets for simple & stupid OSD deployment without superblock
-        if (cmd.size() > 1)
-        {
-            cfg["device"] = cmd[1];
-        }
-        action_cb = p->simple_offsets(cfg);
-    }
    else
    {
        result = { .err = EINVAL, .text = "unknown command: "+cmd[0].string_value() };
--- a/src/cli.h
+++ b/src/cli.h
@@ -65,7 +65,6 @@ public:
    std::function<bool(cli_result_t &)> start_flatten(json11::Json);
    std::function<bool(cli_result_t &)> start_rm(json11::Json);
    std::function<bool(cli_result_t &)> start_alloc_osd(json11::Json cfg);
-    std::function<bool(cli_result_t &)> simple_offsets(json11::Json cfg);

    // Should be called like loop_and_wait(start_status(), <completion callback>)
    void loop_and_wait(std::function<bool(cli_result_t &)> loop_cb, std::function<void(const cli_result_t &)> complete_cb);
@@ -73,12 +72,8 @@ public:
    void etcd_txn(json11::Json txn);
 };

-uint64_t parse_size(std::string size_str);
-
 std::string print_table(json11::Json items, json11::Json header, bool use_esc);

-std::string format_size(uint64_t size, bool nobytes = false);
-
 std::string format_lat(uint64_t lat);

 std::string format_q(double depth);
--- a/src/cli_alloc_osd.cpp
+++ b/src/cli_alloc_osd.cpp
@@ -4,7 +4,7 @@
 #include <ctype.h>
 #include "cli.h"
 #include "cluster_client.h"
-#include "base64.h"
+#include "str_util.h"

 #include <algorithm>

--- a/src/cli_common.cpp
+++ b/src/cli_common.cpp
@@ -1,7 +1,7 @@
 // Copyright (c) Vitaliy Filippov, 2019+
 // License: VNPL-1.1 (see README.md for details)

-#include "base64.h"
+#include "str_util.h"
 #include "cluster_client.h"
 #include "cli.h"

--- a/src/cli_create.cpp
+++ b/src/cli_create.cpp
@@ -4,7 +4,7 @@
 #include <ctype.h>
 #include "cli.h"
 #include "cluster_client.h"
-#include "base64.h"
+#include "str_util.h"

 // Create an image, snapshot or clone
 //
@@ -276,7 +276,8 @@ resume_4:
            new_id = 1+INODE_NO_POOL(kv.value.uint64_value());
            max_id_mod_rev = kv.mod_revision;
        }
-        auto ino_it = parent->cli->st_cli.inode_config.lower_bound(INODE_WITH_POOL(new_pool_id, 0));
+        // Also check existing inodes - for the case when some inodes are created without changing /index/maxid
+        auto ino_it = parent->cli->st_cli.inode_config.lower_bound(INODE_WITH_POOL(new_pool_id+1, 0));
        if (ino_it != parent->cli->st_cli.inode_config.begin())
        {
            ino_it--;
@@ -506,34 +507,6 @@ resume_3:
    }
 };

-uint64_t parse_size(std::string size_str)
-{
-    if (!size_str.length())
-    {
-        return 0;
-    }
-    uint64_t mul = 1;
-    char type_char = tolower(size_str[size_str.length()-1]);
-    if (type_char == 'k' || type_char == 'm' || type_char == 'g' || type_char == 't')
-    {
-        if (type_char == 'k')
-            mul = (uint64_t)1<<10;
-        else if (type_char == 'm')
-            mul = (uint64_t)1<<20;
-        else if (type_char == 'g')
-            mul = (uint64_t)1<<30;
-        else /*if (type_char == 't')*/
-            mul = (uint64_t)1<<40;
-        size_str = size_str.substr(0, size_str.length()-1);
-    }
-    uint64_t size = json11::Json(size_str).uint64_value() * mul;
-    if (size == 0 && size_str != "0" && (size_str != "" || mul != 1))
-    {
-        return UINT64_MAX;
-    }
-    return size;
-}
-
 std::function<bool(cli_result_t &)> cli_tool_t::start_create(json11::Json cfg)
 {
    auto image_creator = new image_creator_t();
@@ -553,8 +526,9 @@ std::function<bool(cli_result_t &)> cli_tool_t::start_create(json11::Json cfg)
    image_creator->new_parent = cfg["parent"].string_value();
    if (cfg["size"].string_value() != "")
    {
-        image_creator->size = parse_size(cfg["size"].string_value());
-        if (image_creator->size == UINT64_MAX)
+        bool ok;
+        image_creator->size = parse_size(cfg["size"].string_value(), &ok);
+        if (!ok)
        {
            return [size = cfg["size"].string_value()](cli_result_t & result)
            {
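Two points about the `cli_create.cpp` changes above. First, `parse_size()` now reports errors through a `bool *ok` out-parameter instead of returning `UINT64_MAX`. Second, the next-inode search now calls `lower_bound()` on the first key of pool `new_pool_id+1` and then steps back. Stepping back from the old key, `INODE_WITH_POOL(new_pool_id, 0)`, lands on an inode of a lower pool (if any), never on the highest inode of `new_pool_id` itself.

The sketch below illustrates both points under stated assumptions. The new body of `parse_size()` is not part of this diff excerpt, so this version is only modeled on the removed one, and it may differ from the real one (for example, it rejects fractional sizes). The 16-bit pool ID layout behind the hypothetical `inode_with_pool()` helpers is an assumption standing in for the `INODE_WITH_POOL`/`INODE_POOL` macros.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical sketch modeled on the removed parse_size(): "<digits>[k|m|g|t]"
// with binary multipliers. Errors are reported via *ok (no overflow check).
uint64_t parse_size(std::string size_str, bool *ok)
{
    if (ok)
        *ok = true;
    if (size_str.empty())
        return 0;
    uint64_t mul = 1;
    char type_char = std::tolower((unsigned char)size_str[size_str.length()-1]);
    if (type_char == 'k' || type_char == 'm' || type_char == 'g' || type_char == 't')
    {
        int shift = type_char == 'k' ? 10 : type_char == 'm' ? 20 : type_char == 'g' ? 30 : 40;
        mul = (uint64_t)1 << shift;
        size_str = size_str.substr(0, size_str.length()-1);
    }
    // The remaining part must be a non-empty run of decimal digits
    if (size_str.empty() || size_str.find_first_not_of("0123456789") != std::string::npos)
    {
        if (ok)
            *ok = false;
        return 0;
    }
    return std::strtoull(size_str.c_str(), NULL, 10) * mul;
}

// Assumed inode layout for this sketch: pool ID in the top 16 bits
const int POOL_ID_BITS = 16;
uint64_t inode_with_pool(uint64_t pool, uint64_t ino) { return (pool << (64 - POOL_ID_BITS)) | ino; }
uint64_t inode_pool(uint64_t inode) { return inode >> (64 - POOL_ID_BITS); }
uint64_t inode_no_pool(uint64_t inode) { return inode & (((uint64_t)1 << (64 - POOL_ID_BITS)) - 1); }

// Highest inode number used in `pool` (0 if none): find the first key of
// pool+1, then step back once. pool+1 must still fit in POOL_ID_BITS.
uint64_t max_inode_in_pool(const std::map<uint64_t, std::string> & inodes, uint64_t pool)
{
    auto it = inodes.lower_bound(inode_with_pool(pool + 1, 0));
    if (it == inodes.begin())
        return 0;
    --it;
    return inode_pool(it->first) == pool ? inode_no_pool(it->first) : 0;
}
```

With inodes 5 and 9 in pool 1 and inode 3 in pool 2, `max_inode_in_pool()` returns 9 for pool 1, 3 for pool 2 and 0 for pool 3.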
--- a/src/cli_df.cpp
+++ b/src/cli_df.cpp
@@ -3,7 +3,7 @@

 #include "cli.h"
 #include "cluster_client.h"
-#include "base64.h"
+#include "str_util.h"

 // List pools with space statistics
 struct pool_lister_t
--- a/src/cli_ls.cpp
+++ b/src/cli_ls.cpp
@@ -4,7 +4,7 @@
 #include <algorithm>
 #include "cli.h"
 #include "cluster_client.h"
-#include "base64.h"
+#include "str_util.h"

 // List existing images
 //
@@ -446,33 +446,6 @@ std::string print_table(json11::Json items, json11::Json header, bool use_esc)
    return str;
 }

-static uint64_t size_thresh[] = { (uint64_t)1024*1024*1024*1024, (uint64_t)1024*1024*1024, (uint64_t)1024*1024, 1024, 0 };
-static uint64_t size_thresh_d[] = { (uint64_t)1000000000000, (uint64_t)1000000000, (uint64_t)1000000, (uint64_t)1000, 0 };
-static const int size_thresh_n = sizeof(size_thresh)/sizeof(size_thresh[0]);
-static const char *size_unit = "TGMKB";
-
-std::string format_size(uint64_t size, bool nobytes)
-{
-    uint64_t *thr = nobytes ? size_thresh_d : size_thresh;
-    char buf[256];
-    for (int i = 0; i < size_thresh_n; i++)
-    {
-        if (size >= thr[i] || i >= size_thresh_n-1)
-        {
-            double value = thr[i] ? (double)size/thr[i] : size;
-            int l = snprintf(buf, sizeof(buf), "%.1f", value);
-            assert(l < sizeof(buf)-2);
-            if (buf[l-1] == '0')
-                l -= 2;
-            buf[l] = i == size_thresh_n-1 && nobytes ? 0 : ' ';
-            buf[l+1] = i == size_thresh_n-1 && nobytes ? 0 : size_unit[i];
-            buf[l+2] = 0;
-            break;
-        }
-    }
-    return std::string(buf);
-}
-
 std::string format_lat(uint64_t lat)
 {
    char buf[256];
--- a/src/cli_merge.cpp
+++ b/src/cli_merge.cpp
@@ -47,6 +47,7 @@ struct snap_merger_t
    int state = 0;
    int lists_todo = 0;
    uint64_t target_block_size = 0;
+    uint32_t target_bitmap_granularity = 0;
    btree::safe_btree_set<uint64_t> merge_offsets;
    btree::safe_btree_set<uint64_t>::iterator oit;
    std::map<inode_t, std::vector<uint64_t>> layer_lists;
@@ -101,7 +102,7 @@ struct snap_merger_t
        std::vector<inode_t> chain_list;
        inode_config_t *cur = to_cfg;
        chain_list.push_back(cur->num);
-        layer_block_size[cur->num] = get_block_size(cur->num);
+        layer_block_size[cur->num] = get_block_size(cur->num, NULL);
        while (cur->parent_id != from_cfg->num &&
            cur->parent_id != to_cfg->num &&
            cur->parent_id != 0)
@@ -124,7 +125,7 @@ struct snap_merger_t
            }
            cur = &it->second;
            chain_list.push_back(cur->num);
-            layer_block_size[cur->num] = get_block_size(cur->num);
+            layer_block_size[cur->num] = get_block_size(cur->num, NULL);
        }
        if (cur->parent_id != from_cfg->num)
        {
@@ -133,7 +134,7 @@ struct snap_merger_t
            return;
        }
        chain_list.push_back(from_cfg->num);
-        layer_block_size[from_cfg->num] = get_block_size(from_cfg->num);
+        layer_block_size[from_cfg->num] = get_block_size(from_cfg->num, NULL);
        int i = chain_list.size()-1;
        for (inode_t item: chain_list)
        {
@@ -204,14 +205,16 @@ struct snap_merger_t
                use_cas ? " online (with CAS)" : "", INODE_NO_POOL(target), INODE_POOL(target)
            );
        }
-        target_block_size = get_block_size(target);
+        target_block_size = get_block_size(target, &target_bitmap_granularity);
    }

-    uint64_t get_block_size(inode_t inode)
+    uint64_t get_block_size(inode_t inode, uint32_t *bitmap_granularity)
    {
        auto & pool_cfg = parent->cli->st_cli.pool_config.at(INODE_POOL(inode));
        uint64_t pg_data_size = (pool_cfg.scheme == POOL_SCHEME_REPLICATED ? 1 : pool_cfg.pg_size-pool_cfg.parity_chunks);
-        return parent->cli->get_bs_block_size() * pg_data_size;
+        if (bitmap_granularity)
+            *bitmap_granularity = pool_cfg.bitmap_granularity;
+        return pool_cfg.data_block_size * pg_data_size;
    }

    void continue_merge_reent()
@@ -409,7 +412,7 @@ struct snap_merger_t
            }
            else
            {
-                uint64_t bitmap_bytes = target_block_size/parent->cli->get_bs_bitmap_granularity()/8;
+                uint64_t bitmap_bytes = target_block_size/target_bitmap_granularity/8;
                int i;
                for (i = 0; i < bitmap_bytes; i++)
                {
@@ -469,7 +472,7 @@ struct snap_merger_t
    {
        // Write each non-empty range using an individual operation
        // FIXME: Allow to use single write with "holes" (OSDs don't allow it yet)
-        uint32_t gran = parent->cli->get_bs_bitmap_granularity();
+        uint32_t gran = target_bitmap_granularity;
        uint64_t bitmap_size = target_block_size / gran;
        while (rwo->end < bitmap_size && !rwo->error_code)
        {
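In `cli_merge.cpp`, `get_block_size()` now reads `data_block_size` and `bitmap_granularity` from the target pool's configuration. Previously it used the client-wide `get_bs_block_size()` / `get_bs_bitmap_granularity()`. The arithmetic it feeds can be sketched as follows. `pool_params_t` is a hypothetical simplification of the pool configuration, not Vitastor's actual structure.

```cpp
#include <cassert>
#include <cstdint>

// Pool parameters relevant to the merge: a simplified, hypothetical
// stand-in for the real pool configuration structure.
struct pool_params_t
{
    bool replicated;
    uint64_t pg_size;
    uint64_t parity_chunks;
    uint64_t data_block_size;     // per-OSD object (chunk) size in bytes
    uint32_t bitmap_granularity;  // bytes covered by one bitmap bit
};

// Full object size as seen by the client: replicated pools store one data
// chunk per object, EC pools store pg_size - parity_chunks data chunks.
uint64_t object_size(const pool_params_t & p)
{
    uint64_t pg_data_size = p.replicated ? 1 : p.pg_size - p.parity_chunks;
    return p.data_block_size * pg_data_size;
}

// Size of the object bitmap in bytes, as computed for bitmap_bytes in snap_merger_t
uint64_t bitmap_bytes(const pool_params_t & p)
{
    return object_size(p) / p.bitmap_granularity / 8;
}
```

Take an EC 2+1 pool (`pg_size` 3, one parity chunk) with 128 KiB blocks and 4 KiB granularity. The full object is 256 KiB and its bitmap needs 64 bits, i.e. 8 bytes. A replicated pool with the same settings gives 128 KiB and 4 bytes.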
--- a/src/cli_modify.cpp
+++ b/src/cli_modify.cpp
@@ -3,7 +3,7 @@

 #include "cli.h"
 #include "cluster_client.h"
-#include "base64.h"
+#include "str_util.h"

 // Rename, resize image (and purge extra data on shrink) or change its readonly status
 struct image_changer_t
--- a/src/cli_rm.cpp
+++ b/src/cli_rm.cpp
@@ -4,7 +4,7 @@
 #include <fcntl.h>
 #include "cli.h"
 #include "cluster_client.h"
-#include "base64.h"
+#include "str_util.h"

 // Remove layer(s): similar to merge, but alters metadata and processes multiple merge targets
 //
--- a/src/cli_status.cpp
+++ b/src/cli_status.cpp
@@ -3,10 +3,12 @@

 #include "cli.h"
 #include "cluster_client.h"
-#include "base64.h"
+#include "str_util.h"
 #include "pg_states.h"
 #include "http_client.h"

+static const char *obj_states[] = { "clean", "misplaced", "degraded", "incomplete" };
+
 // Print cluster status:
 // etcd, mon, osd states
 // raw/used space, object states, pool states, pg states
@@ -196,21 +198,57 @@ resume_2:
            }
            pgs_by_state_str += std::to_string(kv.second)+" "+kv.first;
        }
-        uint64_t object_size = parent->cli->get_bs_block_size();
-        std::string more_states;
-        uint64_t obj_n;
-        obj_n = agg_stats["object_counts"]["misplaced"].uint64_value();
-        if (obj_n > 0)
-            more_states += ", "+format_size(obj_n*object_size)+" misplaced";
-        obj_n = agg_stats["object_counts"]["degraded"].uint64_value();
-        if (obj_n > 0)
-            more_states += ", "+format_size(obj_n*object_size)+" degraded";
-        obj_n = agg_stats["object_counts"]["incomplete"].uint64_value();
-        if (obj_n > 0)
-            more_states += ", "+format_size(obj_n*object_size)+" incomplete";
        bool readonly = json_is_true(parent->cli->merged_config["readonly"]);
        bool no_recovery = json_is_true(parent->cli->merged_config["no_recovery"]);
        bool no_rebalance = json_is_true(parent->cli->merged_config["no_rebalance"]);
+        if (parent->json_output)
+        {
+            // JSON output
+            auto json_status = json11::Json::object {
+                { "etcd_alive", etcd_alive },
+                { "etcd_count", (uint64_t)etcd_states.size() },
+                { "etcd_db_size", etcd_db_size },
+                { "mon_count", mon_count },
+                { "mon_master", mon_master },
+                { "osd_up", osd_up },
+                { "osd_count", osd_count },
+                { "total_raw", total_raw },
+                { "free_raw", free_raw },
+                { "down_raw", down_raw },
+                { "free_down_raw", free_down_raw },
+                { "readonly", readonly },
+                { "no_recovery", no_recovery },
+                { "no_rebalance", no_rebalance },
+                { "pool_count", pool_count },
+                { "active_pool_count", pools_active },
+                { "pg_states", pgs_by_state },
+                { "op_stats", agg_stats["op_stats"] },
+                { "recovery_stats", agg_stats["recovery_stats"] },
+                { "object_counts", agg_stats["object_counts"] },
+            };
+            for (int i = 0; i < sizeof(obj_states)/sizeof(obj_states[0]); i++)
+            {
+                std::string str(obj_states[i]);
+                uint64_t obj_n = agg_stats["object_bytes"][str].uint64_value();
+                if (!obj_n)
+                    obj_n = agg_stats["object_counts"][str].uint64_value() * parent->cli->st_cli.global_block_size;
+                json_status[str+"_data"] = obj_n;
+            }
+            printf("%s\n", json11::Json(json_status).dump().c_str());
+            state = 100;
+            return;
+        }
+        std::string more_states;
+        for (int i = 0; i < sizeof(obj_states)/sizeof(obj_states[0]); i++)
+        {
+            std::string str(obj_states[i]);
+            uint64_t obj_n = agg_stats["object_bytes"][str].uint64_value();
+            if (!obj_n)
+                obj_n = agg_stats["object_counts"][str].uint64_value() * parent->cli->st_cli.global_block_size;
+            if (!i || obj_n > 0)
+                more_states += format_size(obj_n)+" "+str+", ";
+        }
+        more_states.resize(more_states.size()-2);
        std::string recovery_io;
        {
            uint64_t deg_bps = agg_stats["recovery_stats"]["degraded"]["bps"].uint64_value();
@@ -232,38 +270,6 @@ resume_2:
            else if (no_rebalance)
                recovery_io += "    rebalance: disabled\n";
        }
-        if (parent->json_output)
-        {
-            // JSON output
-            printf("%s\n", json11::Json(json11::Json::object {
-                { "etcd_alive", etcd_alive },
-                { "etcd_count", (uint64_t)etcd_states.size() },
-                { "etcd_db_size", etcd_db_size },
-                { "mon_count", mon_count },
-                { "mon_master", mon_master },
-                { "osd_up", osd_up },
-                { "osd_count", osd_count },
-                { "total_raw", total_raw },
-                { "free_raw", free_raw },
-                { "down_raw", down_raw },
-                { "free_down_raw", free_down_raw },
-                { "readonly", readonly },
-                { "no_recovery", no_recovery },
-                { "no_rebalance", no_rebalance },
-                { "clean_data", agg_stats["object_counts"]["clean"].uint64_value() * object_size },
-                { "misplaced_data", agg_stats["object_counts"]["misplaced"].uint64_value() * object_size },
-                { "degraded_data", agg_stats["object_counts"]["degraded"].uint64_value() * object_size },
-                { "incomplete_data", agg_stats["object_counts"]["incomplete"].uint64_value() * object_size },
-                { "pool_count", pool_count },
-                { "active_pool_count", pools_active },
-                { "pg_states", pgs_by_state },
-                { "op_stats", agg_stats["op_stats"] },
-                { "recovery_stats", agg_stats["recovery_stats"] },
-                { "object_counts", agg_stats["object_counts"] },
-            }).dump().c_str());
-            state = 100;
-            return;
-        }
        printf(
            "  cluster:\n"
            "    etcd: %d / %ld up, %s database size\n"
@@ -272,7 +278,7 @@ resume_2:
            "  \n"
            "  data:\n"
            "    raw:   %s used, %s / %s available%s\n"
-            "    state: %s clean%s\n"
+            "    state: %s\n"
            "    pools: %d / %d active\n"
            "    pgs:   %s\n"
            "  \n"
@@ -286,7 +292,7 @@ resume_2:
            format_size(free_raw-free_down_raw).c_str(),
            format_size(total_raw-down_raw).c_str(),
            (down_raw > 0 ? (", "+format_size(down_raw)+" down").c_str() : ""),
-            format_size(agg_stats["object_counts"]["clean"].uint64_value() * object_size).c_str(), more_states.c_str(),
+            more_states.c_str(),
            pools_active, pool_count,
            pgs_by_state_str.c_str(),
            readonly ? " (read-only mode)" : "",
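The reworked status output reports each object state in bytes. It prefers `object_bytes` from the aggregated stats. When that value is zero, it falls back to `object_counts` multiplied by `st_cli.global_block_size`. `clean` is always printed, and the other states appear only when non-zero.

The sketch below reproduces that logic under some simplifications:
- Plain `std::map`s stand in for the `json11` stats object.
- Raw byte counts are printed where the real code calls `format_size()`.
- The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

typedef std::map<std::string, uint64_t> stat_map;

// Bytes in one object state: use the reported byte count if non-zero,
// otherwise estimate it as object count * global block size.
uint64_t state_bytes(const stat_map & object_bytes, const stat_map & object_counts,
    const std::string & state, uint64_t global_block_size)
{
    auto b = object_bytes.find(state);
    if (b != object_bytes.end() && b->second)
        return b->second;
    auto c = object_counts.find(state);
    return c != object_counts.end() ? c->second * global_block_size : 0;
}

// Build the "state:" line: "clean" is always listed, other states only if non-zero.
std::string state_line(const stat_map & object_bytes, const stat_map & object_counts,
    uint64_t global_block_size)
{
    static const char *obj_states[] = { "clean", "misplaced", "degraded", "incomplete" };
    std::string more_states;
    for (size_t i = 0; i < sizeof(obj_states)/sizeof(obj_states[0]); i++)
    {
        std::string str(obj_states[i]);
        uint64_t n = state_bytes(object_bytes, object_counts, str, global_block_size);
        if (!i || n > 0)
            more_states += std::to_string(n)+" "+str+", ";
    }
    // "clean" is always appended, so there is always a trailing ", " to drop
    assert(more_states.size() >= 2);
    more_states.resize(more_states.size()-2);
    return more_states;
}
```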
--- a/src/cluster_client.cpp
+++ b/src/cluster_client.cpp
@@ -14,6 +14,7 @@
 #define CACHE_FLUSHING 2
 #define CACHE_REPEATING 3
 #define OP_FLUSH_BUFFER 0x02
+#define OP_IMMEDIATE_COMMIT 0x04

 cluster_client_t::cluster_client_t(ring_loop_t *ringloop, timerfd_manager_t *tfd, json11::Json & config)
 {
@@ -127,26 +128,26 @@ void cluster_client_t::calc_wait(cluster_op_t *op)
                op->prev_wait++;
            }
        }
-        if (!op->prev_wait && pgs_loaded)
+        if (!op->prev_wait)
            continue_rw(op);
    }
    else if (op->opcode == OSD_OP_SYNC)
    {
        for (auto prev = op->prev; prev; prev = prev->prev)
        {
-            if (prev->opcode == OSD_OP_SYNC || prev->opcode == OSD_OP_WRITE)
+            if (prev->opcode == OSD_OP_SYNC || prev->opcode == OSD_OP_WRITE && !(prev->flags & OP_IMMEDIATE_COMMIT))
            {
                op->prev_wait++;
            }
        }
-        if (!op->prev_wait && pgs_loaded)
+        if (!op->prev_wait)
            continue_sync(op);
    }
    else /* if (op->opcode == OSD_OP_READ || op->opcode == OSD_OP_READ_BITMAP) */
    {
        for (auto prev = op_queue_head; prev && prev != op; prev = prev->next)
        {
-            if (prev->opcode == OSD_OP_WRITE && prev->flags & OP_FLUSH_BUFFER)
+            if (prev->opcode == OSD_OP_WRITE && (prev->flags & OP_FLUSH_BUFFER))
            {
                op->prev_wait++;
            }
@@ -156,7 +157,7 @@ void cluster_client_t::calc_wait(cluster_op_t *op)
                break;
            }
        }
-        if (!op->prev_wait && pgs_loaded)
+        if (!op->prev_wait)
            continue_rw(op);
    }
 }
@@ -168,7 +169,7 @@ void cluster_client_t::inc_wait(uint64_t opcode, uint64_t flags, cluster_op_t *n
        while (next)
        {
            auto n2 = next->next;
-            if (next->opcode == OSD_OP_SYNC ||
+            if (next->opcode == OSD_OP_SYNC && !(flags & OP_IMMEDIATE_COMMIT) ||
                next->opcode == OSD_OP_WRITE && (flags & OP_FLUSH_BUFFER) && !(next->flags & OP_FLUSH_BUFFER) ||
                (next->opcode == OSD_OP_READ || next->opcode == OSD_OP_READ_BITMAP) && (flags & OP_FLUSH_BUFFER))
            {
@@ -220,9 +221,11 @@ void cluster_client_t::erase_op(cluster_op_t *op)
    if (op_queue_tail == op)
        op_queue_tail = op->prev;
    op->next = op->prev = NULL;
-    std::function<void(cluster_op_t*)>(op->callback)(op);
-    if (!immediate_commit)
+    if (!(flags & OP_IMMEDIATE_COMMIT))
        inc_wait(opcode, flags, next, -1);
+    // Call callback at the end to avoid inconsistencies in prev_wait
+    // if the callback adds more operations itself
+    std::function<void(cluster_op_t*)>(op->callback)(op);
 }

 void cluster_client_t::continue_ops(bool up_retry)
@@ -262,21 +265,6 @@ restart:
    continuing_ops = 0;
 }

-static uint32_t is_power_of_two(uint64_t value)
-{
-    uint32_t l = 0;
-    while (value > 1)
-    {
-        if (value & 1)
-        {
-            return 64;
-        }
-        value = value >> 1;
-        l++;
-    }
-    return l;
-}
-
 void cluster_client_t::on_load_config_hook(json11::Json::object & config)
 {
    this->merged_config = config;
@@ -284,24 +272,6 @@ void cluster_client_t::on_load_config_hook(json11::Json::object & config)
    {
        this->merged_config[kv.first] = kv.second;
    }
-    bs_block_size = config["block_size"].uint64_value();
-    bs_bitmap_granularity = config["bitmap_granularity"].uint64_value();
-    if (!bs_block_size)
-    {
-        bs_block_size = DEFAULT_BLOCK_SIZE;
-    }
-    if (!bs_bitmap_granularity)
-    {
-        bs_bitmap_granularity = DEFAULT_BITMAP_GRANULARITY;
-    }
-    bs_bitmap_size = bs_block_size / bs_bitmap_granularity / 8;
-    uint32_t block_order;
-    if ((block_order = is_power_of_two(bs_block_size)) >= 64 || bs_block_size < MIN_BLOCK_SIZE || bs_block_size >= MAX_BLOCK_SIZE)
-    {
-        throw std::runtime_error("Bad block size");
-    }
-    // Cluster-wide immediate_commit mode
-    immediate_commit = (config["immediate_commit"] == "all");
    if (config.find("client_max_dirty_bytes") != config.end())
    {
        client_max_dirty_bytes = config["client_max_dirty_bytes"].uint64_value();
@@ -379,9 +349,15 @@ void cluster_client_t::on_change_hook(std::map<std::string, etcd_kv_t> & changes
    continue_ops();
 }

-bool cluster_client_t::get_immediate_commit()
+bool cluster_client_t::get_immediate_commit(uint64_t inode)
 {
-    return immediate_commit;
+    pool_id_t pool_id = INODE_POOL(inode);
+    if (!pool_id)
+        return true;
+    auto pool_it = st_cli.pool_config.find(pool_id);
+    if (pool_it == st_cli.pool_config.end())
+        return true;
+    return pool_it->second.immediate_commit == IMMEDIATE_ALL;
 }

 void cluster_client_t::on_change_osd_state_hook(uint64_t peer_osd)
@@ -439,9 +415,45 @@ void cluster_client_t::execute(cluster_op_t *op)
        std::function<void(cluster_op_t*)>(op->callback)(op);
        return;
    }
+    if (!pgs_loaded)
+    {
+        offline_ops.push_back(op);
+        return;
+    }
    op->cur_inode = op->inode;
    op->retval = 0;
-    if (op->opcode == OSD_OP_WRITE && !immediate_commit)
+    op->flags = op->flags & OSD_OP_IGNORE_READONLY; // single allowed flag
+    if (op->opcode != OSD_OP_SYNC)
+    {
+        pool_id_t pool_id = INODE_POOL(op->cur_inode);
+        if (!pool_id)
+        {
+            op->retval = -EINVAL;
+            std::function<void(cluster_op_t*)>(op->callback)(op);
+            return;
+        }
+        auto pool_it = st_cli.pool_config.find(pool_id);
+        if (pool_it == st_cli.pool_config.end() || pool_it->second.real_pg_count == 0)
+        {
+            // Pools are loaded, but this one is unknown
+            op->retval = -EINVAL;
+            std::function<void(cluster_op_t*)>(op->callback)(op);
+            return;
+        }
+        // Check alignment
+        if ((op->opcode == OSD_OP_READ || op->opcode == OSD_OP_WRITE) && !op->len ||
+            op->offset % pool_it->second.bitmap_granularity || op->len % pool_it->second.bitmap_granularity)
+        {
+            op->retval = -EINVAL;
+            std::function<void(cluster_op_t*)>(op->callback)(op);
+            return;
+        }
+        if (pool_it->second.immediate_commit == IMMEDIATE_ALL)
+        {
+            op->flags |= OP_IMMEDIATE_COMMIT;
+        }
+    }
+    if (op->opcode == OSD_OP_WRITE && !(op->flags & OP_IMMEDIATE_COMMIT))
    {
        if (dirty_bytes >= client_max_dirty_bytes || dirty_ops >= client_max_dirty_ops)
        {
@@ -480,9 +492,9 @@ void cluster_client_t::execute(cluster_op_t *op)
    }
    else
        op_queue_tail = op_queue_head = op;
-    if (!immediate_commit)
+    if (!(op->flags & OP_IMMEDIATE_COMMIT))
        calc_wait(op);
-    else if (pgs_loaded)
+    else
    {
        if (op->opcode == OSD_OP_SYNC)
            continue_sync(op);
@@ -610,28 +622,6 @@ int cluster_client_t::continue_rw(cluster_op_t *op)
    else if (op->state == 3)
        goto resume_3;
 resume_0:
-    if ((op->opcode == OSD_OP_READ || op->opcode == OSD_OP_WRITE) && !op->len ||
-        op->offset % bs_bitmap_granularity || op->len % bs_bitmap_granularity)
-    {
-        op->retval = -EINVAL;
-        erase_op(op);
-        return 1;
-    }
-    {
-        pool_id_t pool_id = INODE_POOL(op->cur_inode);
-        if (!pool_id)
-        {
-            op->retval = -EINVAL;
-            erase_op(op);
-            return 1;
-        }
-        if (st_cli.pool_config.find(pool_id) == st_cli.pool_config.end() ||
-            st_cli.pool_config[pool_id].real_pg_count == 0)
-        {
-            // Postpone operations to unknown pools
-            return 0;
-        }
-    }
    if (op->opcode == OSD_OP_WRITE || op->opcode == OSD_OP_DELETE)
    {
        if (!(op->flags & OSD_OP_IGNORE_READONLY))
@@ -644,7 +634,7 @@ resume_0:
                return 1;
            }
        }
-        if (op->opcode == OSD_OP_WRITE && !immediate_commit && !(op->flags & OP_FLUSH_BUFFER))
+        if (op->opcode == OSD_OP_WRITE && !(op->flags & OP_IMMEDIATE_COMMIT) && !(op->flags & OP_FLUSH_BUFFER))
        {
            copy_write(op, dirty_buffers);
        }
@@ -814,7 +804,7 @@ void cluster_client_t::slice_rw(cluster_op_t *op)
    // Primary OSDs still operate individual stripes, but their size is multiplied by PG minsize in case of EC
    auto & pool_cfg = st_cli.pool_config.at(INODE_POOL(op->cur_inode));
    uint32_t pg_data_size = (pool_cfg.scheme == POOL_SCHEME_REPLICATED ? 1 : pool_cfg.pg_size-pool_cfg.parity_chunks);
-    uint64_t pg_block_size = bs_block_size * pg_data_size;
+    uint64_t pg_block_size = pool_cfg.data_block_size * pg_data_size;
    uint64_t first_stripe = (op->offset / pg_block_size) * pg_block_size;
    uint64_t last_stripe = op->len > 0 ? ((op->offset + op->len - 1) / pg_block_size) * pg_block_size : first_stripe;
    op->retval = 0;
@@ -822,9 +812,9 @@ void cluster_client_t::slice_rw(cluster_op_t *op)
    if (op->opcode == OSD_OP_READ || op->opcode == OSD_OP_READ_BITMAP)
    {
        // Allocate memory for the bitmap
-        unsigned object_bitmap_size = (((op->opcode == OSD_OP_READ_BITMAP ? pg_block_size : op->len) / bs_bitmap_granularity + 7) / 8);
+        unsigned object_bitmap_size = (((op->opcode == OSD_OP_READ_BITMAP ? pg_block_size : op->len) / pool_cfg.bitmap_granularity + 7) / 8);
        object_bitmap_size = (object_bitmap_size < 8 ? 8 : object_bitmap_size);
-        unsigned bitmap_mem = object_bitmap_size + (bs_bitmap_size * pg_data_size) * op->parts.size();
+        unsigned bitmap_mem = object_bitmap_size + (pool_cfg.data_block_size / pool_cfg.bitmap_granularity / 8 * pg_data_size) * op->parts.size();
        if (op->bitmap_buf_size < bitmap_mem)
        {
            op->bitmap_buf = realloc_or_die(op->bitmap_buf, bitmap_mem);
@@ -854,7 +844,7 @@ void cluster_client_t::slice_rw(cluster_op_t *op)
            bool skip_prev = true;
            while (cur < end)
            {
-                unsigned bmp_loc = (cur - op->offset)/bs_bitmap_granularity;
+                unsigned bmp_loc = (cur - op->offset)/pool_cfg.bitmap_granularity;
                bool skip = (((*((uint8_t*)op->bitmap_buf + bmp_loc/8)) >> (bmp_loc%8)) & 0x1);
                if (skip_prev != skip)
                {
@@ -872,7 +862,7 @@ void cluster_client_t::slice_rw(cluster_op_t *op)
                    skip_prev = skip;
                    prev = cur;
                }
-                cur += bs_bitmap_granularity;
+                cur += pool_cfg.bitmap_granularity;
            }
            assert(cur > prev);
            if (skip_prev)
@@ -904,7 +894,7 @@ bool cluster_client_t::affects_osd(uint64_t inode, uint64_t offset, uint64_t len
 {
    auto & pool_cfg = st_cli.pool_config.at(INODE_POOL(inode));
    uint32_t pg_data_size = (pool_cfg.scheme == POOL_SCHEME_REPLICATED ? 1 : pool_cfg.pg_size-pool_cfg.parity_chunks);
-    uint64_t pg_block_size = bs_block_size * pg_data_size;
+    uint64_t pg_block_size = pool_cfg.data_block_size * pg_data_size;
    uint64_t first_stripe = (offset / pg_block_size) * pg_block_size;
    uint64_t last_stripe = len > 0 ? ((offset + len - 1) / pg_block_size) * pg_block_size : first_stripe;
    for (uint64_t stripe = first_stripe; stripe <= last_stripe; stripe += pg_block_size)
@@ -935,7 +925,7 @@ bool cluster_client_t::try_send(cluster_op_t *op, int i)
            part->osd_num = primary_osd;
            part->flags |= PART_SENT;
            op->inflight_count++;
-            uint64_t pg_bitmap_size = bs_bitmap_size * (
+            uint64_t pg_bitmap_size = (pool_cfg.data_block_size / pool_cfg.bitmap_granularity / 8) * (
                pool_cfg.scheme == POOL_SCHEME_REPLICATED ? 1 : pool_cfg.pg_size-pool_cfg.parity_chunks
            );
            uint64_t meta_rev = 0;
@@ -983,7 +973,7 @@ int cluster_client_t::continue_sync(cluster_op_t *op)
 {
    if (op->state == 1)
        goto resume_1;
-    if (immediate_commit || !dirty_osds.size())
+    if (!dirty_osds.size())
    {
        // Sync is not required in the immediate_commit mode or if there are no dirty_osds
        op->retval = 0;
@@ -1140,7 +1130,8 @@ void cluster_client_t::handle_op_part(cluster_op_part_t *part)
    else
    {
        // OK
-        dirty_osds.insert(part->osd_num);
+        if (!(op->flags & OP_IMMEDIATE_COMMIT))
+            dirty_osds.insert(part->osd_num);
        part->flags |= PART_DONE;
        op->done_count++;
        if (op->opcode == OSD_OP_READ || op->opcode == OSD_OP_READ_BITMAP)
@@ -1162,12 +1153,12 @@ void cluster_client_t::copy_part_bitmap(cluster_op_t *op, cluster_op_part_t *par
 {
    // Copy (OR) bitmap
    auto & pool_cfg = st_cli.pool_config.at(INODE_POOL(op->cur_inode));
-    uint32_t pg_block_size = bs_block_size * (
+    uint32_t pg_block_size = pool_cfg.data_block_size * (
        pool_cfg.scheme == POOL_SCHEME_REPLICATED ? 1 : pool_cfg.pg_size-pool_cfg.parity_chunks
    );
-    uint32_t object_offset = (part->op.req.rw.offset - op->offset) / bs_bitmap_granularity;
-    uint32_t part_offset = (part->op.req.rw.offset % pg_block_size) / bs_bitmap_granularity;
-    uint32_t part_len = (op->opcode == OSD_OP_READ_BITMAP ? pg_block_size : part->op.req.rw.len) / bs_bitmap_granularity;
+    uint32_t object_offset = (part->op.req.rw.offset - op->offset) / pool_cfg.bitmap_granularity;
+    uint32_t part_offset = (part->op.req.rw.offset % pg_block_size) / pool_cfg.bitmap_granularity;
+    uint32_t part_len = (op->opcode == OSD_OP_READ_BITMAP ? pg_block_size : part->op.req.rw.len) / pool_cfg.bitmap_granularity;
    if (!(object_offset & 0x7) && !(part_offset & 0x7) && (part_len >= 8))
    {
        // Copy bytes
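With the cluster-wide `immediate_commit` flag removed, the client now decides per pool:
- `get_immediate_commit(inode)` returns true for pool 0 or an unknown pool. Otherwise it compares the pool's setting with `IMMEDIATE_ALL`.
- `execute()` rejects zero-length reads and writes, and any offset or length that is not a multiple of the pool's `bitmap_granularity`.

Below is a simplified sketch of both rules. It assumes a hypothetical `pool_sketch` struct keyed directly by pool ID, in place of the real `INODE_POOL()` and `st_cli.pool_config` lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical simplified pool settings; the real client reads
// st_cli.pool_config and compares against IMMEDIATE_ALL.
enum immediate_commit_sketch { IMM_NONE, IMM_SMALL, IMM_ALL };

struct pool_sketch
{
    uint32_t bitmap_granularity;
    immediate_commit_sketch immediate_commit;
};

// Pool 0 or an unknown pool counts as immediate_commit; otherwise only "all" does.
bool immediate_commit_for(const std::map<uint32_t, pool_sketch> & pools, uint32_t pool_id)
{
    if (!pool_id)
        return true;
    auto it = pools.find(pool_id);
    if (it == pools.end())
        return true;
    return it->second.immediate_commit == IMM_ALL;
}

// Alignment rule from execute(): reads/writes must be non-empty, and offset
// and length must both be multiples of the pool's bitmap granularity.
bool op_aligned(bool is_read_or_write, uint64_t offset, uint64_t len, uint32_t granularity)
{
    assert(granularity != 0);
    if (is_read_or_write && !len)
        return false;
    return offset % granularity == 0 && len % granularity == 0;
}
```

In the diff, an operation flagged `OP_IMMEDIATE_COMMIT` skips `calc_wait()` and never adds its OSDs to `dirty_osds`. A later sync therefore has nothing to flush on its behalf.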
--- a/src/cluster_client.h
+++ b/src/cluster_client.h
@@ -6,8 +6,6 @@
 #include "messenger.h"
 #include "etcd_state_client.h"

-#define MIN_BLOCK_SIZE 4*1024
-#define MAX_BLOCK_SIZE 128*1024*1024
 #define DEFAULT_CLIENT_MAX_DIRTY_BYTES 32*1024*1024
 #define DEFAULT_CLIENT_MAX_DIRTY_OPS 1024
 #define INODE_LIST_DONE 1
@@ -79,11 +77,7 @@ class cluster_client_t
    timerfd_manager_t *tfd;
    ring_loop_t *ringloop;

-    uint64_t bs_block_size = 0;
-    uint32_t bs_bitmap_granularity = 0, bs_bitmap_size = 0;
    std::map<pool_id_t, uint64_t> pg_counts;
-    // WARNING: initially true so execute() doesn't create fake sync
-    bool immediate_commit = true;
    // FIXME: Implement inmemory_commit mode. Note that it requires to return overlapping reads from memory.
    uint64_t client_max_dirty_bytes = 0;
    uint64_t client_max_dirty_ops = 0;
@@ -119,7 +113,7 @@ public:
    bool is_ready();
    void on_ready(std::function<void(void)> fn);

-    bool get_immediate_commit();
+    bool get_immediate_commit(uint64_t inode);

    static void copy_write(cluster_op_t *op, std::map<object_id, cluster_buffer_t> & dirty_buffers);
    void continue_ops(bool up_retry = false);
@@ -127,8 +121,8 @@ public:
        std::function<void(inode_list_t* lst, std::set<object_id>&& objects, pg_num_t pg_num, osd_num_t primary_osd, int status)> callback);
    int list_pg_count(inode_list_t *lst);
    void list_inode_next(inode_list_t *lst, int next_pgs);
-    inline uint32_t get_bs_bitmap_granularity() { return bs_bitmap_granularity; }
-    inline uint64_t get_bs_block_size() { return bs_block_size; }
+    //inline uint32_t get_bs_bitmap_granularity() { return st_cli.global_bitmap_granularity; }
+    //inline uint64_t get_bs_block_size() { return st_cli.global_block_size; }
    uint64_t next_op_id();

 protected:
--- a/src/disk_simple_offsets.cpp
+++ b/src/disk_simple_offsets.cpp
@@ -5,15 +5,21 @@
 #include <sys/ioctl.h>
 #include <ctype.h>
 #include <unistd.h>
-#include "cli.h"
-#include "cluster_client.h"
-#include "base64.h"
 #include <sys/stat.h>

+#include "json11/json11.hpp"
+#include "str_util.h"
+#include "blockstore.h"
+
 // Calculate offsets for a block device and print OSD command line parameters
-std::function<bool(cli_result_t &)> cli_tool_t::simple_offsets(json11::Json cfg)
+void disk_tool_simple_offsets(json11::Json cfg, bool json_output)
 {
    std::string device = cfg["device"].string_value();
+    if (device == "")
+    {
+        fprintf(stderr, "Device path is missing\n");
+        exit(1);
+    }
    uint64_t object_size = parse_size(cfg["object_size"].string_value());
    uint64_t bitmap_granularity = parse_size(cfg["bitmap_granularity"].string_value());
    uint64_t journal_size = parse_size(cfg["journal_size"].string_value());
@@ -24,7 +30,7 @@ std::function<bool(cli_result_t &)> cli_tool_t::simple_offsets(json11::Json cfg)
    if (json_output)
        format = "json";
    if (!object_size)
-        object_size = DEFAULT_BLOCK_SIZE;
+        object_size = 1 << DEFAULT_DATA_BLOCK_ORDER;
    if (!bitmap_granularity)
        bitmap_granularity = DEFAULT_BITMAP_GRANULARITY;
    if (!journal_size)
@@ -79,7 +85,7 @@ std::function<bool(cli_result_t &)> cli_tool_t::simple_offsets(json11::Json cfg)
        fprintf(stderr, "Invalid device block size specified: %lu\n", device_block_size);
        exit(1);
    }
-    if (object_size < device_block_size || object_size > MAX_BLOCK_SIZE ||
+    if (object_size < device_block_size || object_size > MAX_DATA_BLOCK_SIZE ||
        object_size & (object_size-1) != 0)
    {
        fprintf(stderr, "Invalid object size specified: %lu\n", object_size);
@@ -140,5 +146,4 @@ std::function<bool(cli_result_t &)> cli_tool_t::simple_offsets(json11::Json cfg)
            device.c_str(), journal_offset, meta_offset, data_offset
        );
    }
-    return NULL;
 }
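The object size check in `disk_tool_simple_offsets()` is meant to require a power of two between the device block size and `MAX_DATA_BLOCK_SIZE`. In C and C++, `!=` binds tighter than `&`. The context line `object_size & (object_size-1) != 0` therefore parses as `object_size & ((object_size-1) != 0)`. For any object_size above 1 this reduces to `object_size & 1`, so an even non-power-of-two size is not rejected.

Below is a minimal sketch of the intended check. The helper names are hypothetical. The maximum is passed as a parameter because the value of `MAX_DATA_BLOCK_SIZE` is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Parenthesize the & expression: `!=` has higher precedence than `&`.
bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

// Object size must be a power of two, no smaller than the device block size
// and no larger than max_block_size (MAX_DATA_BLOCK_SIZE in the real code).
bool valid_object_size(uint64_t object_size, uint64_t device_block_size, uint64_t max_block_size)
{
    assert(device_block_size != 0);
    return object_size >= device_block_size && object_size <= max_block_size &&
        is_pow2(object_size);
}
```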
--- a/src/disk_tool.cpp
+++ b/src/disk_tool.cpp
@@ -0,0 +1,383 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include "disk_tool.h"
+#include "str_util.h"
+
+static const char *help_text =
+    "Vitastor disk management tool\n"
+    "(c) Vitaliy Filippov, 2022+ (VNPL-1.1)\n"
+    "\n"
+    "COMMANDS:\n"
+    "\n"
+    "vitastor-disk prepare [OPTIONS] [devices...]\n"
+    "  Initialize disk(s) for Vitastor OSD(s).\n"
+    "  \n"
+    "  There are two modes of this command. In the first mode, you pass <devices> which\n"
+    "  must be raw disks (not partitions). They are partitioned automatically and OSDs\n"
+    "  are initialized on all of them.\n"
+    "  \n"
+    "  In the second mode, you omit <devices> and pass --data_device, --journal_device\n"
+    "  and/or --meta_device which must be already existing partitions identified by their\n"
+    "  GPT partition UUIDs. In this case a single OSD is created.\n"
+    "  \n"
+    "  Requires `vitastor-cli`, `wipefs`, `sfdisk` and `partprobe` (from parted) utilities.\n"
+    "  \n"
+    "  Options (automatic mode):\n"
+    "    --osd_per_disk <N>\n"
+    "      Create <N> OSDs on each disk (default 1)\n"
+    "    --hybrid\n"
+    "      Prepare hybrid (HDD+SSD) OSDs using provided devices. SSDs will be used for\n"
+    "      journals and metadata, HDDs will be used for data. Partitions for journals and\n"
+    "      metadata will be created automatically. Whether disks are SSD or HDD is decided\n"
+    "      by the `/sys/block/.../queue/rotational` flag. In hybrid mode, default object\n"
+    "      size is 1 MB instead of 128 KB, default journal size is 1 GB instead of 32 MB,\n"
+    "      and throttle_small_writes is enabled by default.\n"
+    "    --disable_data_fsync auto\n"
+    "      Disable data device cache and fsync (1/yes/true = on, default auto)\n"
+    "    --disable_meta_fsync auto\n"
+    "      Disable metadata/journal device cache and fsync (default auto)\n"
+    "    --meta_reserve 2x,1G\n"
+    "      New metadata partitions in --hybrid mode are created larger than actual\n"
+    "      metadata size to ease possible future extension. The default is to allocate\n"
+    "      2 times more space and at least 1G. Use this option to override.\n"
+    "    --max_other 10%\n"
+    "      Use disks for OSD data even if they already have non-Vitastor partitions,\n"
+    "      but only if these take up no more than this percent of disk space.\n"
+    "  \n"
+    "  Options (single-device mode):\n"
+    "    --data_device <DEV>        Use partition <DEV> for data\n"
+    "    --meta_device <DEV>        Use partition <DEV> for metadata (optional)\n"
+    "    --journal_device <DEV>     Use partition <DEV> for journal (optional)\n"
+    "    --disable_data_fsync 0     Disable data device cache and fsync (default off)\n"
+    "    --disable_meta_fsync 0     Disable metadata device cache and fsync (default off)\n"
+    "    --disable_journal_fsync 0  Disable journal device cache and fsync (default off)\n"
+    "    --force                    Bypass partition safety checks (for emptiness and so on)\n"
+    "  \n"
+    "  Options (both modes):\n"
+    "    --journal_size 1G/32M      Set journal size (area or partition size)\n"
+    "    --block_size 1M/128k       Set blockstore object size\n"
+    "    --bitmap_granularity 4k    Set bitmap granularity\n"
+    "    --data_device_block 4k     Override data device block size\n"
+    "    --meta_device_block 4k     Override metadata device block size\n"
+    "    --journal_device_block 4k  Override journal device block size\n"
+    "  \n"
+    "  immediate_commit setting is automatically derived from \"disable fsync\" options.\n"
+    "  It's set to \"all\" when fsync is disabled on all devices, and to \"small\" if fsync\n"
+    "  is only disabled on journal device.\n"
+    "  \n"
+    "  When data/meta/journal fsyncs are disabled, the OSD startup script automatically\n"
+    "  checks the device cache status on start and tries to disable cache for SATA/SAS disks.\n"
+    "  If this fails, it issues a warning in the system log.\n"
+    "  \n"
+    "  You can also pass other OSD options here as arguments and they'll be persisted\n"
+    "  to the superblock: max_write_iodepth, min_flusher_count,\n"
+    "  max_flusher_count, inmemory_metadata, inmemory_journal, journal_sector_buffer_count,\n"
+    "  journal_no_same_sector_overwrites, throttle_small_writes, throttle_target_iops,\n"
+    "  throttle_target_mbs, throttle_target_parallelism, throttle_threshold_us.\n"
+    "\n"
+    "vitastor-disk upgrade-simple <UNIT_FILE|OSD_NUMBER>\n"
+    "  Upgrade an OSD created by old (0.7.1 and older) make-osd.sh or make-osd-hybrid.js scripts.\n"
+    "  \n"
+    "  Adds superblocks to OSD devices, disables old vitastor-osdN unit and replaces it with vitastor-osd@N.\n"
+    "  Can be invoked with an OSD number or with a path to the systemd service file UNIT_FILE, which\n"
+    "  must be /etc/systemd/system/vitastor-osd<OSD_NUMBER>.service.\n"
+    "  \n"
+    "  Note that the procedure isn't atomic and may ruin OSD data in case of an interrupt,\n"
+    "  so don't upgrade all your OSDs in parallel.\n"
+    "  \n"
+    "  Requires the `sfdisk` utility.\n"
+    "\n"
+    "vitastor-disk resize <ALL_OSD_PARAMETERS> <NEW_LAYOUT> [--iodepth 32]\n"
+    "  Resize data area and/or rewrite/move journal and metadata\n"
+    "  ALL_OSD_PARAMETERS must include all (at least all disk-related)\n"
+    "  parameters from OSD command line (i.e. from systemd unit or superblock).\n"
+    "  NEW_LAYOUT may include new disk layout parameters:\n"
+    "    --new_data_offset SIZE     resize data area so it starts at SIZE\n"
+    "    --new_data_len SIZE        resize data area to SIZE bytes\n"
+    "    --new_meta_device PATH     use PATH for new metadata\n"
+    "    --new_meta_offset SIZE     make new metadata area start at SIZE\n"
+    "    --new_meta_len SIZE        make new metadata area SIZE bytes long\n"
+    "    --new_journal_device PATH  use PATH for new journal\n"
+    "    --new_journal_offset SIZE  make new journal area start at SIZE\n"
+    "    --new_journal_len SIZE     make new journal area SIZE bytes long\n"
+    "  SIZE may include k/m/g/t suffixes. If any of the new layout parameters\n"
+    "  are not specified, their old values will be used.\n"
+    "\n"
+    "vitastor-disk start|stop|restart|enable|disable [--now] <device> [device2 device3 ...]\n"
+    "  Manipulate Vitastor OSDs using systemd by their device paths.\n"
+    "  Commands are passed to systemctl with vitastor-osd@<num> units as arguments.\n"
+    "  When --now is added to enable/disable, OSDs are also immediately started/stopped.\n"
+    "\n"
+    "vitastor-disk read-sb <device>\n"
+    "  Try to read Vitastor OSD superblock from <device> and print it in JSON format.\n"
+    "\n"
+    "vitastor-disk write-sb <device>\n"
+    "  Read JSON from STDIN and write it into Vitastor OSD superblock on <device>.\n"
+    "\n"
+    "vitastor-disk udev <device>\n"
+    "  Try to read Vitastor OSD superblock from <device> and print variables for udev.\n"
+    "\n"
+    "vitastor-disk exec-osd <device>\n"
+    "  Read Vitastor OSD superblock from <device> and start the OSD with parameters from it.\n"
+    "  Intended for use from startup scripts (i.e. from systemd units).\n"
+    "\n"
+    "vitastor-disk pre-exec <device>\n"
+    "  Read Vitastor OSD superblock from <device> and perform pre-start checks for the OSD.\n"
+    "  For now, this only checks that device cache is in write-through mode if fsync is disabled.\n"
+    "  Intended for use from startup scripts (i.e. from systemd units).\n"
+    "\n"
+    "vitastor-disk dump-journal [OPTIONS] <journal_file> <journal_block_size> <offset> <size>\n"
+    "  Dump journal in human-readable or JSON (if --json is specified) format.\n"
+    "  Options:\n"
+    "  --all             Scan the whole journal area for entries and dump them, even outdated ones\n"
+    "  --json            Dump journal in JSON format\n"
+    "  --format entries  (Default) Dump actual journal entries as an array, without data\n"
+    "  --format data     Same as \"entries\", but also include small write data\n"
+    "  --format blocks   Dump as an array of journal blocks each containing an array of entries\n"
+    "\n"
+    "vitastor-disk write-journal <journal_file> <journal_block_size> <bitmap_size> <offset> <size>\n"
+    "  Write journal from JSON taken from standard input in the same format as produced by\n"
+    "  `dump-journal --json --format data`.\n"
+    "\n"
+    "vitastor-disk dump-meta <meta_file> <meta_block_size> <offset> <size>\n"
+    "  Dump metadata in JSON format.\n"
+    "\n"
+    "vitastor-disk write-meta <meta_file> <offset> <size>\n"
+    "  Write metadata from JSON taken from standard input in the same format as produced by\n"
+    "  `dump-meta`. Intended for debugging.\n"
+    "\n"
+    "vitastor-disk simple-offsets <device>\n"
+    "  Calculate offsets for old simple&stupid (no superblock) OSD deployment. Options:\n"
+    "    --object_size 128k       Set blockstore block size\n"
+    "    --bitmap_granularity 4k  Set bitmap granularity\n"
+    "    --journal_size 16M       Set journal size\n"
+    "    --device_block_size 4k   Set device block size\n"
+    "    --journal_offset 0       Set journal offset\n"
+    "    --device_size 0          Set device size\n"
+    "    --format text            Result format: json, options, env, or text\n"
+    "\n"
+    "Use vitastor-disk --help <command> for command details or vitastor-disk --help --all for all details.\n"
+;
+
+disk_tool_t::~disk_tool_t()
+{
+    if (data_alloc)
+    {
+        delete data_alloc;
+        data_alloc = NULL;
+    }
+}
+
+int main(int argc, char *argv[])
+{
+    disk_tool_t self = {};
+    std::vector<char*> cmd;
+    char *exe_name = strrchr(argv[0], '/');
+    exe_name = exe_name ? exe_name+1 : argv[0];
+    bool aliased = false;
+    if (!strcmp(exe_name, "vitastor-dump-journal"))
+    {
+        cmd.push_back((char*)"dump-journal");
+        aliased = true;
+    }
+    for (int i = 1; i < argc; i++)
+    {
+        if (!strcmp(argv[i], "--all"))
+        {
+            self.all = true;
+        }
+        else if (!strcmp(argv[i], "--json"))
+        {
+            self.json = true;
+        }
+        else if (!strcmp(argv[i], "--hybrid"))
+        {
+            self.options["hybrid"] = "1";
+        }
+        else if (!strcmp(argv[i], "--help") || !strcmp(argv[i], "-h"))
+        {
+            cmd.insert(cmd.begin(), (char*)"help");
+        }
+        else if (!strcmp(argv[i], "--now"))
+        {
+            self.now = true;
+        }
+        else if (!strcmp(argv[i], "--force"))
+        {
+            self.options["force"] = "1";
+        }
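+        else if (argv[i][0] == '-' && argv[i][1] == '-')
+        {
+            // Generic "--key value" option: refuse a trailing "--key" with no value
+            // instead of reading the NULL pointer at argv[argc]
+            if (i+1 >= argc)
+            {
+                fprintf(stderr, "Option %s requires a value\n", argv[i]);
+                return 1;
+            }
+            char *key = argv[i]+2;
+            self.options[key] = argv[++i];
+        }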
+        else
+        {
+            cmd.push_back(argv[i]);
+        }
+    }
+    if (!cmd.size())
+    {
+        cmd.push_back((char*)"help");
+    }
+    if (!strcmp(cmd[0], "dump-journal"))
+    {
+        if (cmd.size() < 5)
+        {
+            print_help(help_text, aliased ? "vitastor-dump-journal" : "vitastor-disk", cmd[0], false);
+            return 1;
+        }
+        self.dsk.journal_device = cmd[1];
+        self.dsk.journal_block_size = strtoul(cmd[2], NULL, 10);
+        self.dsk.journal_offset = strtoull(cmd[3], NULL, 10);
+        self.dsk.journal_len = strtoull(cmd[4], NULL, 10);
+        return self.dump_journal();
+    }
+    else if (!strcmp(cmd[0], "write-journal"))
+    {
+        if (cmd.size() < 6)
+        {
+            print_help(help_text, "vitastor-disk", cmd[0], false);
+            return 1;
+        }
+        self.new_journal_device = cmd[1];
+        self.dsk.journal_block_size = strtoul(cmd[2], NULL, 10);
+        self.dsk.clean_entry_bitmap_size = strtoul(cmd[3], NULL, 10);
+        self.new_journal_offset = strtoull(cmd[4], NULL, 10);
+        self.new_journal_len = strtoull(cmd[5], NULL, 10);
+        std::string json_err;
+        json11::Json entries = json11::Json::parse(read_all_fd(0), json_err);
+        if (json_err != "")
+        {
+            fprintf(stderr, "Invalid JSON: %s\n", json_err.c_str());
+            return 1;
+        }
+        return self.write_json_journal(entries);
+    }
+    else if (!strcmp(cmd[0], "dump-meta"))
+    {
+        if (cmd.size() < 5)
+        {
+            print_help(help_text, "vitastor-disk", cmd[0], false);
+            return 1;
+        }
+        self.dsk.meta_device = cmd[1];
+        self.dsk.meta_block_size = strtoul(cmd[2], NULL, 10);
+        self.dsk.meta_offset = strtoull(cmd[3], NULL, 10);
+        self.dsk.meta_len = strtoull(cmd[4], NULL, 10);
+        return self.dump_meta();
+    }
+    else if (!strcmp(cmd[0], "write-meta"))
+    {
+        if (cmd.size() < 4)
+        {
+            print_help(help_text, "vitastor-disk", cmd[0], false);
+            return 1;
+        }
+        self.new_meta_device = cmd[1];
+        self.new_meta_offset = strtoull(cmd[2], NULL, 10);
+        self.new_meta_len = strtoull(cmd[3], NULL, 10);
+        std::string json_err;
+        json11::Json meta = json11::Json::parse(read_all_fd(0), json_err);
+        if (json_err != "")
+        {
+            fprintf(stderr, "Invalid JSON: %s\n", json_err.c_str());
+            return 1;
+        }
+        return self.write_json_meta(meta);
+    }
+    else if (!strcmp(cmd[0], "resize"))
+    {
+        return self.resize_data();
+    }
+    else if (!strcmp(cmd[0], "simple-offsets"))
+    {
+        // Calculate offsets for simple & stupid OSD deployment without superblock
+        if (cmd.size() > 1)
+        {
+            self.options["device"] = cmd[1];
+        }
+        disk_tool_simple_offsets(self.options, self.json);
+        return 0;
+    }
+    else if (!strcmp(cmd[0], "udev"))
+    {
+        if (cmd.size() != 2)
+        {
+            fprintf(stderr, "Exactly 1 device path argument is required\n");
+            return 1;
+        }
+        return self.udev_import(cmd[1]);
+    }
+    else if (!strcmp(cmd[0], "read-sb"))
+    {
+        if (cmd.size() != 2)
+        {
+            fprintf(stderr, "Exactly 1 device path argument is required\n");
+            return 1;
+        }
+        return self.read_sb(cmd[1]);
+    }
+    else if (!strcmp(cmd[0], "write-sb"))
+    {
+        if (cmd.size() != 2)
+        {
+            fprintf(stderr, "Exactly 1 device path argument is required\n");
+            return 1;
+        }
+        return self.write_sb(cmd[1]);
+    }
+    else if (!strcmp(cmd[0], "start") || !strcmp(cmd[0], "stop") ||
+        !strcmp(cmd[0], "restart") || !strcmp(cmd[0], "enable") || !strcmp(cmd[0], "disable"))
+    {
+        std::vector<std::string> systemd_cmd;
+        systemd_cmd.push_back(cmd[0]);
+        if (self.now && (!strcmp(cmd[0], "enable") || !strcmp(cmd[0], "disable")))
+        {
+            systemd_cmd.push_back("--now");
+        }
+        return self.systemd_start_stop_osds(systemd_cmd, std::vector<std::string>(cmd.begin()+1, cmd.end()));
+    }
+    else if (!strcmp(cmd[0], "exec-osd"))
+    {
+        if (cmd.size() != 2)
+        {
+            fprintf(stderr, "Exactly 1 device path argument is required\n");
+            return 1;
+        }
+        return self.exec_osd(cmd[1]);
+    }
+    else if (!strcmp(cmd[0], "pre-exec"))
+    {
+        if (cmd.size() != 2)
+        {
+            fprintf(stderr, "Exactly 1 device path argument is required\n");
+            return 1;
+        }
+        return self.pre_exec_osd(cmd[1]);
+    }
+    else if (!strcmp(cmd[0], "prepare"))
+    {
+        std::vector<std::string> devs;
+        for (int i = 1; i < cmd.size(); i++)
+        {
+            devs.push_back(cmd[i]);
+        }
+        return self.prepare(devs);
+    }
+    else if (!strcmp(cmd[0], "upgrade-simple"))
+    {
+        if (cmd.size() != 2)
+        {
+            fprintf(stderr, "Exactly 1 OSD number or systemd unit path is required\n");
+            return 1;
+        }
+        return self.upgrade_simple_unit(cmd[1]);
+    }
+    else
+    {
+        print_help(help_text, "vitastor-disk", cmd.size() > 1 ? cmd[1] : "", self.all);
+    }
+    return 0;
+}
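The argument loop in `main()` above treats a few names as boolean flags, turns `--help`/`-h` into a leading `help` command, stores every other `--key value` pair in `options`, and keeps the remaining words as the command and its positional arguments. The standalone sketch below models that split so it can be exercised without a device. It is only an illustration: `parsed_args_t` and `parse_args_sketch` are hypothetical names that do not exist in vitastor, and the sketch additionally reports a trailing `--key` that has no value.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; vitastor itself keeps these
// fields directly in disk_tool_t and a local std::vector<char*> cmd
struct parsed_args_t
{
    std::vector<std::string> cmd;                // positional words, cmd[0] is the command
    std::map<std::string, std::string> options;  // generic "--key value" options
    bool all = false, json = false, now = false; // boolean flags
    bool ok = true;                              // false if a trailing "--key" had no value
};

// Models the flag / option / positional split of the loop in main()
parsed_args_t parse_args_sketch(int argc, const char *const *argv)
{
    parsed_args_t r;
    for (int i = 1; i < argc; i++)
    {
        const char *a = argv[i];
        if (!strcmp(a, "--all"))
            r.all = true;
        else if (!strcmp(a, "--json"))
            r.json = true;
        else if (!strcmp(a, "--now"))
            r.now = true;
        else if (!strcmp(a, "--hybrid") || !strcmp(a, "--force"))
            r.options[a+2] = "1";                // stored as options["hybrid"] / options["force"]
        else if (!strcmp(a, "--help") || !strcmp(a, "-h"))
            r.cmd.insert(r.cmd.begin(), std::string("help")); // "--help X" acts like "help X"
        else if (a[0] == '-' && a[1] == '-')
        {
            if (i+1 >= argc)
            {
                r.ok = false;                    // "--key" is the last word: no value to take
                break;
            }
            r.options[a+2] = argv[++i];
        }
        else
            r.cmd.push_back(a);
    }
    if (r.cmd.empty())
        r.cmd.push_back("help");                 // no command at all: show help
    return r;
}
```

In this model, `vitastor-disk dump-journal --json --format data /dev/sdb 4096 0 16777216` yields the command `dump-journal` with four positional arguments, `json == true` and `options["format"] == "data"`.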
--- a/src/disk_tool.h
+++ b/src/disk_tool.h
@@ -0,0 +1,141 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#pragma once
+
+#ifndef _LARGEFILE64_SOURCE
+#define _LARGEFILE64_SOURCE 1
+#endif
+
+#include <map>
+#include <vector>
+#include <string>
+#include <functional>
+
+#include "json11/json11.hpp"
+#include "blockstore_disk.h"
+#include "blockstore_impl.h"
+#include "crc32c.h"
+
+// vITADisk
+#define VITASTOR_DISK_MAGIC 0x6b73694441544976
+#define VITASTOR_DISK_MAX_SB_SIZE (128*1024)
+#define VITASTOR_PART_TYPE "e7009fac-a5a1-4d72-af72-53de13059903"
+#define DEFAULT_HYBRID_JOURNAL "1G"
+
+struct resizer_data_moving_t;
+
+struct vitastor_dev_info_t
+{
+    std::string path;
+    bool is_hdd;
+    json11::Json pt; // pt = partition table
+    int osd_part_count;
+    uint64_t size;
+    uint64_t free;
+};
+
+struct disk_tool_t
+{
+    /**** Parameters ****/
+
+    std::map<std::string, std::string> options;
+    bool all, json, now;
+    bool dump_with_blocks, dump_with_data;
+    blockstore_disk_t dsk;
+
+    // resize data and/or move metadata and journal
+    int iodepth;
+    std::string new_meta_device, new_journal_device;
+    uint64_t new_data_offset, new_data_len;
+    uint64_t new_journal_offset, new_journal_len;
+    uint64_t new_meta_offset, new_meta_len;
+
+    /**** State ****/
+
+    uint64_t meta_pos;
+    uint64_t journal_pos, journal_calc_data_pos;
+
+    bool first, first2;
+
+    allocator *data_alloc;
+    std::map<uint64_t, uint64_t> data_remap;
+    std::map<uint64_t, uint64_t>::iterator remap_it;
+    ring_loop_t *ringloop;
+    ring_consumer_t ring_consumer;
+    int remap_active;
+    uint8_t *new_journal_buf, *new_meta_buf, *new_journal_ptr, *new_journal_data;
+    uint64_t new_journal_in_pos;
+    int64_t data_idx_diff;
+    uint64_t total_blocks, free_first, free_last;
+    uint64_t new_clean_entry_bitmap_size, new_clean_entry_size, new_entries_per_block;
+    int new_journal_fd, new_meta_fd;
+    resizer_data_moving_t *moving_blocks;
+
+    bool started;
+    void *small_write_data;
+    uint32_t data_crc32;
+    uint32_t crc32_last;
+    uint32_t new_crc32_prev;
+
+    ~disk_tool_t();
+
+    int dump_journal();
+    void dump_journal_entry(int num, journal_entry *je, bool json);
+    int process_journal(std::function<int(void*)> block_fn);
+    int process_journal_block(void *buf, std::function<void(int, journal_entry*)> iter_fn);
+    int process_meta(std::function<void(blockstore_meta_header_v1_t *)> hdr_fn,
+        std::function<void(uint64_t, clean_disk_entry*, uint8_t*)> record_fn);
+
+    int dump_meta();
+    void dump_meta_header(blockstore_meta_header_v1_t *hdr);
+    void dump_meta_entry(uint64_t block_num, clean_disk_entry *entry, uint8_t *bitmap);
+
+    int write_json_journal(json11::Json entries);
+    int write_json_meta(json11::Json meta);
+
+    int resize_data();
+    int resize_parse_params();
+    void resize_init(blockstore_meta_header_v1_t *hdr);
+    int resize_remap_blocks();
+    int resize_copy_data();
+    int resize_rewrite_journal();
+    int resize_write_new_journal();
+    int resize_rewrite_meta();
+    int resize_write_new_meta();
+
+    int udev_import(std::string device);
+    int read_sb(std::string device);
+    int write_sb(std::string device);
+    int exec_osd(std::string device);
+    int systemd_start_stop_osds(std::vector<std::string> cmd, std::vector<std::string> devices);
+    int pre_exec_osd(std::string device);
+
+    json11::Json read_osd_superblock(std::string device, bool expect_exist = true);
+    uint32_t write_osd_superblock(std::string device, json11::Json params);
+
+    int prepare_one(std::map<std::string, std::string> options, int is_hdd = -1);
+    int prepare(std::vector<std::string> devices);
+    std::vector<vitastor_dev_info_t> collect_devices(const std::vector<std::string> & devices);
+    json11::Json add_partitions(vitastor_dev_info_t & devinfo, std::vector<std::string> sizes);
+    std::vector<std::string> get_new_data_parts(vitastor_dev_info_t & dev, uint64_t osd_per_disk, uint64_t max_other_percent);
+    int get_meta_partition(std::vector<vitastor_dev_info_t> & ssds, std::map<std::string, std::string> & options);
+
+    int upgrade_simple_unit(std::string unit);
+};
+
+void disk_tool_simple_offsets(json11::Json cfg, bool json_output);
+
+uint64_t sscanf_json(const char *fmt, const json11::Json & str);
+void fromhexstr(const std::string & from, int bytes, uint8_t *to);
+std::string realpath_str(std::string path, bool nofail = true);
+std::string read_all_fd(int fd);
+std::string read_file(std::string file, bool allow_enoent = false);
+int disable_cache(std::string dev);
+std::string get_parent_device(std::string dev);
+bool json_is_true(const json11::Json & val);
+int shell_exec(const std::vector<std::string> & cmd, const std::string & in, std::string *out, std::string *err);
+int write_zero(int fd, uint64_t offset, uint64_t size);
+json11::Json read_parttable(std::string dev);
+uint64_t dev_size_from_parttable(json11::Json pt);
+uint64_t free_from_parttable(json11::Json pt);
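The `// vITADisk` comment next to `VITASTOR_DISK_MAGIC` is the magic number read as eight ASCII bytes, least significant byte first, which is how it would appear on disk assuming the superblock stores it as a native integer on a little-endian host. The small check below confirms that reading; `le_bytes` is a hypothetical helper written for this illustration, and it extracts bytes with shifts so the result does not depend on the host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same value as VITASTOR_DISK_MAGIC in disk_tool.h
static const uint64_t vitastor_disk_magic = 0x6b73694441544976ULL;

// Hypothetical helper: returns the bytes of a 64-bit value in little-endian
// order (least significant byte first), regardless of the host byte order
std::string le_bytes(uint64_t v)
{
    std::string s;
    for (int i = 0; i < 8; i++)
        s.push_back((char)((v >> (8*i)) & 0xff));
    return s;
}
```

`le_bytes(vitastor_disk_magic)` returns the 8-character string `vITADisk`.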
--- a/src/disk_tool_journal.cpp
+++ b/src/disk_tool_journal.cpp
@@ -0,0 +1,453 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include "disk_tool.h"
+
+int disk_tool_t::dump_journal()
+{
+    dump_with_blocks = options["format"] == "blocks" || options["format"] == "blocks,data";
+    dump_with_data = options["format"] == "data" || options["format"] == "blocks,data";
+    if (dsk.journal_block_size < DIRECT_IO_ALIGNMENT || (dsk.journal_block_size % DIRECT_IO_ALIGNMENT) ||
+        dsk.journal_block_size > 128*1024)
+    {
+        fprintf(stderr, "Invalid journal block size\n");
+        return 1;
+    }
+    first = true;
+    if (json)
+        printf("[\n");
+    if (all)
+    {
+        dsk.journal_fd = open(dsk.journal_device.c_str(), O_DIRECT|O_RDONLY);
+        if (dsk.journal_fd < 0)
+        {
+            fprintf(stderr, "Failed to open journal device %s: %s\n", dsk.journal_device.c_str(), strerror(errno));
+            return 1;
+        }
+        void *journal_buf = memalign_or_die(MEM_ALIGNMENT, dsk.journal_block_size);
+        journal_pos = 0;
+        while (journal_pos < dsk.journal_len)
+        {
+            int r = pread(dsk.journal_fd, journal_buf, dsk.journal_block_size, dsk.journal_offset+journal_pos);
+            assert(r == dsk.journal_block_size);
+            uint64_t s;
+            for (s = 0; s < dsk.journal_block_size; s += 8)
+            {
+                if (*((uint64_t*)((uint8_t*)journal_buf+s)) != 0)
+                    break;
+            }
+            if (json)
+            {
+                printf("%s{\"offset\":\"0x%lx\"", first ? "" : ",\n", journal_pos);
+                first = false;
+            }
+            if (s == dsk.journal_block_size)
+            {
+                if (json)
+                    printf(",\"type\":\"zero\"}");
+                else
+                    printf("offset %08lx: zeroes\n", journal_pos);
+                journal_pos += dsk.journal_block_size;
+            }
+            else if (((journal_entry*)journal_buf)->magic == JOURNAL_MAGIC)
+            {
+                if (!json)
+                    printf("offset %08lx:\n", journal_pos);
+                else
+                    printf(",\"entries\":[\n");
+                first2 = true;
+                process_journal_block(journal_buf, [this](int num, journal_entry *je) { dump_journal_entry(num, je, json); });
+                if (json)
+                    printf(first2 ? "]}" : "\n]}");
+            }
+            else
+            {
+                if (json)
+                    printf(",\"type\":\"data\",\"pattern\":\"%08lx\"}", *((uint64_t*)journal_buf));
+                else
+                    printf("offset %08lx: no magic in the beginning, looks like random data (pattern=%08lx)\n", journal_pos, *((uint64_t*)journal_buf));
+                journal_pos += dsk.journal_block_size;
+            }
+        }
+        free(journal_buf);
+        close(dsk.journal_fd);
+        dsk.journal_fd = -1;
+    }
+    else
+    {
+        process_journal([this](void *data)
+        {
+            first2 = true;
+            if (!json)
+                printf("offset %08lx:\n", journal_pos);
+            auto pos = journal_pos;
+            int r = process_journal_block(data, [this, pos](int num, journal_entry *je)
+            {
+                if (json && first2)
+                {
+                    if (dump_with_blocks)
+                        printf("%s{\"offset\":\"0x%lx\",\"entries\":[\n", first ? "" : ",\n", pos);
+                    else if (!first)
+                        printf(",\n"); // flat array: separate entries coming from different blocks
+                    first = false;
+                }
+                dump_journal_entry(num, je, json);
+            });
+            if (json)
+            {
+                if (dump_with_blocks && !first2)
+                    printf("\n]}");
+            }
+            else if (r <= 0)
+                printf("end of the journal\n");
+            return r;
+        });
+    }
+    if (json)
+        printf(first ? "]\n" : "\n]\n");
+    return 0;
+}
+
+int disk_tool_t::process_journal(std::function<int(void*)> block_fn)
+{
+    dsk.journal_fd = open(dsk.journal_device.c_str(), O_DIRECT|O_RDONLY);
+    if (dsk.journal_fd < 0)
+    {
+        fprintf(stderr, "Failed to open journal device %s: %s\n", dsk.journal_device.c_str(), strerror(errno));
+        return 1;
+    }
+    void *data = memalign_or_die(MEM_ALIGNMENT, dsk.journal_block_size);
+    journal_pos = 0;
+    int r = pread(dsk.journal_fd, data, dsk.journal_block_size, dsk.journal_offset+journal_pos);
+    assert(r == dsk.journal_block_size);
+    journal_entry *je = (journal_entry*)(data);
+    if (je->magic != JOURNAL_MAGIC || je->type != JE_START || je_crc32(je) != je->crc32)
+    {
+        fprintf(stderr, "offset %08lx: journal superblock is invalid\n", journal_pos);
+        r = 1;
+    }
+    else
+    {
+        started = false;
+        crc32_last = 0;
+        block_fn(data);
+        started = false;
+        crc32_last = 0;
+        journal_pos = je->start.journal_start;
+        while (1)
+        {
+            if (journal_pos >= dsk.journal_len)
+                journal_pos = dsk.journal_block_size;
+            r = pread(dsk.journal_fd, data, dsk.journal_block_size, dsk.journal_offset+journal_pos);
+            assert(r == dsk.journal_block_size);
+            r = block_fn(data);
+            if (r <= 0)
+                break;
+        }
+    }
+    close(dsk.journal_fd);
+    dsk.journal_fd = -1;
+    free(data);
+    return r;
+}
+
+int disk_tool_t::process_journal_block(void *buf, std::function<void(int, journal_entry*)> iter_fn)
+{
+    uint32_t pos = 0;
+    journal_pos += dsk.journal_block_size;
+    int entry = 0;
+    bool wrapped = false;
+    while (pos <= dsk.journal_block_size-JOURNAL_ENTRY_HEADER_SIZE)
+    {
+        journal_entry *je = (journal_entry*)((uint8_t*)buf + pos);
+        if (je->magic != JOURNAL_MAGIC || je->type < JE_MIN || je->type > JE_MAX ||
+            (!all && started && je->crc32_prev != crc32_last) ||
+            je->size < JOURNAL_ENTRY_HEADER_SIZE || je->size > dsk.journal_block_size-pos)
+        {
+            break;
+        }
+        bool crc32_valid = je_crc32(je) == je->crc32;
+        if (!all && !crc32_valid)
+        {
+            break;
+        }
+        started = true;
+        crc32_last = je->crc32;
+        if (je->type == JE_SMALL_WRITE || je->type == JE_SMALL_WRITE_INSTANT)
+        {
+            journal_calc_data_pos = journal_pos;
+            if (journal_pos + je->small_write.len > dsk.journal_len)
+            {
+                // data continues from the beginning of the journal
+                journal_calc_data_pos = journal_pos = dsk.journal_block_size;
+                wrapped = true;
+            }
+            journal_pos += je->small_write.len;
+            if (journal_pos >= dsk.journal_len)
+            {
+                journal_pos = dsk.journal_block_size;
+                wrapped = true;
+            }
+            small_write_data = memalign_or_die(MEM_ALIGNMENT, je->small_write.len);
+            // Keep the read outside assert() so it still happens when NDEBUG is defined
+            ssize_t r = pread(dsk.journal_fd, small_write_data, je->small_write.len, dsk.journal_offset+je->small_write.data_offset);
+            assert(r == je->small_write.len);
+            data_crc32 = crc32c(0, small_write_data, je->small_write.len);
+        }
+        iter_fn(entry, je);
+        if (je->type == JE_SMALL_WRITE || je->type == JE_SMALL_WRITE_INSTANT)
+        {
+            free(small_write_data);
+            small_write_data = NULL;
+        }
+        pos += je->size;
+        entry++;
+    }
+    if (wrapped)
+    {
+        journal_pos = dsk.journal_len;
+    }
+    return entry;
+}
+
+void disk_tool_t::dump_journal_entry(int num, journal_entry *je, bool json)
+{
+    if (json)
+    {
+        if (!first2)
+            printf(",\n");
+        first2 = false;
+        printf(
+            "{\"crc32\":\"%08x\",\"valid\":%s,\"crc32_prev\":\"%08x\"",
+            je->crc32, (je_crc32(je) == je->crc32 ? "true" : "false"), je->crc32_prev
+        );
+    }
+    else
+    {
+        printf(
+            "entry % 3d: crc32=%08x %s prev=%08x ",
+            num, je->crc32, (je_crc32(je) == je->crc32 ? "(valid)" : "(invalid)"), je->crc32_prev
+        );
+    }
+    if (je->type == JE_START)
+    {
+        printf(
+            json ? ",\"type\":\"start\",\"start\":\"0x%lx\"}" : "je_start start=%08lx\n",
+            je->start.journal_start
+        );
+    }
+    else if (je->type == JE_SMALL_WRITE || je->type == JE_SMALL_WRITE_INSTANT)
+    {
+        printf(
+            json ? ",\"type\":\"small_write%s\",\"inode\":\"0x%lx\",\"stripe\":\"0x%lx\",\"ver\":\"%lu\",\"offset\":%u,\"len\":%u,\"loc\":\"0x%lx\""
+                : "je_small_write%s oid=%lx:%lx ver=%lu offset=%u len=%u loc=%08lx",
+            je->type == JE_SMALL_WRITE_INSTANT ? "_instant" : "",
+            je->small_write.oid.inode, je->small_write.oid.stripe,
+            je->small_write.version, je->small_write.offset, je->small_write.len,
+            je->small_write.data_offset
+        );
+        if (journal_calc_data_pos != je->small_write.data_offset)
+        {
+            printf(json ? ",\"bad_loc\":true,\"calc_loc\":\"0x%lx\""
+                : " (mismatched, calculated = %08lx)", journal_calc_data_pos);
+        }
+        if (je->small_write.size > sizeof(journal_entry_small_write))
+        {
+            printf(json ? ",\"bitmap\":\"" : " (bitmap: ");
+            for (int i = sizeof(journal_entry_small_write); i < je->small_write.size; i++)
+            {
+                printf("%02x", ((uint8_t*)je)[i]);
+            }
+            printf(json ? "\"" : ")");
+        }
+        if (dump_with_data)
+        {
+            printf(json ? ",\"data\":\"" : " (data: ");
+            for (int i = 0; i < je->small_write.len; i++)
+            {
+                printf("%02x", ((uint8_t*)small_write_data)[i]);
+            }
+            printf(json ? "\"" : ")");
+        }
+        printf(
+            json ? ",\"data_crc32\":\"%08x\",\"data_valid\":%s}" : " data_crc32=%08x%s\n",
+            je->small_write.crc32_data,
+            (data_crc32 != je->small_write.crc32_data
+                ? (json ? "false" : " (invalid)")
+                : (json ? "true" : " (valid)"))
+        );
+    }
+    else if (je->type == JE_BIG_WRITE || je->type == JE_BIG_WRITE_INSTANT)
+    {
+        printf(
+            json ? ",\"type\":\"big_write%s\",\"inode\":\"0x%lx\",\"stripe\":\"0x%lx\",\"ver\":\"%lu\",\"loc\":\"0x%lx\""
+                : "je_big_write%s oid=%lx:%lx ver=%lu loc=%08lx",
+            je->type == JE_BIG_WRITE_INSTANT ? "_instant" : "",
+            je->big_write.oid.inode, je->big_write.oid.stripe, je->big_write.version, je->big_write.location
+        );
+        if (je->big_write.size > sizeof(journal_entry_big_write))
+        {
+            printf(json ? ",\"bitmap\":\"" : " (bitmap: ");
+            for (int i = sizeof(journal_entry_big_write); i < je->big_write.size; i++)
+            {
+                printf("%02x", ((uint8_t*)je)[i]);
+            }
+            printf(json ? "\"" : ")");
+        }
+        printf(json ? "}" : "\n");
+    }
+    else if (je->type == JE_STABLE)
+    {
+        printf(
+            json ? ",\"type\":\"stable\",\"inode\":\"0x%lx\",\"stripe\":\"0x%lx\",\"ver\":\"%lu\"}"
+                : "je_stable oid=%lx:%lx ver=%lu\n",
+            je->stable.oid.inode, je->stable.oid.stripe, je->stable.version
+        );
+    }
+    else if (je->type == JE_ROLLBACK)
+    {
+        printf(
+            json ? ",\"type\":\"rollback\",\"inode\":\"0x%lx\",\"stripe\":\"0x%lx\",\"ver\":\"%lu\"}"
+                : "je_rollback oid=%lx:%lx ver=%lu\n",
+            je->rollback.oid.inode, je->rollback.oid.stripe, je->rollback.version
+        );
+    }
+    else if (je->type == JE_DELETE)
+    {
+        printf(
+            json ? ",\"type\":\"delete\",\"inode\":\"0x%lx\",\"stripe\":\"0x%lx\",\"ver\":\"%lu\"}"
+                : "je_delete oid=%lx:%lx ver=%lu\n",
+            je->del.oid.inode, je->del.oid.stripe, je->del.version
+        );
+    }
+}
+
+int disk_tool_t::write_json_journal(json11::Json entries)
+{
+    new_journal_buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, new_journal_len);
+    new_journal_ptr = new_journal_buf;
+    new_journal_data = new_journal_ptr + dsk.journal_block_size;
+    new_journal_in_pos = 0;
+    memset(new_journal_buf, 0, new_journal_len);
+    std::map<std::string,uint16_t> type_by_name = {
+        { "start", JE_START },
+        { "small_write", JE_SMALL_WRITE },
+        { "small_write_instant", JE_SMALL_WRITE_INSTANT },
+        { "big_write", JE_BIG_WRITE },
+        { "big_write_instant", JE_BIG_WRITE_INSTANT },
+        { "stable", JE_STABLE },
+        { "delete", JE_DELETE },
+        { "rollback", JE_ROLLBACK },
+    };
+    // Write start entry into the first block
+    *((journal_entry_start*)new_journal_buf) = (journal_entry_start){
+        .magic = JOURNAL_MAGIC,
+        .type = JE_START,
+        .size = sizeof(journal_entry_start),
+        .journal_start = dsk.journal_block_size,
+        .version = JOURNAL_VERSION,
+    };
+    ((journal_entry*)new_journal_buf)->crc32 = je_crc32((journal_entry*)new_journal_buf);
+    new_journal_ptr += dsk.journal_block_size;
+    new_journal_data = new_journal_ptr+dsk.journal_block_size;
+    new_journal_in_pos = 0;
+    for (const auto & rec: entries.array_items())
+    {
+        auto t_it = type_by_name.find(rec["type"].string_value());
+        if (t_it == type_by_name.end())
+        {
+            fprintf(stderr, "Unknown journal entry type \"%s\", skipping\n", rec["type"].string_value().c_str());
+            continue;
+        }
+        uint16_t type = t_it->second;
+        if (type == JE_START)
+            continue;
+        uint32_t entry_size = (type == JE_START
+            ? sizeof(journal_entry_start)
+            : (type == JE_SMALL_WRITE || type == JE_SMALL_WRITE_INSTANT
+                ? sizeof(journal_entry_small_write) + dsk.clean_entry_bitmap_size
+                : (type == JE_BIG_WRITE || type == JE_BIG_WRITE_INSTANT
+                    ? sizeof(journal_entry_big_write) + dsk.clean_entry_bitmap_size
+                    : sizeof(journal_entry_del))));
+        if (dsk.journal_block_size < new_journal_in_pos + entry_size)
+        {
+            new_journal_ptr = new_journal_data;
+            if (new_journal_ptr-new_journal_buf >= new_journal_len)
+            {
+                fprintf(stderr, "Error: entries don't fit to the new journal\n");
+                free(new_journal_buf);
+                return 1;
+            }
+            new_journal_data = new_journal_ptr+dsk.journal_block_size;
+            new_journal_in_pos = 0;
+            if (dsk.journal_block_size < entry_size)
+            {
+                fprintf(stderr, "Error: journal entry too large (%u bytes)\n", entry_size);
+                free(new_journal_buf);
+                return 1;
+            }
+        }
+        journal_entry *ne = (journal_entry*)(new_journal_ptr + new_journal_in_pos);
+        if (type == JE_SMALL_WRITE || type == JE_SMALL_WRITE_INSTANT)
+        {
+            // ne is still zeroed at this point, so take the data length from the JSON record
+            if (new_journal_data - new_journal_buf + rec["len"].uint64_value() > new_journal_len)
+            {
+                fprintf(stderr, "Error: entries don't fit to the new journal\n");
+                free(new_journal_buf);
+                return 1;
+            }
+            *((journal_entry_small_write*)ne) = (journal_entry_small_write){
+                .magic = JOURNAL_MAGIC,
+                .type = type,
+                .size = entry_size,
+                .crc32_prev = new_crc32_prev,
+                .oid = {
+                    .inode = sscanf_json(NULL, rec["inode"]),
+                    .stripe = sscanf_json(NULL, rec["stripe"]),
+                },
+                .version = rec["ver"].uint64_value(),
+                .offset = (uint32_t)rec["offset"].uint64_value(),
+                .len = (uint32_t)rec["len"].uint64_value(),
+                .data_offset = (uint64_t)(new_journal_data-new_journal_buf),
+                .crc32_data = (uint32_t)sscanf_json("%x", rec["data_crc32"]),
+            };
+            fromhexstr(rec["bitmap"].string_value(), dsk.clean_entry_bitmap_size, ((uint8_t*)ne) + sizeof(journal_entry_small_write));
+            fromhexstr(rec["data"].string_value(), ne->small_write.len, new_journal_data);
+            if (rec["data"].is_string())
+                ne->small_write.crc32_data = crc32c(0, new_journal_data, ne->small_write.len);
+            new_journal_data += ne->small_write.len;
+        }
+        else if (type == JE_BIG_WRITE || type == JE_BIG_WRITE_INSTANT)
+        {
+            *((journal_entry_big_write*)ne) = (journal_entry_big_write){
+                .magic = JOURNAL_MAGIC,
+                .type = type,
+                .size = entry_size,
+                .crc32_prev = new_crc32_prev,
+                .oid = {
+                    .inode = sscanf_json(NULL, rec["inode"]),
+                    .stripe = sscanf_json(NULL, rec["stripe"]),
+                },
+                .version = rec["ver"].uint64_value(),
+                .len = (uint32_t)rec["len"].uint64_value(),
+                .location = sscanf_json(NULL, rec["loc"]),
+            };
+            fromhexstr(rec["bitmap"].string_value(), dsk.clean_entry_bitmap_size, ((uint8_t*)ne) + sizeof(journal_entry_big_write));
+        }
+        else if (type == JE_STABLE || type == JE_ROLLBACK || type == JE_DELETE)
+        {
+            *((journal_entry_del*)ne) = (journal_entry_del){
+                .magic = JOURNAL_MAGIC,
+                .type = type,
+                .size = entry_size,
+                .crc32_prev = new_crc32_prev,
+                .oid = {
+                    .inode = sscanf_json(NULL, rec["inode"]),
+                    .stripe = sscanf_json(NULL, rec["stripe"]),
+                },
+                .version = rec["ver"].uint64_value(),
+            };
+        }
+        ne->crc32 = je_crc32(ne);
+        new_crc32_prev = ne->crc32;
+        new_journal_in_pos += ne->size;
+    }
+    int r = resize_write_new_journal();
+    free(new_journal_buf);
+    return r;
+}
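The position bookkeeping for small writes in `process_journal_block()` is easy to misread, so the same two wrap-around checks are pulled out below into a pure function. This is a hypothetical helper for illustration only (`place_small_write` and `small_write_pos_t` do not exist in vitastor). It assumes, as the code above does, that positions are relative to `journal_offset`, that `journal_pos` already points just past the block holding the entry, and that block 0 is reserved for the `JE_START` entry, so wrapped data restarts at `journal_block_size`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch
struct small_write_pos_t
{
    uint64_t data_pos;  // where the entry's data is expected to start
    uint64_t next_pos;  // journal position after the data
    bool wrapped;       // true if the journal wrapped around
};

// Mirrors the small-write branch of process_journal_block()
small_write_pos_t place_small_write(uint64_t journal_pos, uint64_t len,
    uint64_t journal_len, uint64_t journal_block_size)
{
    small_write_pos_t r = { journal_pos, 0, false };
    if (journal_pos + len > journal_len)
    {
        // Data does not fit before the end: it continues right after the start block
        r.data_pos = journal_pos = journal_block_size;
        r.wrapped = true;
    }
    journal_pos += len;
    if (journal_pos >= journal_len)
    {
        // Data ends exactly at the end of the journal: the next block is block 1
        journal_pos = journal_block_size;
        r.wrapped = true;
    }
    r.next_pos = journal_pos;
    return r;
}
```

For example, in a 64 KiB journal of 4 KiB blocks, an 8 KiB small write recorded in the block at 0xE000 does not fit into the remaining 4 KiB, so its data is expected at 0x1000, right after the start block.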
--- a/src/disk_tool_meta.cpp
+++ b/src/disk_tool_meta.cpp
@@ -0,0 +1,201 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include "disk_tool.h"
+#include "rw_blocking.h"
+#include "osd_id.h"
+
+int disk_tool_t::process_meta(std::function<void(blockstore_meta_header_v1_t *)> hdr_fn,
+    std::function<void(uint64_t, clean_disk_entry*, uint8_t*)> record_fn)
+{
+    if (dsk.meta_block_size % DIRECT_IO_ALIGNMENT)
+    {
+        fprintf(stderr, "Invalid metadata block size: is not a multiple of %d\n", DIRECT_IO_ALIGNMENT);
+        return 1;
+    }
+    dsk.meta_fd = open(dsk.meta_device.c_str(), O_DIRECT|O_RDONLY);
+    if (dsk.meta_fd < 0)
+    {
+        fprintf(stderr, "Failed to open metadata device %s: %s\n", dsk.meta_device.c_str(), strerror(errno));
+        return 1;
+    }
+    int buf_size = 1024*1024;
+    if (buf_size % dsk.meta_block_size)
+        buf_size = 8*dsk.meta_block_size;
+    if (buf_size > dsk.meta_len)
+        buf_size = dsk.meta_len;
+    void *data = memalign_or_die(MEM_ALIGNMENT, buf_size);
+    lseek64(dsk.meta_fd, dsk.meta_offset, 0);
+    read_blocking(dsk.meta_fd, data, buf_size);
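+    // Check superblock. Work on a copy: `data` may be reallocated below while the header is still in use
+    blockstore_meta_header_v1_t hdr_copy = *(blockstore_meta_header_v1_t *)data, *hdr = &hdr_copy;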
+    if (hdr->zero == 0 &&
+        hdr->magic == BLOCKSTORE_META_MAGIC_V1 &&
+        hdr->version == BLOCKSTORE_META_VERSION_V1)
+    {
+        // Vitastor 0.6-0.7 - static array of clean_disk_entry with bitmaps
+        if (hdr->meta_block_size != dsk.meta_block_size)
+        {
+            fprintf(stderr, "Using block size of %u bytes based on information from the superblock\n", hdr->meta_block_size);
+            dsk.meta_block_size = hdr->meta_block_size;
+            if (buf_size % dsk.meta_block_size)
+            {
+                buf_size = 8*dsk.meta_block_size;
+                free(data);
+                data = memalign_or_die(MEM_ALIGNMENT, buf_size);
+            }
+        }
+        dsk.bitmap_granularity = hdr->bitmap_granularity;
+        dsk.clean_entry_bitmap_size = hdr->data_block_size / hdr->bitmap_granularity / 8;
+        dsk.clean_entry_size = sizeof(clean_disk_entry) + 2*dsk.clean_entry_bitmap_size;
+        uint64_t block_num = 0;
+        hdr_fn(hdr);
+        meta_pos = dsk.meta_block_size;
+        lseek64(dsk.meta_fd, dsk.meta_offset+meta_pos, 0);
+        while (meta_pos < dsk.meta_len)
+        {
+            uint64_t read_len = buf_size < dsk.meta_len-meta_pos ? buf_size : dsk.meta_len-meta_pos;
+            read_blocking(dsk.meta_fd, data, read_len);
+            meta_pos += read_len;
+            for (uint64_t blk = 0; blk < read_len; blk += dsk.meta_block_size)
+            {
+                for (uint64_t ioff = 0; ioff <= dsk.meta_block_size-dsk.clean_entry_size; ioff += dsk.clean_entry_size, block_num++)
+                {
+                    clean_disk_entry *entry = (clean_disk_entry*)((uint8_t*)data + blk + ioff);
+                    if (entry->oid.inode)
+                    {
+                        record_fn(block_num, entry, entry->bitmap);
+                    }
+                }
+            }
+        }
+    }
+    else
+    {
+        // Vitastor 0.4-0.5 - static array of clean_disk_entry
+        dsk.clean_entry_bitmap_size = 0;
+        dsk.clean_entry_size = sizeof(clean_disk_entry);
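+        uint64_t block_num = 0; hdr_fn(NULL);
+        meta_pos = 0; lseek64(dsk.meta_fd, dsk.meta_offset, 0); // no superblock: read from the very first block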
+        while (meta_pos < dsk.meta_len)
+        {
+            uint64_t read_len = buf_size < dsk.meta_len-meta_pos ? buf_size : dsk.meta_len-meta_pos;
+            read_blocking(dsk.meta_fd, data, read_len);
+            meta_pos += read_len;
+            for (uint64_t blk = 0; blk < read_len; blk += dsk.meta_block_size)
+            {
+                for (uint64_t ioff = 0; ioff <= dsk.meta_block_size-dsk.clean_entry_size; ioff += dsk.clean_entry_size, block_num++)
+                {
+                    clean_disk_entry *entry = (clean_disk_entry*)((uint8_t*)data + blk + ioff);
+                    if (entry->oid.inode)
+                    {
+                        record_fn(block_num, entry, NULL);
+                    }
+                }
+            }
+        }
+    }
+    free(data);
+    close(dsk.meta_fd);
+    dsk.meta_fd = -1;
+    return 0;
+}
+
+int disk_tool_t::dump_meta()
+{
+    int r = process_meta(
+        [this](blockstore_meta_header_v1_t *hdr) { dump_meta_header(hdr); },
+        [this](uint64_t block_num, clean_disk_entry *entry, uint8_t *bitmap) { dump_meta_entry(block_num, entry, bitmap); }
+    );
+    if (r == 0) printf("\n]}\n");
+    return r;
+}
+
+void disk_tool_t::dump_meta_header(blockstore_meta_header_v1_t *hdr)
+{
+    if (hdr)
+    {
+        printf(
+            "{\"version\":\"0.6\",\"meta_block_size\":%u,\"data_block_size\":%u,\"bitmap_granularity\":%u,\"entries\":[\n",
+            hdr->meta_block_size, hdr->data_block_size, hdr->bitmap_granularity
+        );
+    }
+    else
+    {
+        printf("{\"version\":\"0.5\",\"meta_block_size\":%lu,\"entries\":[\n", dsk.meta_block_size);
+    }
+    first = true;
+}
+
+void disk_tool_t::dump_meta_entry(uint64_t block_num, clean_disk_entry *entry, uint8_t *bitmap)
+{
+#define ENTRY_FMT "{\"block\":%lu,\"pool\":%u,\"inode\":%lu,\"stripe\":%lu,\"version\":%lu"
+    printf(
+        (first ? ENTRY_FMT : (",\n" ENTRY_FMT)),
+        block_num, INODE_POOL(entry->oid.inode), INODE_NO_POOL(entry->oid.inode),
+        entry->oid.stripe, entry->version
+    );
+#undef ENTRY_FMT
+    if (bitmap)
+    {
+        printf(",\"bitmap\":\"");
+        for (uint64_t i = 0; i < dsk.clean_entry_bitmap_size; i++)
+        {
+            printf("%02x", bitmap[i]);
+        }
+        printf("\",\"ext_bitmap\":\"");
+        for (uint64_t i = 0; i < dsk.clean_entry_bitmap_size; i++)
+        {
+            printf("%02x", bitmap[dsk.clean_entry_bitmap_size + i]);
+        }
+        printf("\"}");
+    }
+    else
+    {
+        printf("}");
+    }
+    first = false;
+}
+
+int disk_tool_t::write_json_meta(json11::Json meta)
+{
+    new_meta_buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, new_meta_len);
+    memset(new_meta_buf, 0, new_meta_len);
+    blockstore_meta_header_v1_t *new_hdr = (blockstore_meta_header_v1_t *)new_meta_buf;
+    new_hdr->zero = 0;
+    new_hdr->magic = BLOCKSTORE_META_MAGIC_V1;
+    new_hdr->version = BLOCKSTORE_META_VERSION_V1;
+    new_hdr->meta_block_size = meta["meta_block_size"].uint64_value()
+        ? meta["meta_block_size"].uint64_value() : 4096;
+    new_hdr->data_block_size = meta["data_block_size"].uint64_value()
+        ? meta["data_block_size"].uint64_value() : 131072;
+    new_hdr->bitmap_granularity = meta["bitmap_granularity"].uint64_value()
+        ? meta["bitmap_granularity"].uint64_value() : 4096;
+    new_clean_entry_bitmap_size = new_hdr->data_block_size / new_hdr->bitmap_granularity / 8;
+    new_clean_entry_size = sizeof(clean_disk_entry) + 2*new_clean_entry_bitmap_size;
+    new_entries_per_block = new_hdr->meta_block_size / new_clean_entry_size;
+    for (const auto & e: meta["entries"].array_items())
+    {
+        uint64_t data_block = e["block"].uint64_value();
+        uint64_t mb = 1 + data_block/new_entries_per_block;
+        if (mb >= new_meta_len/new_hdr->meta_block_size)
+        {
+            free(new_meta_buf);
+            new_meta_buf = NULL;
+            fprintf(stderr, "Metadata (data block %lu) doesn't fit into the new area\n", data_block);
+            return 1;
+        }
+        clean_disk_entry *new_entry = (clean_disk_entry*)(new_meta_buf +
+            new_hdr->meta_block_size*mb +
+            new_clean_entry_size*(data_block % new_entries_per_block));
+        new_entry->oid.inode = (sscanf_json(NULL, e["pool"]) << (64-POOL_ID_BITS)) | sscanf_json(NULL, e["inode"]);
+        new_entry->oid.stripe = sscanf_json(NULL, e["stripe"]);
+        new_entry->version = sscanf_json(NULL, e["version"]);
+        fromhexstr(e["bitmap"].string_value(), new_clean_entry_bitmap_size, ((uint8_t*)new_entry) + sizeof(clean_disk_entry));
+        fromhexstr(e["ext_bitmap"].string_value(), new_clean_entry_bitmap_size, ((uint8_t*)new_entry) + sizeof(clean_disk_entry) + new_clean_entry_bitmap_size);
+    }
+    int r = resize_write_new_meta();
+    free(new_meta_buf);
+    new_meta_buf = NULL;
+    return r;
+}
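`write_json_meta()` places the entry for data block `b` in metadata block `1 + b / entries_per_block` (block 0 holds the superblock), at slot `b % entries_per_block`. Each entry is `sizeof(clean_disk_entry)` plus two bitmaps of `data_block_size / bitmap_granularity / 8` bytes. A small sketch of that arithmetic follows. It assumes `sizeof(clean_disk_entry)` is 24 bytes (16-byte object id plus 8-byte version), which the code shown does not confirm, and `locate_meta_entry()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: clean_disk_entry without bitmaps is 24 bytes
// (16-byte object id + 8-byte version). Not taken from the real headers.
static const uint64_t SKETCH_CLEAN_ENTRY_SIZE = 24;

struct sketch_meta_location_t
{
    uint64_t meta_block;  // metadata block index (block 0 holds the superblock)
    uint64_t byte_offset; // entry offset from the start of the metadata area
};

// Same arithmetic as write_json_meta(): every entry carries two bitmaps
// ("bitmap" and "ext_bitmap") of data_block_size / bitmap_granularity / 8 bytes.
sketch_meta_location_t locate_meta_entry(uint64_t data_block, uint64_t meta_block_size,
    uint64_t data_block_size, uint64_t bitmap_granularity)
{
    uint64_t bitmap_size = data_block_size / bitmap_granularity / 8;
    uint64_t entry_size = SKETCH_CLEAN_ENTRY_SIZE + 2*bitmap_size;
    uint64_t entries_per_block = meta_block_size / entry_size;
    assert(entries_per_block > 0); // an entry must fit into one metadata block
    uint64_t mb = 1 + data_block / entries_per_block;
    return { mb, meta_block_size*mb + entry_size*(data_block % entries_per_block) };
}
```

With the defaults used above (4 KiB metadata blocks, 128 KiB data blocks, 4 KiB granularity), each entry is 32 bytes under this assumption. One metadata block therefore holds 128 entries, and data block 300 lands in metadata block 3 at byte offset 13696. Any bytes left at the end of a block are padding, which matches the `ioff <= meta_block_size - clean_entry_size` bound used when reading.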
--- a/src/disk_tool_prepare.cpp
+++ b/src/disk_tool_prepare.cpp
@@ -0,0 +1,620 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include "disk_tool.h"
+#include "str_util.h"
+#include "osd_id.h"
+
+int disk_tool_t::prepare_one(std::map<std::string, std::string> options, int is_hdd)
+{
+    static const char *allow_additional_params[] = {
+        "max_write_iodepth",
+        "max_write_iodepth",
+        "min_flusher_count",
+        "max_flusher_count",
+        "inmemory_metadata",
+        "inmemory_journal",
+        "journal_sector_buffer_count",
+        "journal_no_same_sector_overwrites",
+        "throttle_small_writes",
+        "throttle_target_iops",
+        "throttle_target_mbs",
+        "throttle_target_parallelism",
+        "throttle_threshold_us",
+    };
+    if (options.find("force") == options.end())
+    {
+        std::vector<std::string> all_devs = { options["data_device"], options["meta_device"], options["journal_device"] };
+        for (int i = 0; i < all_devs.size(); i++)
+        {
+            const auto & dev = all_devs[i];
+            if (dev == "")
+                continue;
+            if (dev.substr(0, 22) != "/dev/disk/by-partuuid/")
+            {
+                // Partitions should be identified by GPT partition UUID
+                fprintf(stderr, "%s does not start with /dev/disk/by-partuuid/. Partitions should be identified by GPT partition UUIDs\n", dev.c_str());
+                return 1;
+            }
+            std::string real_dev = realpath_str(dev, false);
+            if (real_dev == "")
+                return 1;
+            std::string parent_dev = get_parent_device(real_dev);
+            if (parent_dev == "")
+                return 1;
+            if (parent_dev == real_dev)
+            {
+                fprintf(stderr, "%s is not a partition, not creating OSD without --force\n", dev.c_str());
+                return 1;
+            }
+            if (i == 0 && is_hdd == -1)
+                is_hdd = trim(read_file("/sys/block/"+parent_dev+"/queue/rotational")) == "1";
+            std::string out;
+            if (shell_exec({ "wipefs", dev }, "", &out, NULL) != 0 || out != "")
+            {
+                fprintf(stderr, "%s contains data, not creating OSD without --force. wipefs shows:\n%s", dev.c_str(), out.c_str());
+                return 1;
+            }
+            json11::Json sb = read_osd_superblock(dev, false);
+            if (!sb.is_null())
+            {
+                fprintf(stderr, "%s already contains Vitastor OSD superblock, not creating OSD without --force\n", dev.c_str());
+                return 1;
+            }
+        }
+    }
+    for (auto dev: std::vector<std::string>{"data", "meta", "journal"})
+    {
+        if (options[dev+"_device"] != "" && options["disable_"+dev+"_fsync"] == "auto")
+        {
+            int r = disable_cache(realpath_str(options[dev+"_device"], false));
+            if (r != 0)
+            {
+                if (r == 1)
+                    fprintf(stderr, "Warning: disable_%s_fsync is auto, but cache status check failed. Leaving fsync on\n", dev.c_str());
+                options["disable_"+dev+"_fsync"] = "0";
+            }
+            else
+                options["disable_"+dev+"_fsync"] = "1";
+        }
+    }
+    if (options["meta_device"] == "" || options["meta_device"] == options["data_device"])
+    {
+        options["disable_meta_fsync"] = options["disable_data_fsync"];
+    }
+    if (options["journal_device"] == "" || options["journal_device"] == options["meta_device"])
+    {
+        options["disable_journal_fsync"] = options["disable_meta_fsync"];
+    }
+    else if (options["journal_device"] == options["data_device"])
+    {
+        options["disable_journal_fsync"] = options["disable_data_fsync"];
+    }
+    // Default journal size: 32M if there is no separate journal device, DEFAULT_HYBRID_JOURNAL for an HDD OSD with one
+    if (options["journal_size"] == "")
+    {
+        if (options["journal_device"] == "")
+            options["journal_size"] = "32M";
+        else if (is_hdd)
+            options["journal_size"] = DEFAULT_HYBRID_JOURNAL;
+    }
+    if (is_hdd)
+    {
+        if (options["block_size"] == "")
+            options["block_size"] = "1M";
+        if (options["throttle_small_writes"] == "")
+            options["throttle_small_writes"] = "1";
+    }
+    json11::Json::object sb;
+    blockstore_disk_t dsk;
+    try
+    {
+        dsk.parse_config(options);
+        dsk.open_data();
+        dsk.open_meta();
+        dsk.open_journal();
+        dsk.calc_lengths(true);
+        sb = json11::Json::object {
+            { "data_device", options["data_device"] },
+            { "meta_device", options["meta_device"] },
+            { "journal_device", options["journal_device"] },
+            { "block_size", (uint64_t)dsk.data_block_size },
+            { "meta_block_size", dsk.meta_block_size },
+            { "journal_block_size", dsk.journal_block_size },
+            { "data_size", dsk.cfg_data_size },
+            { "disk_alignment", (uint64_t)dsk.disk_alignment },
+            { "bitmap_granularity", dsk.bitmap_granularity },
+            { "disable_device_lock", dsk.disable_flock },
+            { "journal_offset", 4096 },
+            { "meta_offset", 4096 + (dsk.meta_device == dsk.journal_device ? dsk.journal_len : 0) },
+            { "data_offset", 4096 + (dsk.data_device == dsk.meta_device ? dsk.meta_len : 0) +
+                (dsk.data_device == dsk.journal_device ? dsk.journal_len : 0) },
+            { "journal_no_same_sector_overwrites", true },
+            { "journal_sector_buffer_count", 1024 },
+            { "disable_data_fsync", json_is_true(options["disable_data_fsync"]) },
+            { "disable_meta_fsync", json_is_true(options["disable_meta_fsync"]) },
+            { "disable_journal_fsync", json_is_true(options["disable_journal_fsync"]) },
+            { "skip_cache_check", json_is_true(options["skip_cache_check"]) },
+            { "immediate_commit", json_is_true(options["disable_journal_fsync"])
+                ? (json_is_true(options["disable_data_fsync"]) ? "all" : "small") : "none" },
+        };
+        for (int i = 0; i < sizeof(allow_additional_params)/sizeof(allow_additional_params[0]); i++)
+        {
+            auto it = options.find(allow_additional_params[i]);
+            if (it != options.end())
+            {
+                sb[it->first] = it->second;
+            }
+        }
+    }
+    catch (std::exception & e)
+    {
+        dsk.close_all();
+        fprintf(stderr, "%s\n", e.what());
+        return 1;
+    }
+    std::string osd_num_str;
+    if (shell_exec({ "vitastor-cli", "alloc-osd" }, "", &osd_num_str, NULL) != 0)
+    {
+        dsk.close_all();
+        return 1;
+    }
+    osd_num_t osd_num = stoull_full(trim(osd_num_str), 10);
+    if (!osd_num)
+    {
+        dsk.close_all();
+        fprintf(stderr, "Could not create OSD. vitastor-cli alloc-osd didn't return a valid OSD number:\n%s", osd_num_str.c_str());
+        return 1;
+    }
+    sb["osd_num"] = osd_num;
+    // Zero out metadata and journal
+    if (write_zero(dsk.meta_fd, dsk.meta_offset, dsk.meta_len) != 0 ||
+        write_zero(dsk.journal_fd, dsk.journal_offset, dsk.journal_len) != 0)
+    {
+        fprintf(stderr, "Failed to zero out metadata or journal: %s\n", strerror(errno));
+        dsk.close_all();
+        return 1;
+    }
+    dsk.close_all();
+    // Write superblocks
+    bool sep_m = options["meta_device"] != "" &&
+        options["meta_device"] != options["data_device"];
+    bool sep_j = options["journal_device"] != "" &&
+        options["journal_device"] != options["data_device"] &&
+        options["journal_device"] != options["meta_device"];
+    if (!write_osd_superblock(options["data_device"], sb) ||
+        (sep_m && !write_osd_superblock(options["meta_device"], sb)) ||
+        (sep_j && !write_osd_superblock(options["journal_device"], sb)))
+    {
+        return 1;
+    }
+    auto desc = realpath_str(options["data_device"]);
+    if (sep_m)
+        desc += " with metadata on "+realpath_str(options["meta_device"]);
+    if (sep_j)
+        desc += (sep_m ? " and journal on " : " with journal on ") + realpath_str(options["journal_device"]);
+    fprintf(stderr, "Initialized OSD %lu on %s\n", osd_num, desc.c_str());
+    if (shell_exec({ "systemctl", "enable", "--now", "vitastor-osd@"+std::to_string(osd_num) }, "", NULL, NULL) != 0)
+    {
+        fprintf(stderr, "Failed to enable systemd unit vitastor-osd@%lu\n", osd_num);
+        return 1;
+    }
+    return 0;
+}
+
+std::vector<vitastor_dev_info_t> disk_tool_t::collect_devices(const std::vector<std::string> & devices)
+{
+    std::vector<vitastor_dev_info_t> devinfo;
+    for (auto & dev: devices)
+    {
+        // Check if the device is a whole disk
+        if (dev.substr(0, 5) != "/dev/")
+        {
+            fprintf(stderr, "%s does not start with /dev/, ignoring\n", dev.c_str());
+            continue;
+        }
+        struct stat dev_st, sys_st;
+        if (stat(dev.c_str(), &dev_st) < 0)
+        {
+            if (errno == ENOENT)
+            {
+                fprintf(stderr, "%s does not exist, skipping\n", dev.c_str());
+                continue;
+            }
+            fprintf(stderr, "Error checking %s: %s\n", dev.c_str(), strerror(errno));
+            return {};
+        }
+        uint64_t dev_size = dev_st.st_size;
+        if (S_ISBLK(dev_st.st_mode))
+        {
+            int fd = open(dev.c_str(), O_DIRECT|O_RDWR);
+            if (fd < 0)
+            {
+                fprintf(stderr, "Failed to open %s: %s\n", dev.c_str(), strerror(errno));
+                return {};
+            }
+            if (ioctl(fd, BLKGETSIZE64, &dev_size) < 0)
+            {
+                fprintf(stderr, "Failed to get %s size: %s\n", dev.c_str(), strerror(errno));
+                close(fd);
+                return {};
+            }
+            close(fd);
+        }
+        if (stat(("/sys/block/"+dev.substr(5)).c_str(), &sys_st) < 0)
+        {
+            if (errno == ENOENT)
+            {
+                fprintf(stderr, "%s is probably a partition (no entry in /sys/block/), ignoring\n", dev.c_str());
+                continue;
+            }
+            fprintf(stderr, "Error checking /sys/block/%s: %s\n", dev.c_str()+5, strerror(errno));
+            return {};
+        }
+        // Check if the device is rotational (HDD) or not (SSD)
+        bool is_hdd = trim(read_file("/sys/block/"+dev.substr(5)+"/queue/rotational")) == "1";
+        // Check if it has a partition table
+        json11::Json pt = read_parttable(dev);
+        if (pt.is_bool() && !pt.bool_value())
+        {
+            // Error reading table
+            return {};
+        }
+        if (pt.is_null())
+        {
+            // No partition table
+            std::string out;
+            int r = shell_exec({ "wipefs", dev }, "", &out, NULL);
+            if (r != 0 || out != "")
+            {
+                fprintf(stderr, "%s contains data, skipping:\n  %s\n", dev.c_str(), str_replace(trim(out), "\n", "\n  ").c_str());
+                continue;
+            }
+        }
+        int osds = 0;
+        for (const auto & p: pt["partitions"].array_items())
+            if (strtolower(p["type"].string_value()) == VITASTOR_PART_TYPE)
+                osds++;
+        devinfo.push_back((vitastor_dev_info_t){
+            .path = dev,
+            .is_hdd = is_hdd,
+            .pt = pt,
+            .osd_part_count = osds,
+            .size = !pt.is_null() ? dev_size_from_parttable(pt) : dev_size,
+            .free = !pt.is_null() ? free_from_parttable(pt) : dev_size,
+        });
+    }
+    if (!devinfo.size())
+    {
+        fprintf(stderr, "No suitable devices found\n");
+    }
+    return devinfo;
+}
+
+// Return null in case of an error
+json11::Json disk_tool_t::add_partitions(vitastor_dev_info_t & devinfo, std::vector<std::string> sizes)
+{
+    std::string script = "label: gpt\n\n";
+    std::set<std::string> is_old;
+    for (auto part: devinfo.pt["partitions"].array_items())
+    {
+        // Old partitions
+        is_old.insert(part["uuid"].string_value());
+        script += part["node"].string_value()+": ";
+        int n = 0;
+        for (auto & kv: part.object_items())
+        {
+            if (kv.first != "node")
+            {
+                if (n++)
+                    script += ", ";
+                script += kv.first+"="+(kv.second.is_string() ? kv.second.string_value() : kv.second.dump());
+            }
+        }
+        script += "\n";
+    }
+    for (auto size: sizes)
+    {
+        script += "+ "+size+" "+std::string(VITASTOR_PART_TYPE)+"\n";
+    }
+    if (shell_exec({ "sfdisk", "--force", devinfo.path }, script, NULL, NULL) != 0)
+    {
+        fprintf(stderr, "Failed to add %lu partition(s) with sfdisk\n", sizes.size());
+        return {};
+    }
+    // Get new partition table and find created partitions
+    json11::Json newpt = read_parttable(devinfo.path);
+    json11::Json::array new_parts;
+    for (const auto & part: newpt["partitions"].array_items())
+    {
+        if (is_old.find(part["uuid"].string_value()) == is_old.end())
+        {
+            new_parts.push_back(part);
+        }
+    }
+    if (new_parts.size() != sizes.size())
+    {
+        fprintf(stderr, "Failed to add %lu partition(s) with sfdisk: new partitions not found in table\n", sizes.size());
+        return {};
+    }
+    // Check if new nodes exist and run partprobe if not
+    // FIXME: We could use parted instead of sfdisk because partprobe is already a part of parted
+    int iter = 0, r = 0;
+    bool all_exist = false;
+    while (!all_exist)
+    {
+        all_exist = true;
+        for (const auto & part: new_parts)
+        {
+            struct stat st;
+            if (stat(part["node"].string_value().c_str(), &st) == 0)
+                continue;
+            if (errno != ENOENT)
+            {
+                fprintf(stderr, "Failed to stat %s: %s\n", part["node"].string_value().c_str(), strerror(errno));
+                return {};
+            }
+            // A node is missing: run partprobe once, then re-check all nodes
+            all_exist = false;
+            iter++;
+            // iter > 1 means partprobe has already run, but the node still doesn't exist
+            if (iter > 1 ||
+                (r = shell_exec({ "partprobe", devinfo.path }, "", NULL, NULL)) != 0)
+            {
+                fprintf(
+                    stderr, iter == 1 && r == 255
+                        ? "partprobe utility is required to reread partition table while disk %s is in use\n"
+                        : "partprobe failed to re-read partition table while disk %s is in use\n",
+                    devinfo.path.c_str()
+                );
+                return {};
+            }
+            break;
+        }
+    }
+    // Wait until device symlinks in /dev/disk/by-partuuid/ appear
+    bool exists = false;
+    iter = 0;
+    while (!exists && iter < 300) // max 30 sec
+    {
+        exists = true;
+        for (const auto & part: new_parts)
+        {
+            std::string link_path = "/dev/disk/by-partuuid/"+strtolower(part["uuid"].string_value());
+            struct stat st;
+            if (lstat(link_path.c_str(), &st) < 0)
+            {
+                if (errno == ENOENT)
+                    exists = false;
+                else
+                {
+                    fprintf(stderr, "Failed to lstat %s: %s\n", link_path.c_str(), strerror(errno));
+                    return {};
+                }
+            }
+        }
+        if (!exists)
+        {
+            struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000000 }; // 100ms
+            iter += (nanosleep(&ts, NULL) == 0);
+        }
+    }
+    devinfo.pt = newpt;
+    devinfo.osd_part_count += sizes.size();
+    devinfo.free = free_from_parttable(newpt);
+    return new_parts;
+}
+
+std::vector<std::string> disk_tool_t::get_new_data_parts(vitastor_dev_info_t & dev,
+    uint64_t osd_per_disk, uint64_t max_other_percent)
+{
+    std::vector<std::string> use_parts;
+    uint64_t want_parts = 0;
+    if (dev.pt.is_null())
+    {
+        want_parts = osd_per_disk;
+    }
+    else
+    {
+        // Disk already has partitions. If these are empty Vitastor OSD partitions, we can use them
+        uint64_t osds_exist = 0, osds_size = 0;
+        for (const auto & part: dev.pt["partitions"].array_items())
+        {
+            if (strtolower(part["type"].string_value()) == VITASTOR_PART_TYPE)
+            {
+                // Check if an existing Vitastor partition is empty
+                json11::Json sb = read_osd_superblock(part["node"].string_value(), false);
+                if (sb.is_null())
+                {
+                    // Use this partition
+                    use_parts.push_back(part["uuid"].string_value());
+                }
+                else
+                {
+                    fprintf(
+                        stderr, "%s is already initialized for OSD %lu, skipping\n",
+                        part["node"].string_value().c_str(), sb["params"]["osd_num"].uint64_value()
+                    );
+                    osds_size += part["size"].uint64_value()*dev.pt["sectorsize"].uint64_value();
+                }
+                osds_exist++;
+            }
+        }
+        // Still create OSD(s) if a disk has no more than (max_other_percent) other data
+        if (osds_exist >= osd_per_disk || (dev.free+osds_size) < dev.size*(100-max_other_percent)/100)
+            fprintf(stderr, "%s is already partitioned, skipping\n", dev.path.c_str());
+        else
+            want_parts = osd_per_disk-osds_exist;
+    }
+    if (want_parts > 0)
+    {
+        // Create the missing OSD partition(s); the last one takes all remaining free space
+        std::vector<std::string> sizes;
+        auto each_size = std::to_string((dev.free - 1048576) / 1048576 / want_parts)+"MiB";
+        for (uint64_t i = 0; i < want_parts-1; i++)
+            sizes.push_back(each_size);
+        sizes.push_back("+");
+        auto new_parts = add_partitions(dev, sizes);
+        for (const auto & part: new_parts.array_items())
+            use_parts.push_back(part["uuid"].string_value());
+    }
+    return use_parts;
+}
+
+int disk_tool_t::get_meta_partition(std::vector<vitastor_dev_info_t> & ssds, std::map<std::string, std::string> & options)
+{
+    uint64_t journal_size = parse_size(options["journal_size"]);
+    journal_size = ((journal_size+1024*1024-1)/1024/1024)*1024*1024;
+    // Calculate metadata size
+    uint64_t meta_size = 0;
+    try
+    {
+        blockstore_disk_t dsk;
+        dsk.parse_config(options);
+        dsk.open_data();
+        dsk.open_meta();
+        dsk.open_journal();
+        dsk.calc_lengths(true);
+        dsk.close_all();
+        meta_size = dsk.meta_len;
+    }
+    catch (std::exception & e)
+    {
+        fprintf(stderr, "%s\n", e.what());
+        return 1;
+    }
+    // Leave some extra space for future metadata formats and round metadata area size to multiples of 1 MB
+    uint64_t meta_reserve_multiple = 2, min_meta_size = (uint64_t)1024*1024*1024;
+    if (options.find("meta_reserve") != options.end())
+    {
+        int p1 = options["meta_reserve"].find("x"), p2 = options["meta_reserve"].find(",");
+        if (p1 >= 0 && p2 >= 0)
+        {
+            meta_reserve_multiple = stoull_full(options["meta_reserve"].substr(p1 < p2 ? 0 : p2+1, p1 - (p1 < p2 ? 0 : p2+1)));
+            min_meta_size = parse_size(options["meta_reserve"].substr(p1 < p2 ? p2+1 : 0, p1 < p2 ? options["meta_reserve"].size()-p2-1 : p2));
+        }
+        else if (p1 >= 0)
+            meta_reserve_multiple = stoull_full(options["meta_reserve"].substr(0, p1));
+        else
+            min_meta_size = parse_size(options["meta_reserve"]);
+    }
+    meta_size = ((meta_size+1024*1024-1)/1024/1024)*1024*1024;
+    meta_size *= meta_reserve_multiple;
+    if (meta_size < min_meta_size)
+        meta_size = min_meta_size;
+    // Pick an SSD for journal&meta, balancing the number of serviced OSDs across SSDs
+    int sel = -1;
+    for (int i = 0; i < ssds.size(); i++)
+        if (ssds[i].free >= (meta_size+journal_size+4096*2) && (sel == -1 || ssds[sel].osd_part_count > ssds[i].osd_part_count))
+            sel = i;
+    if (sel < 0)
+    {
+        fprintf(
+            stderr, "Could not find free space for new SSD journal and metadata (need %lu + %lu MiB)\n",
+            meta_size/1024/1024, journal_size/1024/1024
+        );
+        return 1;
+    }
+    // Create partitions
+    auto new_parts = add_partitions(ssds[sel], {
+        std::to_string(journal_size/1024/1024)+"MiB",
+        std::to_string(meta_size/1024/1024)+"MiB"
+    });
+    if (new_parts.is_null())
+    {
+        return 1;
+    }
+    // add_partitions() has already added both new partitions to ssds[sel].osd_part_count
+    options["journal_device"] = "/dev/disk/by-partuuid/"+strtolower(new_parts[0]["uuid"].string_value());
+    options["meta_device"] = "/dev/disk/by-partuuid/"+strtolower(new_parts[1]["uuid"].string_value());
+    return 0;
+}
+
+int disk_tool_t::prepare(std::vector<std::string> devices)
+{
+    if (options.find("data_device") != options.end() && options["data_device"] != "")
+    {
+        if (options.find("hybrid") != options.end() || options.find("osd_per_disk") != options.end() || devices.size())
+        {
+            fprintf(stderr, "Device list (positional arguments), --hybrid and --osd_per_disk are incompatible with --data_device\n");
+            return 1;
+        }
+        return prepare_one(options);
+    }
+    if (!devices.size())
+    {
+        fprintf(stderr, "Device list missing\n");
+        return 1;
+    }
+    options.erase("data_device");
+    options.erase("meta_device");
+    options.erase("journal_device");
+    auto devinfo = collect_devices(devices);
+    if (!devinfo.size())
+    {
+        return 1;
+    }
+    bool hybrid = options.find("hybrid") != options.end();
+    uint64_t osd_per_disk = stoull_full(options["osd_per_disk"]);
+    if (!osd_per_disk)
+        osd_per_disk = 1;
+    uint64_t max_other_percent = 10;
+    if (options.find("max_other") != options.end())
+    {
+        max_other_percent = stoull_full(trim(options["max_other"], " \n\r\t%"));
+        if (max_other_percent > 100)
+            max_other_percent = 100;
+    }
+    std::vector<vitastor_dev_info_t> ssds;
+    if (options.find("disable_data_fsync") == options.end())
+        options["disable_data_fsync"] = "auto";
+    if (hybrid)
+    {
+        if (options.find("disable_meta_fsync") == options.end())
+            options["disable_meta_fsync"] = "auto";
+        options["disable_journal_fsync"] = options["disable_meta_fsync"];
+        for (auto & dev: devinfo)
+            if (!dev.is_hdd)
+                ssds.push_back(dev);
+        if (!ssds.size())
+        {
+            fprintf(stderr, "No SSDs found\n");
+            return 1;
+        }
+        else if (ssds.size() == devinfo.size())
+        {
+            fprintf(stderr, "No HDDs found\n");
+            return 1;
+        }
+        if (options["journal_size"] == "")
+            options["journal_size"] = DEFAULT_HYBRID_JOURNAL;
+    }
+    else
+    {
+        options.erase("disable_meta_fsync");
+        options.erase("disable_journal_fsync");
+    }
+    for (auto & dev: devinfo)
+    {
+        if (!hybrid || dev.is_hdd)
+        {
+            // Select new partitions and create an OSD on each of them
+            for (const auto & uuid: get_new_data_parts(dev, osd_per_disk, max_other_percent))
+            {
+                options["force"] = "1";
+                options["data_device"] = "/dev/disk/by-partuuid/"+strtolower(uuid);
+                if (hybrid)
+                {
+                    // Select/create journal and metadata partitions
+                    int r = get_meta_partition(ssds, options);
+                    if (r != 0)
+                    {
+                        return 1;
+                    }
+                }
+                prepare_one(options, dev.is_hdd ? 1 : 0);
+            }
+        }
+    }
+    return 0;
+}
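The option defaults above can be sketched in isolation: `--max_other` accepts a trailing `%`, defaults to 10 and is capped at 100, and `--osd_per_disk` falls back to 1 when it is missing or zero. This is a minimal sketch, not vitastor code. `trim_chars`, `parse_max_other` and `parse_osd_per_disk` are hypothetical names, and `std::stoull` stands in for vitastor's `stoull_full()`, whose handling of malformed input is not shown here (`std::stoull` throws on non-numeric text).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for vitastor's trim(): strips any of `chars`
// from both ends of `s`.
static std::string trim_chars(const std::string & s, const std::string & chars)
{
    size_t b = s.find_first_not_of(chars);
    if (b == std::string::npos)
        return "";
    size_t e = s.find_last_not_of(chars);
    return s.substr(b, e-b+1);
}

// --max_other: default 10, optional trailing '%', capped at 100.
static uint64_t parse_max_other(const std::map<std::string, std::string> & options)
{
    uint64_t pct = 10;
    auto it = options.find("max_other");
    if (it != options.end())
    {
        std::string v = trim_chars(it->second, " \n\r\t%");
        pct = v.empty() ? 0 : std::stoull(v); // throws on non-numeric text
        if (pct > 100)
            pct = 100;
    }
    return pct;
}

// --osd_per_disk: a missing or zero value means one OSD per disk.
static uint64_t parse_osd_per_disk(const std::map<std::string, std::string> & options)
{
    auto it = options.find("osd_per_disk");
    uint64_t n = (it == options.end() || it->second.empty()) ? 0 : std::stoull(it->second);
    return n ? n : 1;
}
```

For example, `--max_other " 25% "` yields 25 and `--max_other 250%` is clamped to 100.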
--- a/src/disk_tool_resize.cpp
+++ b/src/disk_tool_resize.cpp
@@ -0,0 +1,496 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include "disk_tool.h"
+#include "rw_blocking.h"
+#include "str_util.h"
+
+#define DM_ST_EMPTY 0
+#define DM_ST_TO_READ 1
+#define DM_ST_READING 2
+#define DM_ST_TO_WRITE 3
+#define DM_ST_WRITING 4
+
+struct resizer_data_moving_t
+{
+    int state = 0;
+    void *buf = NULL;
+    uint64_t old_loc, new_loc;
+};
+
+int disk_tool_t::resize_data()
+{
+    int r;
+    // Parse parameters
+    r = resize_parse_params();
+    if (r != 0)
+        return r;
+    // Check parameters and fill allocator
+    fprintf(stderr, "Reading metadata\n");
+    data_alloc = new allocator((new_data_len < dsk.data_len ? dsk.data_len : new_data_len) / dsk.data_block_size);
+    r = process_meta(
+        [this](blockstore_meta_header_v1_t *hdr)
+        {
+            resize_init(hdr);
+        },
+        [this](uint64_t block_num, clean_disk_entry *entry, uint8_t *bitmap)
+        {
+            data_alloc->set(block_num, true);
+        }
+    );
+    if (r != 0)
+        return r;
+    fprintf(stderr, "Reading journal\n");
+    r = process_journal([this](void *buf)
+    {
+        return process_journal_block(buf, [this](int num, journal_entry *je)
+        {
+            if (je->type == JE_BIG_WRITE || je->type == JE_BIG_WRITE_INSTANT)
+            {
+                data_alloc->set(je->big_write.location / dsk.data_block_size, true);
+            }
+        });
+    });
+    if (r != 0)
+        return r;
+    // Remap blocks
+    r = resize_remap_blocks();
+    if (r != 0)
+        return r;
+    // Copy data blocks into new places
+    fprintf(stderr, "Moving data blocks\n");
+    r = resize_copy_data();
+    if (r != 0)
+        return r;
+    // Rewrite journal
+    fprintf(stderr, "Rebuilding journal\n");
+    r = resize_rewrite_journal();
+    if (r != 0)
+        return r;
+    // Rewrite metadata
+    fprintf(stderr, "Rebuilding metadata\n");
+    r = resize_rewrite_meta();
+    if (r != 0)
+        return r;
+    // Write new journal
+    fprintf(stderr, "Writing new journal\n");
+    r = resize_write_new_journal();
+    if (r != 0)
+        return r;
+    // Write new metadata
+    fprintf(stderr, "Writing new metadata\n");
+    r = resize_write_new_meta();
+    if (r != 0)
+        return r;
+    fprintf(stderr, "Done\n");
+    return 0;
+}
+
+int disk_tool_t::resize_parse_params()
+{
+    try
+    {
+        dsk.parse_config(options);
+        dsk.open_data();
+        dsk.open_meta();
+        dsk.open_journal();
+        dsk.calc_lengths();
+        dsk.close_all();
+    }
+    catch (std::exception & e)
+    {
+        dsk.close_all();
+        fprintf(stderr, "Error: %s\n", e.what());
+        return 1;
+    }
+    iodepth = strtoull(options["iodepth"].c_str(), NULL, 10);
+    if (!iodepth)
+        iodepth = 32;
+    new_meta_device = options.find("new_meta_device") != options.end()
+        ? options["new_meta_device"] : dsk.meta_device;
+    new_journal_device = options.find("new_journal_device") != options.end()
+        ? options["new_journal_device"] : dsk.journal_device;
+    new_data_offset = options.find("new_data_offset") != options.end()
+        ? parse_size(options["new_data_offset"]) : dsk.data_offset;
+    new_data_len = options.find("new_data_len") != options.end()
+        ? parse_size(options["new_data_len"]) : dsk.data_len;
+    new_meta_offset = options.find("new_meta_offset") != options.end()
+        ? parse_size(options["new_meta_offset"]) : dsk.meta_offset;
+    new_meta_len = options.find("new_meta_len") != options.end()
+        ? parse_size(options["new_meta_len"]) : 0; // will be calculated in resize_init()
+    new_journal_offset = options.find("new_journal_offset") != options.end()
+        ? parse_size(options["new_journal_offset"]) : dsk.journal_offset;
+    new_journal_len = options.find("new_journal_len") != options.end()
+        ? parse_size(options["new_journal_len"]) : dsk.journal_len;
+    if (new_meta_device == dsk.meta_device &&
+        new_journal_device == dsk.journal_device &&
+        new_data_offset == dsk.data_offset &&
+        new_data_len == dsk.data_len &&
+        new_meta_offset == dsk.meta_offset &&
+        (new_meta_len == dsk.meta_len || new_meta_len == 0) &&
+        new_journal_offset == dsk.journal_offset &&
+        new_journal_len == dsk.journal_len &&
+        options.find("force") == options.end())
+    {
+        // No difference
+        fprintf(stderr, "No difference, specify --force to rewrite journal and meta anyway\n");
+        return 1;
+    }
+    return 0;
+}
+
+void disk_tool_t::resize_init(blockstore_meta_header_v1_t *hdr)
+{
+    if (hdr && dsk.data_block_size != hdr->data_block_size)
+    {
+        if (dsk.data_block_size)
+        {
+            fprintf(stderr, "Using data block size of %u bytes from metadata superblock\n", hdr->data_block_size);
+        }
+        dsk.data_block_size = hdr->data_block_size;
+    }
+    if (((new_data_len-dsk.data_len) % dsk.data_block_size) ||
+        ((new_data_offset-dsk.data_offset) % dsk.data_block_size))
+    {
+        fprintf(stderr, "Data alignment mismatch\n");
+        exit(1);
+    }
+    data_idx_diff = ((int64_t)(dsk.data_offset-new_data_offset)) / dsk.data_block_size;
+    free_first = new_data_offset > dsk.data_offset ? (new_data_offset-dsk.data_offset) / dsk.data_block_size : 0;
+    free_last = (new_data_offset+new_data_len < dsk.data_offset+dsk.data_len)
+        ? (dsk.data_offset+dsk.data_len-new_data_offset-new_data_len) / dsk.data_block_size
+        : 0;
+    new_clean_entry_bitmap_size = dsk.data_block_size / (hdr ? hdr->bitmap_granularity : 4096) / 8;
+    new_clean_entry_size = sizeof(clean_disk_entry) + 2 * new_clean_entry_bitmap_size;
+    new_entries_per_block = dsk.meta_block_size/new_clean_entry_size;
+    uint64_t new_meta_blocks = 1 + (new_data_len/dsk.data_block_size + new_entries_per_block-1) / new_entries_per_block;
+    if (!new_meta_len)
+    {
+        new_meta_len = dsk.meta_block_size*new_meta_blocks;
+    }
+    if (new_meta_len < dsk.meta_block_size*new_meta_blocks)
+    {
+        fprintf(stderr, "New metadata area size is too small, should be at least %lu bytes\n", dsk.meta_block_size*new_meta_blocks);
+        exit(1);
+    }
+    // Check that new metadata, journal and data areas don't overlap
+    if (new_meta_device == dsk.data_device && new_meta_offset < new_data_offset+new_data_len &&
+        new_meta_offset+new_meta_len > new_data_offset)
+    {
+        fprintf(stderr, "New metadata area overlaps with data\n");
+        exit(1);
+    }
+    if (new_journal_device == dsk.data_device && new_journal_offset < new_data_offset+new_data_len &&
+        new_journal_offset+new_journal_len > new_data_offset)
+    {
+        fprintf(stderr, "New journal area overlaps with data\n");
+        exit(1);
+    }
+    if (new_journal_device == new_meta_device && new_journal_offset < new_meta_offset+new_meta_len &&
+        new_journal_offset+new_journal_len > new_meta_offset)
+    {
+        fprintf(stderr, "New journal area overlaps with metadata\n");
+        exit(1);
+    }
+}
+
+int disk_tool_t::resize_remap_blocks()
+{
+    total_blocks = dsk.data_len / dsk.data_block_size;
+    for (uint64_t i = 0; i < free_first; i++)
+    {
+        if (data_alloc->get(i))
+            data_remap[i] = 0;
+        else
+            data_alloc->set(i, true);
+    }
+    for (uint64_t i = 0; i < free_last; i++)
+    {
+        if (data_alloc->get(total_blocks-i-1))
+            data_remap[total_blocks-i-1] = 0;
+        else
+            data_alloc->set(total_blocks-i-1, true);
+    }
+    for (auto & p: data_remap)
+    {
+        uint64_t new_loc = data_alloc->find_free();
+        if (new_loc == UINT64_MAX)
+        {
+            fprintf(stderr, "Not enough space to move data\n");
+            return 1;
+        }
+        data_alloc->set(new_loc, true);
+        data_remap[p.first] = new_loc;
+    }
+    return 0;
+}
+
+int disk_tool_t::resize_copy_data()
+{
+    if (iodepth <= 0 || iodepth > 4096)
+    {
+        iodepth = 32;
+    }
+    ringloop = new ring_loop_t(iodepth < 512 ? 512 : iodepth);
+    dsk.data_fd = open(dsk.data_device.c_str(), O_DIRECT|O_RDWR);
+    if (dsk.data_fd < 0)
+    {
+        fprintf(stderr, "Failed to open data device %s: %s\n", dsk.data_device.c_str(), strerror(errno));
+        delete ringloop;
+        ringloop = NULL;
+        return 1;
+    }
+    moving_blocks = new resizer_data_moving_t[iodepth];
+    moving_blocks[0].buf = memalign_or_die(MEM_ALIGNMENT, iodepth*dsk.data_block_size);
+    for (int i = 1; i < iodepth; i++)
+    {
+        moving_blocks[i].buf = (uint8_t*)moving_blocks[0].buf + i*dsk.data_block_size;
+    }
+    remap_active = 1;
+    remap_it = data_remap.begin();
+    ring_consumer.loop = [this]()
+    {
+        remap_active = 0;
+        for (int i = 0; i < iodepth; i++)
+        {
+            if (moving_blocks[i].state == DM_ST_EMPTY && remap_it != data_remap.end())
+            {
+                uint64_t old_loc = remap_it->first, new_loc = remap_it->second;
+                moving_blocks[i].state = DM_ST_TO_READ;
+                moving_blocks[i].old_loc = old_loc;
+                moving_blocks[i].new_loc = new_loc;
+                remap_it++;
+            }
+            if (moving_blocks[i].state == DM_ST_TO_READ)
+            {
+                struct io_uring_sqe *sqe = ringloop->get_sqe();
+                if (sqe)
+                {
+                    moving_blocks[i].state = DM_ST_READING;
+                    struct ring_data_t *data = ((ring_data_t*)sqe->user_data);
+                    data->iov = (struct iovec){ moving_blocks[i].buf, dsk.data_block_size };
+                    my_uring_prep_readv(sqe, dsk.data_fd, &data->iov, 1, dsk.data_offset + moving_blocks[i].old_loc*dsk.data_block_size);
+                    data->callback = [this, i](ring_data_t *data)
+                    {
+                        if (data->res != dsk.data_block_size)
+                        {
+                            fprintf(
+                                stderr, "Failed to read %u bytes at %lu from %s: %s\n", dsk.data_block_size,
+                                dsk.data_offset + moving_blocks[i].old_loc*dsk.data_block_size, dsk.data_device.c_str(),
+                                data->res < 0 ? strerror(-data->res) : "short read"
+                            );
+                            exit(1);
+                        }
+                        moving_blocks[i].state = DM_ST_TO_WRITE;
+                        ringloop->wakeup();
+                    };
+                }
+            }
+            if (moving_blocks[i].state == DM_ST_TO_WRITE)
+            {
+                struct io_uring_sqe *sqe = ringloop->get_sqe();
+                if (sqe)
+                {
+                    moving_blocks[i].state = DM_ST_WRITING;
+                    struct ring_data_t *data = ((ring_data_t*)sqe->user_data);
+                    data->iov = (struct iovec){ moving_blocks[i].buf, dsk.data_block_size };
+                    my_uring_prep_writev(sqe, dsk.data_fd, &data->iov, 1, dsk.data_offset + moving_blocks[i].new_loc*dsk.data_block_size);
+                    data->callback = [this, i](ring_data_t *data)
+                    {
+                        if (data->res != dsk.data_block_size)
+                        {
+                            fprintf(
+                                stderr, "Failed to write %u bytes at %lu to %s: %s\n", dsk.data_block_size,
+                                dsk.data_offset + moving_blocks[i].new_loc*dsk.data_block_size, dsk.data_device.c_str(),
+                                data->res < 0 ? strerror(-data->res) : "short write"
+                            );
+                            exit(1);
+                        }
+                        moving_blocks[i].state = DM_ST_EMPTY;
+                        ringloop->wakeup();
+                    };
+                }
+            }
+            remap_active += moving_blocks[i].state != DM_ST_EMPTY ? 1 : 0;
+        }
+        ringloop->submit();
+    };
+    ringloop->register_consumer(&ring_consumer);
+    while (1)
+    {
+        ringloop->loop();
+        if (!remap_active)
+            break;
+        ringloop->wait();
+    }
+    ringloop->unregister_consumer(&ring_consumer);
+    free(moving_blocks[0].buf);
+    delete[] moving_blocks;
+    moving_blocks = NULL;
+    close(dsk.data_fd);
+    dsk.data_fd = -1;
+    delete ringloop;
+    ringloop = NULL;
+    return 0;
+}
+
+int disk_tool_t::resize_rewrite_journal()
+{
+    // Simply overwriting on the fly may be impossible because old and new areas may overlap
+    // For now, just build new journal data in memory
+    new_journal_buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, new_journal_len);
+    new_journal_ptr = new_journal_buf;
+    new_journal_data = new_journal_ptr + dsk.journal_block_size;
+    new_journal_in_pos = 0;
+    memset(new_journal_buf, 0, new_journal_len);
+    process_journal([this](void *buf)
+    {
+        return process_journal_block(buf, [this](int num, journal_entry *je)
+        {
+            if (je->type == JE_START)
+            {
+                journal_entry *ne = (journal_entry*)(new_journal_ptr + new_journal_in_pos);
+                *((journal_entry_start*)ne) = (journal_entry_start){
+                    .magic = JOURNAL_MAGIC,
+                    .type = JE_START,
+                    .size = sizeof(journal_entry_start),
+                    .journal_start = dsk.journal_block_size,
+                    .version = JOURNAL_VERSION,
+                };
+                ne->crc32 = je_crc32(ne);
+                new_journal_ptr += dsk.journal_block_size;
+                new_journal_data = new_journal_ptr+dsk.journal_block_size;
+                new_journal_in_pos = 0;
+            }
+            else
+            {
+                if (dsk.journal_block_size < new_journal_in_pos+je->size)
+                {
+                    new_journal_ptr = new_journal_data;
+                    if (new_journal_ptr-new_journal_buf >= new_journal_len)
+                    {
+                        fprintf(stderr, "Error: live entries don't fit into the new journal\n");
+                        exit(1);
+                    }
+                    new_journal_data = new_journal_ptr+dsk.journal_block_size;
+                    new_journal_in_pos = 0;
+                    if (dsk.journal_block_size < je->size)
+                    {
+                        fprintf(stderr, "Error: journal entry too large (%u bytes)\n", je->size);
+                        exit(1);
+                    }
+                }
+                journal_entry *ne = (journal_entry*)(new_journal_ptr + new_journal_in_pos);
+                memcpy(ne, je, je->size);
+                ne->crc32_prev = new_crc32_prev;
+                if (je->type == JE_BIG_WRITE || je->type == JE_BIG_WRITE_INSTANT)
+                {
+                    // Change the block reference
+                    auto remap_it = data_remap.find(ne->big_write.location / dsk.data_block_size);
+                    if (remap_it != data_remap.end())
+                    {
+                        ne->big_write.location = remap_it->second * dsk.data_block_size;
+                    }
+                    ne->big_write.location += data_idx_diff * dsk.data_block_size;
+                }
+                else if (je->type == JE_SMALL_WRITE || je->type == JE_SMALL_WRITE_INSTANT)
+                {
+                    ne->small_write.data_offset = new_journal_data-new_journal_buf;
+                    if (ne->small_write.data_offset + ne->small_write.len > new_journal_len)
+                    {
+                        fprintf(stderr, "Error: live entries don't fit into the new journal\n");
+                        exit(1);
+                    }
+                    memcpy(new_journal_data, small_write_data, ne->small_write.len);
+                    new_journal_data += ne->small_write.len;
+                }
+                ne->crc32 = je_crc32(ne);
+                new_journal_in_pos += ne->size;
+                new_crc32_prev = ne->crc32;
+            }
+        });
+    });
+    return 0;
+}
+
+int disk_tool_t::resize_write_new_journal()
+{
+    new_journal_fd = open(new_journal_device.c_str(), O_DIRECT|O_RDWR);
+    if (new_journal_fd < 0)
+    {
+        fprintf(stderr, "Failed to open new journal device %s: %s\n", new_journal_device.c_str(), strerror(errno));
+        return 1;
+    }
+    lseek64(new_journal_fd, new_journal_offset, SEEK_SET);
+    write_blocking(new_journal_fd, new_journal_buf, new_journal_len);
+    fsync(new_journal_fd);
+    close(new_journal_fd);
+    new_journal_fd = -1;
+    free(new_journal_buf);
+    new_journal_buf = NULL;
+    return 0;
+}
+
+int disk_tool_t::resize_rewrite_meta()
+{
+    new_meta_buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, new_meta_len);
+    memset(new_meta_buf, 0, new_meta_len);
+    int r = process_meta(
+        [this](blockstore_meta_header_v1_t *hdr)
+        {
+            blockstore_meta_header_v1_t *new_hdr = (blockstore_meta_header_v1_t *)new_meta_buf;
+            new_hdr->zero = 0;
+            new_hdr->magic = BLOCKSTORE_META_MAGIC_V1;
+            new_hdr->version = BLOCKSTORE_META_VERSION_V1;
+            new_hdr->meta_block_size = dsk.meta_block_size;
+            new_hdr->data_block_size = dsk.data_block_size;
+            new_hdr->bitmap_granularity = dsk.bitmap_granularity ? dsk.bitmap_granularity : 4096;
+        },
+        [this](uint64_t block_num, clean_disk_entry *entry, uint8_t *bitmap)
+        {
+            auto remap_it = data_remap.find(block_num);
+            if (remap_it != data_remap.end())
+                block_num = remap_it->second;
+            if (block_num < free_first || block_num >= total_blocks-free_last)
+            {
+                fprintf(stderr, "BUG: remapped block not in range\n");
+                exit(1);
+            }
+            block_num += data_idx_diff;
+            clean_disk_entry *new_entry = (clean_disk_entry*)(new_meta_buf + dsk.meta_block_size +
+                dsk.meta_block_size*(block_num / new_entries_per_block) +
+                new_clean_entry_size*(block_num % new_entries_per_block));
+            new_entry->oid = entry->oid;
+            new_entry->version = entry->version;
+            if (bitmap)
+                memcpy(new_entry->bitmap, bitmap, 2*new_clean_entry_bitmap_size);
+            else
+                memset(new_entry->bitmap, 0xff, 2*new_clean_entry_bitmap_size);
+        }
+    );
+    if (r != 0)
+    {
+        free(new_meta_buf);
+        new_meta_buf = NULL;
+        return r;
+    }
+    return 0;
+}
+
+int disk_tool_t::resize_write_new_meta()
+{
+    new_meta_fd = open(new_meta_device.c_str(), O_DIRECT|O_RDWR);
+    if (new_meta_fd < 0)
+    {
+        fprintf(stderr, "Failed to open new metadata device %s: %s\n", new_meta_device.c_str(), strerror(errno));
+        return 1;
+    }
+    lseek64(new_meta_fd, new_meta_offset, SEEK_SET);
+    write_blocking(new_meta_fd, new_meta_buf, new_meta_len);
+    fsync(new_meta_fd);
+    close(new_meta_fd);
+    new_meta_fd = -1;
+    free(new_meta_buf);
+    new_meta_buf = NULL;
+    return 0;
+}
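The remapping step in `resize_remap_blocks()` can be modelled with a plain bitmap. The sketch below is a simplified, hypothetical stand-in: `remap_blocks` is an illustrative name, a `std::vector<bool>` replaces vitastor's `allocator`, and a linear scan replaces `find_free()`. It also shows the indexing of the tail region: the last `free_last` blocks are `total-free_last` through `total-1`, which matches the range check in `resize_rewrite_meta()`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// `used` is a per-block bitmap of the old data area (true = live data).
// Blocks in the first `free_first` and last `free_last` positions leave the
// data area, so live blocks there are moved to free blocks in the range that
// stays. Fills `remap` (old index -> new index); false if space runs out.
static bool remap_blocks(std::vector<bool> used, uint64_t free_first, uint64_t free_last,
    std::map<uint64_t, uint64_t> & remap)
{
    uint64_t total = used.size();
    assert(free_first + free_last <= total);
    for (uint64_t i = 0; i < free_first; i++)
    {
        if (used[i])
            remap[i] = 0;   // placeholder, the real target is chosen below
        else
            used[i] = true; // reserve so it is never picked as a target
    }
    for (uint64_t i = 0; i < free_last; i++)
    {
        uint64_t b = total-1-i; // the last valid index is total-1
        if (used[b])
            remap[b] = 0;
        else
            used[b] = true;
    }
    for (auto & p: remap)
    {
        uint64_t target = total;
        for (uint64_t j = 0; j < total; j++)
        {
            if (!used[j])
            {
                target = j;
                break;
            }
        }
        if (target == total)
            return false; // not enough space to move data
        used[target] = true;
        p.second = target;
    }
    return true;
}
```

With six blocks, one block trimmed from each end and live data in blocks 0, 3 and 5, blocks 0 and 5 move to the first free blocks, 1 and 2. The caller would then add `data_idx_diff` to every index to make it relative to the new data offset.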
--- a/src/disk_tool_udev.cpp
+++ b/src/disk_tool_udev.cpp
@@ -0,0 +1,364 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include <sys/file.h>
+
+#include "disk_tool.h"
+#include "rw_blocking.h"
+
+struct __attribute__((__packed__)) vitastor_disk_superblock_t
+{
+    uint64_t magic;
+    uint32_t crc32c;
+    uint32_t size;
+    uint8_t json_data[];
+};
+
+static std::string udev_escape(std::string str)
+{
+    std::string r;
+    size_t p = str.find_first_of("\"\' \t\r\n"), prev = 0;
+    if (p == std::string::npos)
+    {
+        return str;
+    }
+    while (p != std::string::npos)
+    {
+        r += str.substr(prev, p-prev);
+        r += "\\";
+        prev = p;
+        p = str.find_first_of("\"\' \t\r\n", p+1);
+    }
+    r += str.substr(prev);
+    return r;
+}
+
+int disk_tool_t::udev_import(std::string device)
+{
+    json11::Json sb = read_osd_superblock(device);
+    if (sb.is_null())
+    {
+        return 1;
+    }
+    uint64_t osd_num = sb["params"]["osd_num"].uint64_value();
+    // Print variables for udev
+    printf("VITASTOR_OSD_NUM=%lu\n", osd_num);
+    printf("VITASTOR_ALIAS=osd%lu-%s\n", osd_num, sb["device_type"].string_value().c_str());
+    printf("VITASTOR_DATA_DEVICE=%s\n", udev_escape(sb["params"]["data_device"].string_value()).c_str());
+    if (sb["real_meta_device"].string_value() != "" && sb["real_meta_device"] != sb["real_data_device"])
+        printf("VITASTOR_META_DEVICE=%s\n", udev_escape(sb["params"]["meta_device"].string_value()).c_str());
+    if (sb["real_journal_device"].string_value() != "" && sb["real_journal_device"] != sb["real_meta_device"])
+        printf("VITASTOR_JOURNAL_DEVICE=%s\n", udev_escape(sb["params"]["journal_device"].string_value()).c_str());
+    return 0;
+}
+
+int disk_tool_t::read_sb(std::string device)
+{
+    json11::Json sb = read_osd_superblock(device);
+    if (sb.is_null())
+    {
+        return 1;
+    }
+    printf("%s\n", sb["params"].dump().c_str());
+    return 0;
+}
+
+int disk_tool_t::write_sb(std::string device)
+{
+    std::string input;
+    int r;
+    char buf[4096];
+    while (1)
+    {
+        if ((r = read(0, buf, sizeof(buf))) > 0)
+            input += std::string(buf, r);
+        else if (r == 0 || (errno != EAGAIN && errno != EINTR))
+            break;
+    }
+    std::string json_err;
+    json11::Json params = json11::Json::parse(input, json_err);
+    if (json_err != "" || !params["osd_num"].uint64_value() || params["data_device"].string_value() == "")
+    {
+        fprintf(stderr, "Invalid JSON input\n");
+        return 1;
+    }
+    return !write_osd_superblock(device, params);
+}
+
+uint32_t disk_tool_t::write_osd_superblock(std::string device, json11::Json params)
+{
+    std::string json_data = params.dump();
+    uint32_t sb_size = sizeof(vitastor_disk_superblock_t)+json_data.size();
+    if (sb_size > VITASTOR_DISK_MAX_SB_SIZE)
+    {
+        fprintf(stderr, "JSON data for superblock is too large\n");
+        return 0;
+    }
+    uint64_t buf_len = ((sb_size+4095)/4096) * 4096;
+    uint8_t *buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, buf_len);
+    memset(buf, 0, buf_len);
+    vitastor_disk_superblock_t *sb = (vitastor_disk_superblock_t*)buf;
+    sb->magic = VITASTOR_DISK_MAGIC;
+    sb->size = sb_size;
+    memcpy(sb->json_data, json_data.c_str(), json_data.size());
+    sb->crc32c = crc32c(0, &sb->size, sb->size - ((uint8_t*)&sb->size - buf));
+    int fd = open(device.c_str(), O_DIRECT|O_RDWR);
+    if (fd < 0)
+    {
+        fprintf(stderr, "Failed to open device %s: %s\n", device.c_str(), strerror(errno));
+        free(buf);
+        return 0;
+    }
+    int r = write_blocking(fd, buf, buf_len);
+    if (r < 0)
+    {
+        fprintf(stderr, "Failed to write to %s: %s\n", device.c_str(), strerror(errno));
+        close(fd);
+        free(buf);
+        return 0;
+    }
+    close(fd);
+    free(buf);
+    shell_exec({ "udevadm", "trigger", "--settle", device }, "", NULL, NULL);
+    return sb_size;
+}
+
+json11::Json disk_tool_t::read_osd_superblock(std::string device, bool expect_exist)
+{
+    vitastor_disk_superblock_t *sb = NULL;
+    uint8_t *buf = NULL;
+    json11::Json osd_params;
+    std::string json_err;
+    std::string real_device, device_type, real_data, real_meta, real_journal;
+    int r, fd = open(device.c_str(), O_DIRECT|O_RDONLY);
+    if (fd < 0)
+    {
+        fprintf(stderr, "Failed to open device %s: %s\n", device.c_str(), strerror(errno));
+        return osd_params;
+    }
+    buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, 4096);
+    r = read_blocking(fd, buf, 4096);
+    if (r != 4096)
+    {
+        fprintf(stderr, "Failed to read OSD superblock from %s: %s\n", device.c_str(), strerror(errno));
+        goto ex;
+    }
+    sb = (vitastor_disk_superblock_t*)buf;
+    if (sb->magic != VITASTOR_DISK_MAGIC)
+    {
+        if (expect_exist)
+            fprintf(stderr, "Invalid OSD superblock on %s: magic number mismatch\n", device.c_str());
+        goto ex;
+    }
+    if (sb->size > VITASTOR_DISK_MAX_SB_SIZE ||
+        // +2 is minimal json: {}
+        sb->size < sizeof(vitastor_disk_superblock_t)+2)
+    {
+        if (expect_exist)
+            fprintf(stderr, "Invalid OSD superblock on %s: invalid size\n", device.c_str());
+        goto ex;
+    }
+    if (sb->size > 4096)
+    {
+        uint64_t sb_size = ((sb->size+4095)/4096)*4096;
+        free(buf);
+        buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, sb_size);
+        lseek64(fd, 0, SEEK_SET);
+        r = read_blocking(fd, buf, sb_size);
+        if (r != sb_size)
+        {
+            fprintf(stderr, "Failed to read OSD superblock from %s: %s\n", device.c_str(), strerror(errno));
+            goto ex;
+        }
+        sb = (vitastor_disk_superblock_t*)buf;
+    }
+    if (sb->crc32c != crc32c(0, &sb->size, sb->size - ((uint8_t*)&sb->size - buf)))
+    {
+        if (expect_exist)
+            fprintf(stderr, "Invalid OSD superblock on %s: crc32 mismatch\n", device.c_str());
+        goto ex;
+    }
+    osd_params = json11::Json::parse(std::string((char*)sb->json_data, sb->size - sizeof(vitastor_disk_superblock_t)), json_err);
+    if (json_err != "")
+    {
+        if (expect_exist)
+            fprintf(stderr, "Invalid OSD superblock on %s: invalid JSON\n", device.c_str());
+        goto ex;
+    }
+    // Validate superblock
+    if (!osd_params["osd_num"].uint64_value())
+    {
+        if (expect_exist)
+            fprintf(stderr, "OSD superblock on %s lacks osd_num\n", device.c_str());
+        osd_params = json11::Json();
+        goto ex;
+    }
+    if (osd_params["data_device"].string_value() == "")
+    {
+        if (expect_exist)
+            fprintf(stderr, "OSD superblock on %s lacks data_device\n", device.c_str());
+        osd_params = json11::Json();
+        goto ex;
+    }
+    real_device = realpath_str(device);
+    real_data = realpath_str(osd_params["data_device"].string_value());
+    real_meta = osd_params["meta_device"].string_value() != "" && osd_params["meta_device"] != osd_params["data_device"]
+        ? realpath_str(osd_params["meta_device"].string_value()) : "";
+    real_journal = osd_params["journal_device"].string_value() != "" && osd_params["journal_device"] != osd_params["meta_device"]
+        ? realpath_str(osd_params["journal_device"].string_value()) : "";
+    if (real_journal == real_meta)
+    {
+        real_journal = "";
+    }
+    if (real_meta == real_data)
+    {
+        real_meta = "";
+    }
+    if (real_device == real_data)
+    {
+        device_type = "data";
+    }
+    else if (real_device == real_meta)
+    {
+        device_type = "meta";
+    }
+    else if (real_device == real_journal)
+    {
+        device_type = "journal";
+    }
+    else
+    {
+        if (expect_exist)
+            fprintf(stderr, "Invalid OSD superblock on %s: does not refer to the device itself\n", device.c_str());
+        osd_params = json11::Json();
+        goto ex;
+    }
+    osd_params = json11::Json::object{
+        { "params", osd_params },
+        { "device_type", device_type },
+        { "real_data_device", real_data },
+        { "real_meta_device", real_meta },
+        { "real_journal_device", real_journal },
+    };
+ex:
+    free(buf);
+    close(fd);
+    return osd_params;
+}
+
+int disk_tool_t::systemd_start_stop_osds(std::vector<std::string> cmd, std::vector<std::string> devices)
+{
+    if (!devices.size())
+    {
+        fprintf(stderr, "Device path is missing\n");
+        return 1;
+    }
+    std::vector<std::string> svcs;
+    for (auto & device: devices)
+    {
+        json11::Json sb = read_osd_superblock(device);
+        if (!sb.is_null())
+        {
+            svcs.push_back("vitastor-osd@"+sb["params"]["osd_num"].as_string());
+        }
+    }
+    if (!svcs.size())
+    {
+        return 1;
+    }
+    std::vector<char*> argv;
+    argv.push_back((char*)"systemctl");
+    for (auto & s: cmd)
+    {
+        argv.push_back((char*)s.c_str());
+    }
+    for (auto & s: svcs)
+    {
+        argv.push_back((char*)s.c_str());
+    }
+    argv.push_back(NULL);
+    execvpe("systemctl", argv.data(), environ);
+    return 1; // execvpe() returns only on failure
+}
+
+int disk_tool_t::exec_osd(std::string device)
+{
+    json11::Json sb = read_osd_superblock(device);
+    if (sb.is_null())
+    {
+        return 1;
+    }
+    std::string osd_binary = "vitastor-osd";
+    if (options["osd-binary"] != "")
+    {
+        osd_binary = options["osd-binary"];
+    }
+    std::vector<std::string> argstr;
+    argstr.push_back(osd_binary.c_str());
+    for (auto & kv: sb["params"].object_items())
+    {
+        argstr.push_back("--"+kv.first);
+        argstr.push_back(kv.second.is_string() ? kv.second.string_value() : kv.second.dump());
+    }
+    std::vector<char*> argv(argstr.size()+1);
+    for (int i = 0; i < argstr.size(); i++)
+    {
+        argv[i] = (char*)argstr[i].c_str();
+    }
+    argv[argstr.size()] = NULL;
+    execvpe(osd_binary.c_str(), argv.data(), environ);
+    return 1; // execvpe() returns only on failure
+}
+
+static int check_disabled_cache(std::string dev)
+{
+    int r = disable_cache(dev);
+    if (r == 1)
+    {
+        fprintf(
+            stderr, "Warning: fsync is disabled for %s, but the cache status check failed."
+            " Make sure the cache is in write-through mode yourself, or you may lose data.\n", dev.c_str()
+        );
+    }
+    else if (r == -1)
+    {
+        fprintf(
+            stderr, "Error: fsync is disabled for %s, but its cache is in write-back mode"
+            " and we failed to switch it to write-through. Data loss is possible."
+            " Either switch the cache to write-through mode yourself or disable the check"
+            " using skip_cache_check=1 in the superblock.\n", dev.c_str()
+        );
+        return 1;
+    }
+    return 0;
+}
+
+int disk_tool_t::pre_exec_osd(std::string device)
+{
+    json11::Json sb = read_osd_superblock(device);
+    if (sb.is_null())
+    {
+        return 1;
+    }
+    if (!sb["params"]["skip_cache_check"].uint64_value())
+    {
+        if (json_is_true(sb["params"]["disable_data_fsync"]) &&
+            check_disabled_cache(sb["real_data_device"].string_value()) != 0)
+        {
+            return 1;
+        }
+        if (json_is_true(sb["params"]["disable_meta_fsync"]) &&
+            sb["real_meta_device"].string_value() != "" && sb["real_meta_device"] != sb["real_data_device"] &&
+            check_disabled_cache(sb["real_meta_device"].string_value()) != 0)
+        {
+            return 1;
+        }
+        if (json_is_true(sb["params"]["disable_journal_fsync"]) &&
+            sb["real_journal_device"].string_value() != "" && sb["real_journal_device"] != sb["real_meta_device"] &&
+            check_disabled_cache(sb["real_journal_device"].string_value()) != 0)
+        {
+            return 1;
+        }
+    }
+    return 0;
+}
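The on-disk OSD superblock is a packed header (`magic`, `crc32c`, `size`) followed by JSON and padded to a 4 KiB multiple. The checksum covers everything from the `size` field to the end of the JSON, that is `size-12` bytes starting at offset 12. A round trip can be sketched as below, under these assumptions: `crc32c_sw`, `pack_sb` and `unpack_sb` are hypothetical names; `crc32c_sw` is a plain bitwise CRC-32C with the common `~0` seed and final inversion, which may not match the exact seed convention of vitastor's `crc32c()`; the magic value is passed in rather than taken from `VITASTOR_DISK_MAGIC`; and integers are stored in host byte order, like the packed struct above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
static uint32_t crc32c_sw(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Layout: magic (offset 0, 8 bytes), crc32c (8, 4), size (12, 4), JSON (16).
static std::vector<uint8_t> pack_sb(uint64_t magic, const std::string & json)
{
    uint32_t size = static_cast<uint32_t>(16 + json.size());
    std::vector<uint8_t> buf((size+4095)/4096*4096, 0); // pad to 4 KiB
    std::memcpy(buf.data(), &magic, 8);
    std::memcpy(buf.data()+12, &size, 4);
    std::memcpy(buf.data()+16, json.data(), json.size());
    uint32_t crc = crc32c_sw(buf.data()+12, size-12);
    std::memcpy(buf.data()+8, &crc, 4);
    return buf;
}

// Returns the JSON text, or "" if magic, size or checksum are invalid.
static std::string unpack_sb(const std::vector<uint8_t> & buf, uint64_t magic)
{
    uint64_t m;
    uint32_t crc, size;
    if (buf.size() < 16)
        return "";
    std::memcpy(&m, buf.data(), 8);
    std::memcpy(&crc, buf.data()+8, 4);
    std::memcpy(&size, buf.data()+12, 4);
    if (m != magic || size < 16+2 || size > buf.size()) // "{}" is the minimal JSON
        return "";
    if (crc != crc32c_sw(buf.data()+12, size-12))
        return "";
    return std::string((const char*)buf.data()+16, size-16);
}
```

Flipping any byte between offset 12 and `size` makes `unpack_sb()` reject the buffer, which mirrors the "crc32 mismatch" path in `read_osd_superblock()`.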
--- a/src/disk_tool_upgrade.cpp
+++ b/src/disk_tool_upgrade.cpp
@@ -0,0 +1,178 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include <regex>
+#include "disk_tool.h"
+#include "str_util.h"
+
+static std::map<std::string, std::string> read_vitastor_unit(std::string unit)
+{
+    std::smatch m;
+    if (unit == "" || !std::regex_match(unit, m, std::regex(".*/vitastor-osd\\d+\\.service")))
+    {
+        fprintf(stderr, "unit file name does not match <path>/vitastor-osd<NUMBER>.service\n");
+        return {};
+    }
+    std::string text = read_file(unit);
+    if (!std::regex_search(text, m, std::regex("\nExecStart\\s*=[^\n]+vitastor-osd\\s*(([^\\\\\n&>\\d]+|\\\\[ \t\r]*\n|\\d[^>])+)")))
+    {
+        fprintf(stderr, "Failed to extract ExecStart command from %s\n", unit.c_str());
+        return {};
+    }
+    std::string cmd = trim(m[1]);
+    cmd = str_replace(cmd, "\\\n", " ");
+    std::string key;
+    std::map<std::string, std::string> r;
+    auto ns = std::regex("\\S+");
+    for (auto it = std::sregex_token_iterator(cmd.begin(), cmd.end(), ns, 0), end = std::sregex_token_iterator();
+        it != end; it++)
+    {
+        if (key == "" && ((std::string)(*it)).substr(0, 2) == "--")
+            key = ((std::string)(*it)).substr(2);
+        else if (key != "")
+        {
+            r[key] = *it;
+            key = "";
+        }
+    }
+    return r;
+}
+
+static int fix_partition_type(std::string dev_by_uuid)
+{
+    auto uuid = strtolower(dev_by_uuid.substr(dev_by_uuid.rfind('/')+1));
+    std::string parent_dev = get_parent_device(realpath_str(dev_by_uuid, false));
+    if (parent_dev == "")
+        return 1;
+    auto pt = read_parttable("/dev/"+parent_dev);
+    if (pt.is_null())
+        return 1;
+    std::string script = "label: gpt\n\n";
+    for (const auto & part: pt["partitions"].array_items())
+    {
+        bool this_part = (strtolower(part["uuid"].string_value()) == uuid);
+        if (this_part && strtolower(part["type"].string_value()) == "e7009fac-a5a1-4d72-af72-53de13059903")
+        {
+            // Already correct type
+            return 0;
+        }
+        script += part["node"].string_value()+": ";
+        bool first = true;
+        for (const auto & kv: part.object_items())
+        {
+            if (kv.first != "node")
+            {
+                script += (first ? "" : ", ")+kv.first+"="+
+                    (kv.first == "type" && this_part
+                        ? "e7009fac-a5a1-4d72-af72-53de13059903"
+                        : (kv.second.is_string() ? kv.second.string_value() : kv.second.dump()));
+                first = false;
+            }
+        }
+        script += "\n";
+    }
+    return shell_exec({ "sfdisk", "--no-reread", "--force", "/dev/"+parent_dev }, script, NULL, NULL);
+}
+
+int disk_tool_t::upgrade_simple_unit(std::string unit)
+{
+    if (stoull_full(unit) != 0)
+    {
+        // OSD number
+        unit = "/etc/systemd/system/vitastor-osd"+unit+".service";
+    }
+    auto options = read_vitastor_unit(unit);
+    if (!options.size())
+        return 1;
+    if (!stoull_full(options["osd_num"], 10) || options["data_device"] == "")
+    {
+        fprintf(stderr, "osd_num or data_device is missing in %s\n", unit.c_str());
+        return 1;
+    }
+    if (options["data_device"].substr(0, 22) != "/dev/disk/by-partuuid/" ||
+        (options["meta_device"] != "" && options["meta_device"].substr(0, 22) != "/dev/disk/by-partuuid/") ||
+        (options["journal_device"] != "" && options["journal_device"].substr(0, 22) != "/dev/disk/by-partuuid/"))
+    {
+        fprintf(
+            stderr, "data_device, meta_device and journal_device must begin with"
+            " /dev/disk/by-partuuid/, i.e. they must be GPT partitions identified by UUIDs\n"
+        );
+        return 1;
+    }
+    // Stop and disable the service
+    auto service_name = unit.substr(unit.rfind('/') + 1);
+    if (shell_exec({ "systemctl", "disable", "--now", service_name }, "", NULL, NULL) != 0)
+    {
+        return 1;
+    }
+    uint64_t j_o = parse_size(options["journal_offset"]);
+    uint64_t m_o = parse_size(options["meta_offset"]);
+    uint64_t d_o = parse_size(options["data_offset"]);
+    bool m_is_d = options["meta_device"] == "" || options["meta_device"] == options["data_device"];
+    bool j_is_m = options["journal_device"] == "" || options["journal_device"] == options["meta_device"];
+    bool j_is_d = (j_is_m && m_is_d) || options["journal_device"] == options["data_device"];
+    if (d_o < 4096 || j_o < 4096 || m_o < 4096)
+    {
+        // Resize data
+        uint64_t blk = stoull_full(options["block_size"]);
+        blk = blk ? blk : 128*1024;
+        std::map<std::string, uint64_t> resize;
+        if (d_o < 4096 || (m_is_d && m_o < 4096 && m_o < d_o) || (j_is_d && j_o < 4096 && j_o < d_o))
+        {
+            resize["new_data_offset"] = d_o+blk;
+            if (m_is_d && m_o < d_o)
+                resize["new_meta_offset"] = m_o+blk;
+            if (j_is_d && j_o < d_o)
+                resize["new_journal_offset"] = j_o+blk;
+        }
+        if (!m_is_d && m_o < 4096)
+        {
+            resize["new_meta_offset"] = m_o+4096;
+            if (j_is_m && m_o < j_o)
+                resize["new_journal_offset"] = j_o+4096;
+        }
+        if (!j_is_d && !j_is_m && j_o < 4096)
+            resize["new_journal_offset"] = j_o+4096;
+        disk_tool_t resizer;
+        resizer.options = options;
+        for (auto & kv: resize)
+            resizer.options[kv.first] = std::to_string(kv.second);
+        if (resizer.resize_data() != 0)
+        {
+            // FIXME: Resize with backup or journal
+            fprintf(
+                stderr, "Failed to resize data to make space for the superblock\n"
+                "Sorry, but your OSD may now be corrupted depending on what went wrong during resize :-(\n"
+                "Please review the messages above and take action accordingly\n"
+            );
+            return 1;
+        }
+        for (auto & kv: resize)
+            options[kv.first.substr(4)] = std::to_string(kv.second);
+    }
+    // Write superblocks
+    if (!write_osd_superblock(options["data_device"], options) ||
+        (!m_is_d && !write_osd_superblock(options["meta_device"], options)) ||
+        (!j_is_m && !j_is_d && !write_osd_superblock(options["journal_device"], options)))
+    {
+        return 1;
+    }
+    // Change partition types
+    if (fix_partition_type(options["data_device"]) != 0 ||
+        (!m_is_d && fix_partition_type(options["meta_device"]) != 0) ||
+        (!j_is_m && !j_is_d && fix_partition_type(options["journal_device"]) != 0))
+    {
+        return 1;
+    }
+    // Enable the new unit
+    if (shell_exec({ "systemctl", "enable", "--now", "vitastor-osd@"+options["osd_num"] }, "", NULL, NULL) != 0)
+    {
+        fprintf(stderr, "Failed to enable systemd unit vitastor-osd@%s\n", options["osd_num"].c_str());
+        return 1;
+    }
+    fprintf(
+        stderr, "\nOK: Converted OSD %s to the new scheme. The new service name is vitastor-osd@%s\n",
+        options["osd_num"].c_str(), options["osd_num"].c_str()
+    );
+    return 0;
+}
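read_vitastor_unit() turns the ExecStart arguments of an old-style unit into a key/value map by pairing every `--key` word with the word that follows it. The sketch below isolates that scan. It assumes the argument string has already been extracted from the unit file and handles only backslash-newline continuations (no shell quoting); parse_osd_args() is a hypothetical name, not part of Vitastor.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Simplified "--key value" scan modelled on read_vitastor_unit()
std::map<std::string, std::string> parse_osd_args(std::string cmd)
{
    // Join systemd line continuations into a single line
    size_t pos;
    while ((pos = cmd.find("\\\n")) != std::string::npos)
        cmd.replace(pos, 2, " ");
    std::map<std::string, std::string> r;
    std::string key;
    std::regex word("\\S+");
    for (std::sregex_token_iterator it(cmd.cbegin(), cmd.cend(), word, 0), end; it != end; ++it)
    {
        std::string tok = *it;
        if (key == "" && tok.substr(0, 2) == "--")
            key = tok.substr(2);  // remember the option name
        else if (key != "")
        {
            r[key] = tok;         // the next word is its value
            key = "";
        }
        // Words that are neither an option nor a value are ignored
    }
    return r;
}
```

As in the original, a trailing option with no value is dropped, and an option immediately followed by another `--option` takes that word as its value, so boolean flags without an explicit value are not supported.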
--- a/src/disk_tool_utils.cpp
+++ b/src/disk_tool_utils.cpp
@@ -0,0 +1,356 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 (see README.md for details)
+
+#include <sys/wait.h>
+#include <dirent.h>
+
+#include "disk_tool.h"
+#include "rw_blocking.h"
+#include "str_util.h"
+
+uint64_t sscanf_json(const char *fmt, const json11::Json & str)
+{
+    uint64_t value = 0;
+    if (fmt)
+        sscanf(str.string_value().c_str(), fmt, &value);
+    else if (str.string_value().size() > 2 && (str.string_value()[0] == '0' && str.string_value()[1] == 'x'))
+        sscanf(str.string_value().c_str(), "0x%lx", &value);
+    else
+        value = str.uint64_value();
+    return value;
+}
+
+static int fromhex(char c)
+{
+    if (c >= '0' && c <= '9')
+        return (c-'0');
+    else if (c >= 'a' && c <= 'f')
+        return (c-'a'+10);
+    else if (c >= 'A' && c <= 'F')
+        return (c-'A'+10);
+    return -1;
+}
+
+void fromhexstr(const std::string & from, int bytes, uint8_t *to)
+{
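+    // Each output byte consumes two hex digits, so stop before reading past the string
+    for (int i = 0; (size_t)(2*i+1) < from.size() && i < bytes; i++)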
+    {
+        int x = fromhex(from[2*i]), y = fromhex(from[2*i+1]);
+        if (x < 0 || y < 0)
+            break;
+        to[i] = x*16 + y;
+    }
+}
+
+std::string realpath_str(std::string path, bool nofail)
+{
+    char *p = realpath((char*)path.c_str(), NULL);
+    if (!p)
+    {
+        fprintf(stderr, "Failed to resolve %s: %s\n", path.c_str(), strerror(errno));
+        return nofail ? path : "";
+    }
+    std::string rp(p);
+    free(p);
+    return rp;
+}
+
+std::string read_all_fd(int fd)
+{
+    int res_size = 0;
+    std::string res;
+    while (1)
+    {
+        res.resize(res_size+1024);
+        int r = read(fd, (char*)res.data()+res_size, res.size()-res_size);
+        if (r > 0)
+            res_size += r;
+        else if (!r || (errno != EAGAIN && errno != EINTR))
+            break;
+    }
+    res.resize(res_size);
+    return res;
+}
+
+std::string read_file(std::string file, bool allow_enoent)
+{
+    std::string res;
+    int fd = open(file.c_str(), O_RDONLY);
+    if (fd < 0 || (res = read_all_fd(fd)) == "")
+    {
+        int err = errno;
+        if (fd >= 0)
+            close(fd);
+        if (!allow_enoent || err != ENOENT)
+            fprintf(stderr, "Can't read %s: %s\n", file.c_str(), strerror(err));
+        return "";
+    }
+    close(fd);
+    return res;
+}
+
+// returns 1 = check error, 0 = write through, -1 = write back
+// (similar to 1 = warning, -1 = error, 0 = success in disable_cache)
+static int check_queue_cache(std::string dev, std::string parent_dev)
+{
+    auto r = read_file("/sys/block/"+dev+"/queue/write_cache", true);
+    if (r == "")
+        r = read_file("/sys/block/"+parent_dev+"/queue/write_cache");
+    if (r == "")
+        return 1;
+    return trim(r) == "write through" ? 0 : -1;
+}
+
+// returns 1 = warning, -1 = error, 0 = success
+int disable_cache(std::string dev)
+{
+    auto parent_dev = get_parent_device(dev);
+    if (parent_dev == "")
+        return 1;
+    auto scsi_disk = "/sys/block/"+parent_dev+"/device/scsi_disk";
+    DIR *dir = opendir(scsi_disk.c_str());
+    if (!dir)
+    {
+        if (errno == ENOENT)
+        {
+            // Not a SCSI/SATA device, just check /sys/block/.../queue/write_cache
+            return check_queue_cache(dev.substr(5), parent_dev);
+        }
+        else
+        {
+            fprintf(stderr, "Can't read directory %s: %s\n", scsi_disk.c_str(), strerror(errno));
+            return 1;
+        }
+    }
+    else
+    {
+        dirent *de = readdir(dir);
+        while (de && de->d_name[0] == '.' && (de->d_name[1] == 0 || de->d_name[1] == '.' && de->d_name[2] == 0))
+            de = readdir(dir);
+        if (!de)
+        {
+            // Not a SCSI/SATA device, just check /sys/block/.../queue/write_cache
+            closedir(dir);
+            return check_queue_cache(dev.substr(5), parent_dev);
+        }
+        scsi_disk += "/";
+        scsi_disk += de->d_name;
+        if (readdir(dir) != NULL)
+        {
+            // Error, multiple scsi_disk/* entries
+            closedir(dir);
+            fprintf(stderr, "Multiple entries in %s found\n", scsi_disk.c_str());
+            return 1;
+        }
+        closedir(dir);
+        // Check cache_type
+        scsi_disk += "/cache_type";
+        // sysfs values end with a newline, so trim before comparing
+        std::string cache_type = trim(read_file(scsi_disk));
+        if (cache_type == "")
+            return 1;
+        if (cache_type == "write back")
+        {
+            int fd = open(scsi_disk.c_str(), O_WRONLY);
+            if (fd < 0 || write_blocking(fd, (void*)"write through", strlen("write through")) != strlen("write through"))
+            {
+                if (fd >= 0)
+                    close(fd);
+                fprintf(stderr, "Can't write to %s: %s\n", scsi_disk.c_str(), strerror(errno));
+                return -1;
+            }
+            close(fd);
+        }
+    }
+    return 0;
+}
+
+std::string get_parent_device(std::string dev)
+{
+    if (dev.substr(0, 5) != "/dev/")
+    {
+        fprintf(stderr, "%s is outside /dev/\n", dev.c_str());
+        return "";
+    }
+    dev = dev.substr(5);
+    int i = dev.size();
+    while (i > 0 && isdigit(dev[i-1]))
+        i--;
+    if (i >= 1 && dev[i-1] == '-') // dm-0, dm-1
+        return dev;
+    else if (i >= 2 && dev[i-1] == 'p' && isdigit(dev[i-2])) // nvme0n1p1
+        i--;
+    // Check that such block device exists
+    struct stat st;
+    auto chk = "/sys/block/"+dev.substr(0, i);
+    if (stat(chk.c_str(), &st) < 0)
+    {
+        if (errno != ENOENT)
+        {
+            fprintf(stderr, "Failed to stat %s: %s\n", chk.c_str(), strerror(errno));
+            return "";
+        }
+        return dev;
+    }
+    return dev.substr(0, i);
+}
+
+bool json_is_true(const json11::Json & val)
+{
+    if (val.is_string())
+        return val == "true" || val == "yes" || val == "1";
+    return val.bool_value();
+}
+
+int shell_exec(const std::vector<std::string> & cmd, const std::string & in, std::string *out, std::string *err)
+{
+    int child_stdin[2], child_stdout[2], child_stderr[2];
+    pid_t pid;
+    if (pipe(child_stdin) == -1)
+        goto err_pipe1;
+    if (pipe(child_stdout) == -1)
+        goto err_pipe2;
+    if (pipe(child_stderr) == -1)
+        goto err_pipe3;
+    if ((pid = fork()) == -1)
+        goto err_fork;
+    if (pid)
+    {
+        // Parent
+        // We should do select() to do something serious, but this is for simple cases
+        close(child_stdin[0]);
+        close(child_stdout[1]);
+        close(child_stderr[1]);
+        write_blocking(child_stdin[1], (void*)in.data(), in.size());
+        close(child_stdin[1]);
+        std::string s;
+        s = read_all_fd(child_stdout[0]);
+        if (out)
+            out->swap(s);
+        close(child_stdout[0]);
+        s = read_all_fd(child_stderr[0]);
+        if (err)
+            err->swap(s);
+        close(child_stderr[0]);
+        int wstatus = 0;
+        waitpid(pid, &wstatus, 0);
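+        // A child killed by a signal has no exit code; report it as a failure
+        return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : 255;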
+    }
+    else
+    {
+        // Child
+        dup2(child_stdin[0], 0);
+        dup2(child_stdout[1], 1);
+        if (err)
+            dup2(child_stderr[1], 2);
+        close(child_stdin[0]);
+        close(child_stdin[1]);
+        close(child_stdout[0]);
+        close(child_stdout[1]);
+        close(child_stderr[0]);
+        close(child_stderr[1]);
+        char *argv[cmd.size()+1];
+        for (int i = 0; i < cmd.size(); i++)
+        {
+            argv[i] = (char*)cmd[i].c_str();
+        }
+        argv[cmd.size()] = NULL;
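+        execvp(argv[0], argv);
+        // execvp() only returns on failure; save errno before it can be clobbered
+        int exec_err = errno;
+        std::string full_cmd;
+        for (int i = 0; i < cmd.size(); i++)
+        {
+            full_cmd += cmd[i];
+            full_cmd += " ";
+        }
+        full_cmd.resize(full_cmd.size() > 0 ? full_cmd.size()-1 : 0);
+        fprintf(stderr, "error running %s: %s\n", full_cmd.c_str(), strerror(exec_err));
+        // _exit() avoids flushing stdio buffers inherited from the parent
+        _exit(255);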
+    }
+err_fork:
+    close(child_stderr[1]);
+    close(child_stderr[0]);
+err_pipe3:
+    close(child_stdout[1]);
+    close(child_stdout[0]);
+err_pipe2:
+    close(child_stdin[1]);
+    close(child_stdin[0]);
+err_pipe1:
+    return 255;
+}
+
+int write_zero(int fd, uint64_t offset, uint64_t size)
+{
+    uint64_t buf_len = 1024*1024;
+    void *zero_buf = memalign_or_die(MEM_ALIGNMENT, buf_len);
+    ssize_t r;
+    while (size > 0)
+    {
+        r = pwrite(fd, zero_buf, size > buf_len ? buf_len : size, offset);
+        if (r > 0)
+        {
+            size -= r;
+            offset += r;
+        }
+        else if (errno != EAGAIN && errno != EINTR)
+        {
+            free(zero_buf);
+            return -1;
+        }
+    }
+    free(zero_buf);
+    return 0;
+}
+
+// Returns false in case of an error
+// Returns null if there is no partition table
+json11::Json read_parttable(std::string dev)
+{
+    std::string part_dump;
+    int r = shell_exec({ "sfdisk", "--dump", dev, "--json" }, "", &part_dump, NULL);
+    if (r == 255)
+    {
+        fprintf(stderr, "Error running sfdisk --dump %s --json\n", dev.c_str());
+        return json11::Json(false);
+    }
+    // Decode partition table
+    json11::Json pt;
+    if (part_dump != "")
+    {
+        std::string err;
+        pt = json11::Json::parse(part_dump, err);
+        if (err != "")
+        {
+            fprintf(stderr, "sfdisk --dump %s --json returned bad JSON: %s\n", dev.c_str(), part_dump.c_str());
+            return json11::Json(false);
+        }
+        pt = pt["partitiontable"];
+        if (pt.is_object() && pt["label"].string_value() != "gpt")
+        {
+            fprintf(stderr, "%s contains \"%s\" partition table, only GPT is supported, skipping\n", dev.c_str(), pt["label"].string_value().c_str());
+            return json11::Json(false);
+        }
+    }
+    return pt;
+}
+
+uint64_t dev_size_from_parttable(json11::Json pt)
+{
+    uint64_t free = pt["lastlba"].uint64_value() + 1 - pt["firstlba"].uint64_value();
+    if (!pt["sectorsize"].uint64_value())
+        free *= 512;
+    else
+        free *= pt["sectorsize"].uint64_value();
+    return free;
+}
+
+uint64_t free_from_parttable(json11::Json pt)
+{
+    uint64_t free = pt["lastlba"].uint64_value() + 1 - pt["firstlba"].uint64_value();
+    for (const auto & part: pt["partitions"].array_items())
+        free -= part["size"].uint64_value();
+    if (!pt["sectorsize"].uint64_value())
+        free *= 512;
+    else
+        free *= pt["sectorsize"].uint64_value();
+    return free;
+}
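get_parent_device() derives the whole-disk name from a partition name by stripping trailing digits, keeping device-mapper names such as dm-0 intact and handling the `p` separator used by NVMe and similar devices. The sketch below reproduces only that string heuristic; the real function additionally verifies the candidate under /sys/block and returns the full name if it does not exist. strip_partition_suffix() is a hypothetical helper name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Pure-string part of get_parent_device(): the /sys/block existence check is omitted
std::string strip_partition_suffix(const std::string & name)
{
    size_t i = name.size();
    while (i > 0 && std::isdigit((unsigned char)name[i-1]))
        i--;
    if (i >= 1 && name[i-1] == '-')
        return name;  // device-mapper names like dm-0 are returned as-is
    if (i >= 2 && name[i-1] == 'p' && std::isdigit((unsigned char)name[i-2]))
        i--;          // nvme0n1p1 -> nvme0n1
    return name.substr(0, i);
}
```

For a whole NVMe namespace such as nvme0n1 the heuristic alone yields nvme0n, which does not exist; that is why the real function checks /sys/block before trusting the stripped name.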
--- a/src/dump_journal.cpp
+++ b/src/dump_journal.cpp
@@ -1,224 +0,0 @@
-// Copyright (c) Vitaliy Filippov, 2019+
-// License: VNPL-1.1 (see README.md for details)
-
-#define _LARGEFILE64_SOURCE
-#include <sys/types.h>
-#include <sys/ioctl.h>
-#include <sys/stat.h>
-#include <sys/time.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <stdint.h>
-#include <malloc.h>
-#include <linux/fs.h>
-#include <string.h>
-#include <errno.h>
-#include <assert.h>
-#include <stdio.h>
-
-#include "blockstore_impl.h"
-#include "crc32c.h"
-
-struct journal_dump_t
-{
-    char *journal_device;
-    uint32_t journal_block;
-    uint64_t journal_offset;
-    uint64_t journal_len;
-    uint64_t journal_pos;
-    bool all;
-    bool started;
-    int fd;
-    uint32_t crc32_last;
-
-    int dump_block(void *buf);
-};
-
-int main(int argc, char *argv[])
-{
-    journal_dump_t self = { 0 };
-    int b = 1;
-    if (argc >= 2 && !strcmp(argv[1], "--all"))
-    {
-        self.all = true;
-        b = 2;
-    }
-    if (argc < b+4)
-    {
-        printf("USAGE: %s [--all] <journal_file> <journal_block_size> <offset> <size>\n", argv[0]);
-        return 1;
-    }
-    self.journal_device = argv[b];
-    self.journal_block = strtoul(argv[b+1], NULL, 10);
-    self.journal_offset = strtoull(argv[b+2], NULL, 10);
-    self.journal_len = strtoull(argv[b+3], NULL, 10);
-    if (self.journal_block < MEM_ALIGNMENT || (self.journal_block % MEM_ALIGNMENT) ||
-        self.journal_block > 128*1024)
-    {
-        printf("Invalid journal block size\n");
-        return 1;
-    }
-    self.fd = open(self.journal_device, O_DIRECT|O_RDONLY);
-    if (self.fd == -1)
-    {
-        printf("Failed to open journal\n");
-        return 1;
-    }
-    void *data = memalign(MEM_ALIGNMENT, self.journal_block);
-    self.journal_pos = 0;
-    if (self.all)
-    {
-        while (self.journal_pos < self.journal_len)
-        {
-            int r = pread(self.fd, data, self.journal_block, self.journal_offset+self.journal_pos);
-            assert(r == self.journal_block);
-            uint64_t s;
-            for (s = 0; s < self.journal_block; s += 8)
-            {
-                if (*((uint64_t*)((uint8_t*)data+s)) != 0)
-                    break;
-            }
-            if (s == self.journal_block)
-            {
-                printf("offset %08lx: zeroes\n", self.journal_pos);
-                self.journal_pos += self.journal_block;
-            }
-            else if (((journal_entry*)data)->magic == JOURNAL_MAGIC)
-            {
-                printf("offset %08lx:\n", self.journal_pos);
-                self.dump_block(data);
-            }
-            else
-            {
-                printf("offset %08lx: no magic in the beginning, looks like random data (pattern=%lx)\n", self.journal_pos, *((uint64_t*)data));
-                self.journal_pos += self.journal_block;
-            }
-        }
-    }
-    else
-    {
-        int r = pread(self.fd, data, self.journal_block, self.journal_offset+self.journal_pos);
-        assert(r == self.journal_block);
-        journal_entry *je = (journal_entry*)(data);
-        if (je->magic != JOURNAL_MAGIC || je->type != JE_START || je_crc32(je) != je->crc32)
-        {
-            printf("offset %08lx: journal superblock is invalid\n", self.journal_pos);
-        }
-        else
-        {
-            printf("offset %08lx:\n", self.journal_pos);
-            self.dump_block(data);
-            self.started = false;
-            self.journal_pos = je->start.journal_start;
-            while (1)
-            {
-                if (self.journal_pos >= self.journal_len)
-                    self.journal_pos = self.journal_block;
-                r = pread(self.fd, data, self.journal_block, self.journal_offset+self.journal_pos);
-                assert(r == self.journal_block);
-                printf("offset %08lx:\n", self.journal_pos);
-                r = self.dump_block(data);
-                if (r <= 0)
-                {
-                    printf("end of the journal\n");
-                    break;
-                }
-            }
-        }
-    }
-    free(data);
-    close(self.fd);
-    return 0;
-}
-
-int journal_dump_t::dump_block(void *buf)
-{
-    uint32_t pos = 0;
-    journal_pos += journal_block;
-    int entry = 0;
-    bool wrapped = false;
-    while (pos < journal_block)
-    {
-        journal_entry *je = (journal_entry*)((uint8_t*)buf + pos);
-        if (je->magic != JOURNAL_MAGIC || je->type < JE_MIN || je->type > JE_MAX ||
-            !all && started && je->crc32_prev != crc32_last)
-        {
-            break;
-        }
-        bool crc32_valid = je_crc32(je) == je->crc32;
-        if (!all && !crc32_valid)
-        {
-            break;
-        }
-        started = true;
-        crc32_last = je->crc32;
-        printf("entry % 3d: crc32=%08x %s prev=%08x ", entry, je->crc32, (crc32_valid ? "(valid)" : "(invalid)"), je->crc32_prev);
-        if (je->type == JE_START)
-        {
-            printf("je_start start=%08lx\n", je->start.journal_start);
-        }
-        else if (je->type == JE_SMALL_WRITE || je->type == JE_SMALL_WRITE_INSTANT)
-        {
-            printf(
-                "je_small_write%s oid=%lx:%lx ver=%lu offset=%u len=%u loc=%08lx",
-                je->type == JE_SMALL_WRITE_INSTANT ? "_instant" : "",
-                je->small_write.oid.inode, je->small_write.oid.stripe,
-                je->small_write.version, je->small_write.offset, je->small_write.len,
-                je->small_write.data_offset
-            );
-            if (journal_pos + je->small_write.len > journal_len)
-            {
-                // data continues from the beginning of the journal
-                journal_pos = journal_block;
-                wrapped = true;
-            }
-            if (journal_pos != je->small_write.data_offset)
-            {
-                printf(" (mismatched, calculated = %lu)", journal_pos);
-            }
-            journal_pos += je->small_write.len;
-            if (journal_pos >= journal_len)
-            {
-                journal_pos = journal_block;
-                wrapped = true;
-            }
-            uint32_t data_crc32 = 0;
-            void *data = memalign(MEM_ALIGNMENT, je->small_write.len);
-            assert(pread(fd, data, je->small_write.len, journal_offset+je->small_write.data_offset) == je->small_write.len);
-            data_crc32 = crc32c(0, data, je->small_write.len);
-            free(data);
-            printf(
-                " data_crc32=%08x%s", je->small_write.crc32_data,
-                (data_crc32 != je->small_write.crc32_data) ? " (invalid)" : " (valid)"
-            );
-            printf("\n");
-        }
-        else if (je->type == JE_BIG_WRITE || je->type == JE_BIG_WRITE_INSTANT)
-        {
-            printf(
-                "je_big_write%s oid=%lx:%lx ver=%lu loc=%08lx\n",
-                je->type == JE_BIG_WRITE_INSTANT ? "_instant" : "",
-                je->big_write.oid.inode, je->big_write.oid.stripe, je->big_write.version, je->big_write.location
-            );
-        }
-        else if (je->type == JE_STABLE)
-        {
-            printf("je_stable oid=%lx:%lx ver=%lu\n", je->stable.oid.inode, je->stable.oid.stripe, je->stable.version);
-        }
-        else if (je->type == JE_ROLLBACK)
-        {
-            printf("je_rollback oid=%lx:%lx ver=%lu\n", je->rollback.oid.inode, je->rollback.oid.stripe, je->rollback.version);
-        }
-        else if (je->type == JE_DELETE)
-        {
-            printf("je_delete oid=%lx:%lx ver=%lu\n", je->del.oid.inode, je->del.oid.stripe, je->del.version);
-        }
-        pos += je->size;
-        entry++;
-    }
-    if (wrapped)
-    {
-        journal_pos = journal_len;
-    }
-    return entry;
-}
--- a/src/etcd_state_client.cpp
+++ b/src/etcd_state_client.cpp
@@ -7,7 +7,7 @@
 #ifndef __MOCK__
 #include "addr_util.h"
 #include "http_client.h"
-#include "base64.h"
+#include "str_util.h"
 #endif

 etcd_state_client_t::~etcd_state_client_t()
@@ -534,11 +534,18 @@ void etcd_state_client_t::load_global_config()
                global_config = kv.value.object_items();
            }
        }
-        bs_block_size = global_config["block_size"].uint64_value();
-        if (!bs_block_size)
+        global_block_size = global_config["block_size"].uint64_value();
+        if (!global_block_size)
        {
-            bs_block_size = DEFAULT_BLOCK_SIZE;
+            global_block_size = DEFAULT_BLOCK_SIZE;
        }
+        global_bitmap_granularity = global_config["bitmap_granularity"].uint64_value();
+        if (!global_bitmap_granularity)
+        {
+            global_bitmap_granularity = DEFAULT_BITMAP_GRANULARITY;
+        }
+        global_immediate_commit = global_config["immediate_commit"].string_value() == "all"
+            ? IMMEDIATE_ALL : (global_config["immediate_commit"].string_value() == "small" ? IMMEDIATE_SMALL : IMMEDIATE_NONE);
        on_load_config_hook(global_config);
    });
 }
@@ -732,9 +739,35 @@ void etcd_state_client_t::parse_state(const etcd_kv_t & kv)
                fprintf(stderr, "Pool %u has invalid max_osd_combinations (must be at least 100), skipping pool\n", pool_id);
                continue;
            }
+            // Data Block Size
+            pc.data_block_size = pool_item.second["block_size"].uint64_value();
+            if (!pc.data_block_size)
+                pc.data_block_size = global_block_size;
+            if ((pc.data_block_size & (pc.data_block_size-1)) ||
+                pc.data_block_size < MIN_DATA_BLOCK_SIZE || pc.data_block_size > MAX_DATA_BLOCK_SIZE)
+            {
+                fprintf(stderr, "Pool %u has invalid block_size (must be a power of two between %u and %u), skipping pool\n",
+                    pool_id, MIN_DATA_BLOCK_SIZE, MAX_DATA_BLOCK_SIZE);
+                continue;
+            }
+            // Bitmap Granularity
+            pc.bitmap_granularity = pool_item.second["bitmap_granularity"].uint64_value();
+            if (!pc.bitmap_granularity)
+                pc.bitmap_granularity = global_bitmap_granularity;
+            if (!pc.bitmap_granularity || pc.data_block_size % pc.bitmap_granularity)
+            {
+                fprintf(stderr, "Pool %u has invalid bitmap_granularity (must divide block_size), skipping pool\n", pool_id);
+                continue;
+            }
+            // Immediate Commit Mode
+            pc.immediate_commit = pool_item.second["immediate_commit"].is_string()
+                ? (pool_item.second["immediate_commit"].string_value() == "all"
+                    ? IMMEDIATE_ALL : (pool_item.second["immediate_commit"].string_value() == "small"
+                        ? IMMEDIATE_SMALL : IMMEDIATE_NONE))
+                : global_immediate_commit;
            // PG Stripe Size
            pc.pg_stripe_size = pool_item.second["pg_stripe_size"].uint64_value();
-            uint64_t min_stripe_size = bs_block_size * (pc.scheme == POOL_SCHEME_REPLICATED ? 1 : (pc.pg_size-pc.parity_chunks));
+            uint64_t min_stripe_size = pc.data_block_size * (pc.scheme == POOL_SCHEME_REPLICATED ? 1 : (pc.pg_size-pc.parity_chunks));
            if (pc.pg_stripe_size < min_stripe_size)
                pc.pg_stripe_size = min_stripe_size;
            // Save
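The per-pool checks added to parse_state() let a pool override block_size and bitmap_granularity (a value of 0 means "inherit the global setting"), then derive the minimum PG stripe size from the effective block size. The following minimal sketch restates those rules with plain integers. The limits are copied from etcd_state_client.h in this diff; check_pool_layout() and min_stripe_size() are hypothetical helpers, not Vitastor APIs.

```cpp
#include <cassert>
#include <cstdint>

// Limits copied from MIN_DATA_BLOCK_SIZE / MAX_DATA_BLOCK_SIZE in this diff
const uint64_t kMinDataBlockSize = 4*1024;
const uint64_t kMaxDataBlockSize = 128*1024*1024;

// Returns false if parse_state() would skip the pool
bool check_pool_layout(uint64_t block_size, uint64_t granularity,
    uint64_t global_block_size, uint64_t global_granularity)
{
    if (!block_size)
        block_size = global_block_size;
    // A power of two has exactly one bit set
    if ((block_size & (block_size-1)) ||
        block_size < kMinDataBlockSize || block_size > kMaxDataBlockSize)
        return false;
    if (!granularity)
        granularity = global_granularity;
    return granularity != 0 && block_size % granularity == 0;
}

// One block for replicated pools, one block per data chunk for EC pools
uint64_t min_stripe_size(uint64_t block_size, bool replicated, uint64_t pg_size, uint64_t parity_chunks)
{
    return block_size * (replicated ? 1 : pg_size - parity_chunks);
}
```

For example, an EC 3+2 pool (pg_size 5, parity_chunks 2) with the default 128 KiB block size gets a minimum stripe of 3 × 128 KiB = 384 KiB.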
--- a/src/etcd_state_client.h
+++ b/src/etcd_state_client.h
@@ -13,6 +13,13 @@
 #define ETCD_OSD_STATE_WATCH_ID 4

 #define DEFAULT_BLOCK_SIZE 128*1024
+#define MIN_DATA_BLOCK_SIZE 4*1024
+#define MAX_DATA_BLOCK_SIZE 128*1024*1024
+#define DEFAULT_BITMAP_GRANULARITY 4096
+
+#define IMMEDIATE_NONE 0
+#define IMMEDIATE_SMALL 1
+#define IMMEDIATE_ALL 2

 struct etcd_kv_t
 {
@@ -41,6 +48,7 @@ struct pool_config_t
    std::string name;
    uint64_t scheme;
    uint64_t pg_size, pg_minsize, parity_chunks;
+    uint32_t data_block_size, bitmap_granularity, immediate_commit;
    uint64_t pg_count;
    uint64_t real_pg_count;
    std::string failure_domain;
@@ -83,7 +91,6 @@ protected:
    int ws_keepalive_timer = -1;
    int ws_alive = 0;
    bool rand_initialized = false;
-    uint64_t bs_block_size = DEFAULT_BLOCK_SIZE;
    void add_etcd_url(std::string);
    void pick_next_etcd();
 public:
@@ -92,6 +99,9 @@ public:
    int max_etcd_attempts = 5;
    int etcd_quick_timeout = 1000;
    int etcd_slow_timeout = 5000;
+    uint64_t global_block_size = DEFAULT_BLOCK_SIZE;
+    uint32_t global_bitmap_granularity = DEFAULT_BITMAP_GRANULARITY;
+    uint32_t global_immediate_commit = IMMEDIATE_NONE;

    std::string etcd_prefix;
    int log_level = 0;
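immediate_commit is stored as one of three constants, and a pool inherits the global value unless it sets its own string. The sketch below mirrors that mapping. It represents an absent pool key as a null pointer, which is an assumption made for illustration (the real code tests `is_string()` on a JSON value), and parse_immediate_commit() / pool_immediate_commit() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values copied from etcd_state_client.h in this diff
const uint32_t IMMEDIATE_NONE = 0, IMMEDIATE_SMALL = 1, IMMEDIATE_ALL = 2;

// Anything other than "all" or "small" (including "none") maps to IMMEDIATE_NONE
uint32_t parse_immediate_commit(const std::string & s)
{
    return s == "all" ? IMMEDIATE_ALL : (s == "small" ? IMMEDIATE_SMALL : IMMEDIATE_NONE);
}

// pool_value == nullptr stands for "key absent in the pool config"
uint32_t pool_immediate_commit(const std::string *pool_value, uint32_t global_mode)
{
    return pool_value ? parse_immediate_commit(*pool_value) : global_mode;
}
```

Note that a pool value which is present but unrecognized (for example a typo) becomes IMMEDIATE_NONE rather than falling back to the global setting.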
Author	SHA1	Message	Date
Vitaliy Filippov	8fdf30b21f	Release 0.8.1 - Remove an additional data copy operation when flushing journal (should slightly increase write performance) - Fix a bug where new writes in the inmemory_journal=false mode could overwrite the data currently read by a parallel read operation - Fix degraded parity writes for EC N+K when K>1 where the bug could also lead to an "assertion failed" error - Fix missing journal space check for "big" writes which could lead to "prefill_single_journal_entry(): assertion failed..." error in OSD - Fix possible "assertion failed: next->prev_wait >= 0" in client in rare cases - Fix missing "len" field in vitastor-disk write-journal big_writes - Fix possible crash of a full OSD (ENOSPC) - Fix CSI build scripts to include newest packages every time - Fix CSI endpoint in the liveness probe manifest	2022-11-20 11:44:09 +03:00
Vitaliy Filippov	238037ae31	Make journal trimmer wait until reads are completed when inmemory_journal is false Without this new writes may in theory overwrite journal data being read at that time	2022-11-20 01:49:21 +03:00
Vitaliy Filippov	09a8864686	Fix degraded parity writes for EC N+K when K>1 Fixes possible `calc_rmw_parity_ec(): Assertion `bufs[i][curbuf[i]].buf' failed` error	2022-11-20 00:50:13 +03:00
Vitaliy Filippov	6e6f6ecbb0	Add missing journal space check for big_writes Fixes possible `prefill_single_journal_entry(): Assertion `!journal.sector_info[journal.cur_sector].flush_count' failed` error	2022-11-20 00:50:13 +03:00
Vitaliy Filippov	9491f81419	Add missing documentation for OSD tags	2022-11-20 00:50:13 +03:00
Vitaliy Filippov	44c2b30167	Take newest packages every time when rebuilding CSI	2022-11-20 00:50:13 +03:00
Vitaliy Filippov	bf8a0581cd	Fix possible "assertion failed: next->prev_wait >= 0" in client	2022-11-20 00:50:13 +03:00
Vitaliy Filippov	5953942042	Add crc32c test utility	2022-11-20 00:50:13 +03:00
Vitaliy Filippov	a276a1f737	Do not copy journal data additional time when flushing	2022-11-20 00:50:13 +03:00
Vitaliy Filippov	cc24e5796e	Add a FIXME	2022-11-20 00:50:09 +03:00
Vitaliy Filippov	6e26732e6a	Fix skipped "len" field in vitastor-disk write-journal big_writes	2022-11-12 12:01:40 +03:00
Vitaliy Filippov	b4edc79449	Fix possible segfault on ENOSPC	2022-11-12 11:59:43 +03:00
Vitaliy Filippov	5f26887d32	Fix CSI endpoint in liveness probe	2022-11-10 18:37:37 +03:00
Vitaliy Filippov	11ec9ad874	Release 0.8.0 - Implement automatic OSD activation via udev and simple on-disk superblock storage - Add a new `vitastor-disk` tool and merge all disk-related functionality there. Now it can prepare new OSD disks, upgrade plain old systemd units to the new scheme, resize OSD data area, manage OSD services by disk paths, manage superblocks, automatically check and disable disk cache, dump and write back journal and metadata. - Add a documentation section about `vitastor-disk` (read it if you want details!) - Install systemd services during package installation instead of the older method of manually creating them via separate shell scripts - Add a new `make-etcd` script that reuses /etc/vitastor/vitastor.conf to configure etcd - Allow configuring block_size, bitmap_granularity and immediate_commit per pool - Fix "fatal error: tried to overwrite non-zero metadata entry" which was possible in some cases after unclean OSD shutdown (caused by old metadata entries not being zeroed)	2022-09-05 13:51:20 +03:00
Vitaliy Filippov	83bb6598dc	Fix fsync autodetection for the single-device mode	2022-09-05 13:51:20 +03:00
Vitaliy Filippov	150f369346	Hotfixes for vitastor-disk prepare: max_other, get device size, older sfdisk	2022-09-05 12:48:27 +03:00
Vitaliy Filippov	8d9a5fde15	Fix docs (block_size vs object_size)	2022-09-04 14:47:04 +03:00
Vitaliy Filippov	9ccc607ab9	Fix parse_size	2022-09-04 14:20:56 +03:00
Vitaliy Filippov	8972878c77	Fix make-etcd for ip:port	2022-09-04 14:11:59 +03:00
Vitaliy Filippov	2a1da88253	Create /etc/vitastor during package installation	2022-09-03 23:31:55 +03:00
Vitaliy Filippov	2f13f347b0	Fix space stats in mon	2022-09-03 11:16:33 +03:00
Vitaliy Filippov	9453db0e99	Add a newer make-etcd.js	2022-09-03 02:04:21 +03:00
Vitaliy Filippov	a828a1233d	Remove old make-osd scripts	2022-09-03 02:04:21 +03:00
Vitaliy Filippov	9481456dfe	Automatically check whether to disable cache during prepare	2022-09-03 02:04:21 +03:00
Vitaliy Filippov	bd11db5d0a	Add vitastor-mon.service, vitastor.target, create user and log directory during package installation	2022-09-03 00:09:22 +03:00
Vitaliy Filippov	68ebe5993a	Fix partition reuse	2022-09-02 23:32:25 +03:00
Vitaliy Filippov	eecbfb66ce	Remove the old make-osd.sh script from packages	2022-09-02 20:35:15 +03:00
Vitaliy Filippov	a537db8909	Add documentation for the new "vitastor-disk" tool	2022-08-22 00:31:30 +03:00
Vitaliy Filippov	54ef2c389f	Followup to the "tried to overwrite" fix: also handle it in case of inmemory_meta == false	2022-08-21 01:28:29 +03:00
Vitaliy Filippov	153c73574a	Refactor blockstore_init_meta into slightly more obvious code	2022-08-21 01:21:13 +03:00
Vitaliy Filippov	d83580bd68	Fix "tried to overwrite non-zero metadata entry" when, during a previous metadata flush, writing the new entry completed but zeroing out the old one did not	2022-08-21 00:31:18 +03:00
Vitaliy Filippov	29b40aba93	Add write-meta command (for debug)	2022-08-20 23:56:57 +03:00
Vitaliy Filippov	a52f2b0e8f	Add write-journal command (for debug)	2022-08-20 14:05:53 +03:00
Vitaliy Filippov	1407db9c08	Fix vitastor-disk prepare bugs	2022-08-19 02:22:54 +03:00
Vitaliy Filippov	c0d5e83fb8	Run partprobe when partitions do not appear	2022-08-18 02:05:16 +03:00
Vitaliy Filippov	40d8d65188	Rewrite upgrade-simple to C++	2022-08-18 01:31:31 +03:00
Vitaliy Filippov	a16263e88c	Fix bugs in the upgrade script and in the udev startup script	2022-08-17 10:28:34 +03:00
Vitaliy Filippov	e62bab1b39	Add systemd unit for udev deployments	2022-08-15 00:23:26 +03:00
Vitaliy Filippov	cb4e3a118d	Fix warning	2022-08-15 00:18:21 +03:00
Vitaliy Filippov	b1e39b5dea	Split disk_tool.cpp into separate files	2022-08-14 02:37:01 +03:00
Vitaliy Filippov	1170319431	Finish vitastor-disk prepare in theory	2022-08-14 02:13:24 +03:00
Vitaliy Filippov	2e0a2221eb	vitastor-disk prepare: WIP second form of the command	2022-08-12 01:58:28 +03:00
Vitaliy Filippov	5a10d135f3	Allow configuring block_size, bitmap_granularity and immediate_commit per pool	2022-08-11 01:56:33 +03:00
Vitaliy Filippov	4c9aaa8a86	vitastor-disk prepare: implement first form of the command	2022-08-09 01:29:29 +03:00
Vitaliy Filippov	ae99ee6266	Rename base64.{cpp,h} to str_util	2022-07-31 01:12:37 +03:00
Vitaliy Filippov	5af75f7d78	Implement vitastor-cli and vitastor-disk --help <command>	2022-07-31 01:10:05 +03:00
Vitaliy Filippov	7dc6f10ea1	Add read-sb command	2022-07-28 00:14:23 +03:00
Vitaliy Filippov	6fde9950d6	Implement upgrade tool from "simple" units to superblock+udev deployments	2022-07-27 02:33:43 +03:00
Vitaliy Filippov	76dd0fdcea	Implement pre-exec command with on-start OSD checks	2022-07-24 15:09:45 +03:00
Vitaliy Filippov	5acc19bbd5	Implement systemctl start/stop and other commands	2022-07-23 02:18:40 +03:00
Vitaliy Filippov	d5ca4e1f90	Add exec-osd command	2022-07-22 02:17:24 +03:00
Vitaliy Filippov	67e04f789f	Add write-sb (superblock) command	2022-07-19 01:14:31 +03:00
Vitaliy Filippov	837407a84c	Add udev import command	2022-07-19 01:14:31 +03:00
Vitaliy Filippov	1fe5908899	WIP OSD activation from superblock	2022-07-17 02:14:50 +03:00
Vitaliy Filippov	dcc6d546be	Move simple-offsets into vitastor-disk, too	2022-07-15 02:19:35 +03:00
Vitaliy Filippov	85fa389557	Add a test for disk-tool resize	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	dfa433c63b	Add JSON format to dump-journal	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	cf487c95aa	Fix resizer	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	b10656ca09	Parse new disk params in disk_tool resizer	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	ea632367e9	Do not alter dsk.meta_offset/len to skip superblock	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	4d777c6729	Set journal/meta devices to data device explicitly instead of ""	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	0c404c5074	Use blockstore_disk in disk_tool	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	dfd80626bd	Extract disk opening functions to separate module	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	30907852c2	Use simple std::map for the config	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	078ed5b116	WIP Data area resize tool	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	73a363bf92	Rename some variables and constants	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	b0e86ca643	Merge dump-journal and dump-meta into the new "vitastor-disk" tool	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	8800afb649	Fix void* arithmetic again	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	c10c90f620	Swap cli.en.md and cli.ru.md contents O_o	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	e20cdd13b6	Fix simple-offsets return value	2022-07-15 01:38:30 +03:00
Vitaliy Filippov	d29b5d2d04	Add Russian translation of VNPL-1.1	2022-06-24 01:34:25 +03:00
Vitaliy Filippov	65b0e8e940	Fix typo in VNPL-1.1	2022-06-24 01:34:25 +03:00
Vitaliy Filippov	bce357e2a5	Do not read all metadata into memory when dumping	2022-06-13 01:26:30 +03:00
Vitaliy Filippov	0876ca09cd	Fix dumper includes and print format	2022-06-11 00:30:44 +03:00
Vitaliy Filippov	dac12d8a4c	Implement metadata dump tool	2022-06-10 18:50:09 +03:00
Vitaliy Filippov	1eec4407ab	Fix inode creation when /index/maxid is out of sync	2022-06-06 16:35:51 +03:00
huy	3b7c6dcac2	Fix volume creation from snapshots in Cinder driver	2022-06-06 15:46:13 +03:00
Vitaliy Filippov	342517d126	Fix typo	2022-06-05 00:45:02 +03:00
Vitaliy Filippov	675bc12a13	Add extern "C" for systems like Gentoo which miss it in jerasure includes	2022-06-05 00:33:38 +03:00