Compare commits


26 Commits

Author SHA1 Message Date
f69f801ffb Release 1.9.3
- Support custom hybrid OSD creation (`vitastor-disk prepare --hybrid --fast-devices /dev/xxx,/dev/yyy`; see the example after this list)
- Auto-change partition paths to /dev/disk/by-partuuid/ in `vitastor-disk prepare`
- Allow selecting cached I/O in vitastor-disk commands
- Fix multiple bugs in vitastor-disk resize & add tests for them
- Fix vitastor-disk write-meta/write-journal writing to an incorrect device in superblock-based mode
- Fix vitastor-disk prepare sometimes not seeing new partitions (again)
- Clean up PG history and stats of deleted pools
- Fix "is already mounted" checks in CSI
2024-11-07 01:28:31 +03:00
af92cbdfcc Dynamic device size in test
2024-11-06 14:16:58 +03:00
a775db10cc Also allow cached I/O in dsk.open_*() in disk_tool
2024-11-06 13:52:25 +03:00
eafce26049 Add resize and resize-auto tests
2024-11-06 13:30:51 +03:00
625c74294f Support direct I/O 2024-11-06 13:30:12 +03:00
ef8c21ad6f Change %lu to %ju 2024-11-06 02:58:51 +03:00
2bb8e8999e Do not check length in "data alignment mismatch" 2024-11-06 02:58:26 +03:00
c2e7c28672 Fix calc_lengths data size recalc during auto-resize
2024-11-06 02:27:17 +03:00
bd22beefb5 Auto-extend new_data_len if new_data_offset is changed too
2024-11-06 02:13:30 +03:00
e7038ab99c Auto-change partition paths to /dev/disk/by-partuuid/
2024-11-06 01:04:05 +03:00
b6f75ebcfd Add missing I/O path description in english 2024-11-06 00:43:17 +03:00
9def199981 Auto-reduce new_data_len in resize
2024-11-05 02:57:11 +03:00
c72e8e649e Support test mode for vitastor-disk
2024-11-05 02:43:55 +03:00
8bdb3e8786 Write meta/journal to correct device when used in superblock mode 2024-11-05 02:43:55 +03:00
a87e236c70 Fix resize --data-size, particularly when expanding the device
2024-11-04 18:55:03 +03:00
16f67cf6f1 Fix missing metadata checksums after resize
2024-11-04 18:36:35 +03:00
56de4a520d Support custom hybrid OSD creation (--hybrid --fast-devices /dev/xxx,/dev/yyy)
2024-11-04 17:52:29 +03:00
adca162278 Note that osd_per_disk is also incompatible
2024-11-04 15:20:01 +03:00
490b314d72 Rework & fix new partition waiting code
2024-11-04 15:16:30 +03:00
9f52074e1e Delete PG history and stats of deleted pools
2024-11-01 02:38:31 +03:00
2b3e877546 Add notes about vitastor-disk in disable_data_fsync 2024-11-01 02:38:18 +03:00
01d55e5420 Merge pull request #64 from 0x00ace/fio_version_fix
use fio 3.35-1 for AlmaLinux 9
2024-10-31 11:55:40 +03:00
f5aa5cfdfe Fix "is already mounted" checks in CSI 2024-10-26 14:06:21 +03:00
2826bb9e7e Add more logging to CSI 2024-10-24 02:07:55 +03:00
30d1ad0f66 Add Intel D5-P4320 2024-10-22 23:22:48 +03:00
ace b85dab8583 use fio 3.35-1 for AlmaLinux 9 2024-05-18 21:17:16 +03:00
41 changed files with 741 additions and 306 deletions


@@ -22,7 +22,7 @@ RUN apt-get update
RUN apt-get -y install etcd qemu-system-x86 qemu-block-extra qemu-utils fio libasan5 \
liburing1 liburing-dev libgoogle-perftools-dev devscripts libjerasure-dev cmake libibverbs-dev libisal-dev
RUN apt-get -y build-dep fio qemu=`dpkg -s qemu-system-x86|grep ^Version:|awk '{print $2}'`
-RUN apt-get -y install jq lp-solve sudo nfs-common
+RUN apt-get update && apt-get -y install jq lp-solve sudo nfs-common fdisk parted
RUN apt-get --download-only source fio qemu=`dpkg -s qemu-system-x86|grep ^Version:|awk '{print $2}'`
RUN set -ex; \


@@ -828,6 +828,42 @@ jobs:
echo ""
done
test_resize:
runs-on: ubuntu-latest
needs: build
container: ${{env.TEST_IMAGE}}:${{github.sha}}
steps:
- name: Run test
id: test
timeout-minutes: 3
run: /root/vitastor/tests/test_resize.sh
- name: Print logs
if: always() && steps.test.outcome == 'failure'
run: |
for i in /root/vitastor/testdata/*.log /root/vitastor/testdata/*.txt; do
echo "-------- $i --------"
cat $i
echo ""
done
test_resize_auto:
runs-on: ubuntu-latest
needs: build
container: ${{env.TEST_IMAGE}}:${{github.sha}}
steps:
- name: Run test
id: test
timeout-minutes: 3
run: /root/vitastor/tests/test_resize_auto.sh
- name: Print logs
if: always() && steps.test.outcome == 'failure'
run: |
for i in /root/vitastor/testdata/*.log /root/vitastor/testdata/*.txt; do
echo "-------- $i --------"
cat $i
echo ""
done
test_snapshot_pool2:
runs-on: ubuntu-latest
needs: build


@@ -2,6 +2,6 @@ cmake_minimum_required(VERSION 2.8.12)
project(vitastor)
set(VITASTOR_VERSION "1.9.2")
set(VITASTOR_VERSION "1.9.3")
add_subdirectory(src)


@@ -1,4 +1,4 @@
-VITASTOR_VERSION ?= v1.9.2
+VITASTOR_VERSION ?= v1.9.3
all: build push


@@ -49,7 +49,7 @@ spec:
capabilities:
add: ["SYS_ADMIN"]
allowPrivilegeEscalation: true
-image: vitalif/vitastor-csi:v1.9.2
+image: vitalif/vitastor-csi:v1.9.3
args:
- "--node=$(NODE_ID)"
- "--endpoint=$(CSI_ENDPOINT)"


@@ -121,7 +121,7 @@ spec:
privileged: true
capabilities:
add: ["SYS_ADMIN"]
-image: vitalif/vitastor-csi:v1.9.2
+image: vitalif/vitastor-csi:v1.9.3
args:
- "--node=$(NODE_ID)"
- "--endpoint=$(CSI_ENDPOINT)"


@@ -5,7 +5,7 @@ package vitastor
const (
vitastorCSIDriverName = "csi.vitastor.io"
vitastorCSIDriverVersion = "1.9.2"
vitastorCSIDriverVersion = "1.9.3"
)
// Config struct fills the parameters of request or user input


@@ -8,11 +8,9 @@ import (
"encoding/json"
"fmt"
"strings"
"bytes"
"strconv"
"time"
"os"
"os/exec"
"io/ioutil"
"github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
@@ -114,22 +112,6 @@ func GetConnectionParams(params map[string]string) (map[string]string, error)
return ctxVars, nil
}
func system(program string, args ...string) ([]byte, []byte, error)
{
klog.Infof("Running "+program+" "+strings.Join(args, " "))
c := exec.Command(program, args...)
var stdout, stderr bytes.Buffer
c.Stdout, c.Stderr = &stdout, &stderr
err := c.Run()
if (err != nil)
{
stdoutStr, stderrStr := string(stdout.Bytes()), string(stderr.Bytes())
klog.Errorf(program+" "+strings.Join(args, " ")+" failed: %s, status %s\n", stdoutStr+stderrStr, err)
return nil, nil, status.Error(codes.Internal, stdoutStr+stderrStr+" (status "+err.Error()+")")
}
return stdout.Bytes(), stderr.Bytes(), nil
}
func invokeCLI(ctxVars map[string]string, args []string) ([]byte, error)
{
if (ctxVars["configPath"] != "")


@@ -227,9 +227,14 @@ func (ns *NodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVol
isBlock := req.GetVolumeCapability().GetBlock() != nil
// Check that it's not already mounted
_, err = mount.IsNotMountPoint(ns.mounter, targetPath)
notmnt, err := mount.IsNotMountPoint(ns.mounter, targetPath)
if (err == nil)
{
if (!notmnt)
{
klog.Errorf("target path %s is already mounted", targetPath)
return nil, fmt.Errorf("target path %s is already mounted", targetPath)
}
var finfo os.FileInfo
finfo, err = os.Stat(targetPath)
if (err != nil)
@@ -300,6 +305,7 @@ func (ns *NodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVol
diskMounter := &mount.SafeFormatAndMount{Interface: ns.mounter, Exec: utilexec.New()}
if (isBlock)
{
klog.Infof("bind-mounting %s to %s", devicePath, targetPath)
err = diskMounter.Mount(devicePath, targetPath, "", []string{"bind"})
}
else
@@ -329,39 +335,40 @@ func (ns *NodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVol
readOnly := Contains(opt, "ro")
if (existingFormat == "" && !readOnly)
{
var cmdOut []byte
switch fsType
{
case "ext4":
args := []string{"-m0", "-Enodiscard,lazy_itable_init=1,lazy_journal_init=1", devicePath}
cmdOut, err = diskMounter.Exec.Command("mkfs.ext4", args...).CombinedOutput()
_, err = systemCombined("mkfs.ext4", args...)
case "xfs":
cmdOut, err = diskMounter.Exec.Command("mkfs.xfs", "-K", devicePath).CombinedOutput()
_, err = systemCombined("mkfs.xfs", "-K", devicePath)
}
if (err != nil)
{
klog.Errorf("failed to run mkfs error: %v, output: %v", err, string(cmdOut))
goto unmap
}
}
klog.Infof("formatting and mounting %s to %s with FS %s, options: %v", devicePath, targetPath, fsType, opt)
err = diskMounter.FormatAndMount(devicePath, targetPath, fsType, opt)
if (err == nil)
{
klog.Infof("successfully mounted %s to %s", devicePath, targetPath)
}
// Try to run online resize on mount.
// FIXME: Implement online resize. It requires online resize support in vitastor-nbd.
if (err == nil && existingFormat != "" && !readOnly)
{
var cmdOut []byte
switch (fsType)
{
case "ext4":
cmdOut, err = diskMounter.Exec.Command("resize2fs", devicePath).CombinedOutput()
_, err = systemCombined("resize2fs", devicePath)
case "xfs":
cmdOut, err = diskMounter.Exec.Command("xfs_growfs", devicePath).CombinedOutput()
_, err = systemCombined("xfs_growfs", devicePath)
}
if (err != nil)
{
klog.Errorf("failed to run resizefs error: %v, output: %v", err, string(cmdOut))
goto unmap
}
}
@@ -481,15 +488,20 @@ func (ns *NodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublis
isBlock := req.GetVolumeCapability().GetBlock() != nil
// Check that stagingTargetPath is mounted
_, err = mount.IsNotMountPoint(ns.mounter, stagingTargetPath)
notmnt, err := mount.IsNotMountPoint(ns.mounter, stagingTargetPath)
if (err != nil)
{
klog.Errorf("staging path %v is not mounted: %v", stagingTargetPath, err)
return nil, fmt.Errorf("staging path %v is not mounted: %v", stagingTargetPath, err)
klog.Errorf("staging path %v is not mounted: %w", stagingTargetPath, err)
return nil, fmt.Errorf("staging path %v is not mounted: %w", stagingTargetPath, err)
}
else if (notmnt)
{
klog.Errorf("staging path %v is not mounted", stagingTargetPath)
return nil, fmt.Errorf("staging path %v is not mounted", stagingTargetPath)
}
// Check that targetPath is not already mounted
_, err = mount.IsNotMountPoint(ns.mounter, targetPath)
notmnt, err = mount.IsNotMountPoint(ns.mounter, targetPath)
if (err != nil)
{
if (os.IsNotExist(err))
@@ -524,6 +536,11 @@ func (ns *NodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublis
return nil, err
}
}
else if (!notmnt)
{
klog.Errorf("target path %s is already mounted", targetPath)
return nil, fmt.Errorf("target path %s is already mounted", targetPath)
}
execArgs := []string{"--bind", stagingTargetPath, targetPath}
if (req.GetReadonly())


@@ -4,6 +4,7 @@
package vitastor
import (
"bytes"
"errors"
"encoding/json"
"fmt"
@@ -15,6 +16,8 @@ import (
"syscall"
"k8s.io/klog"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)
func Contains(list []string, s string) bool
@@ -73,6 +76,10 @@ func checkVduseSupport() bool
" For VDUSE you need at least Linux 5.15 and the following kernel modules: vdpa, virtio-vdpa, vduse.",
)
}
else
{
klog.Infof("VDUSE support enabled successfully")
}
return vduse
}
@@ -97,6 +104,7 @@ func mapNbd(volName string, ctxVars map[string]string, readonly bool) (string, e
{
return "", fmt.Errorf("vitastor-nbd did not return the name of NBD device. output: %s", stderr)
}
klog.Infof("Attached volume %s via NBD as %s", volName, dev)
return dev, err
}
@@ -217,6 +225,7 @@ func mapVduse(stateDir string, volName string, ctxVars map[string]string, readon
err = os.WriteFile(stateFile, stateJSON, 0600)
if (err == nil)
{
klog.Infof("Attached volume %s via VDUSE as %s (VDPA ID %s)", volName, blockdev, vdpaId)
return blockdev, vdpaId, nil
}
}
@@ -299,3 +308,35 @@ func unmapVduseById(stateDir, vdpaId string)
os.Remove(pidFile)
}
}
func system(program string, args ...string) ([]byte, []byte, error)
{
klog.Infof("Running "+program+" "+strings.Join(args, " "))
c := exec.Command(program, args...)
var stdout, stderr bytes.Buffer
c.Stdout, c.Stderr = &stdout, &stderr
err := c.Run()
if (err != nil)
{
stdoutStr, stderrStr := string(stdout.Bytes()), string(stderr.Bytes())
klog.Errorf(program+" "+strings.Join(args, " ")+" failed: %s\nOutput:\n%s", err, stdoutStr+stderrStr)
return nil, nil, status.Error(codes.Internal, stdoutStr+stderrStr+" (status "+err.Error()+")")
}
return stdout.Bytes(), stderr.Bytes(), nil
}
func systemCombined(program string, args ...string) ([]byte, error)
{
klog.Infof("Running "+program+" "+strings.Join(args, " "))
c := exec.Command(program, args...)
var out bytes.Buffer
c.Stdout, c.Stderr = &out, &out
err := c.Run()
if (err != nil)
{
outStr := string(out.Bytes())
klog.Errorf(program+" "+strings.Join(args, " ")+" failed: %s, status %s\n", outStr, err)
return nil, status.Error(codes.Internal, outStr+" (status "+err.Error()+")")
}
return out.Bytes(), nil
}

debian/changelog

@@ -1,4 +1,4 @@
-vitastor (1.9.2-1) unstable; urgency=medium
+vitastor (1.9.3-1) unstable; urgency=medium
* Bugfixes


@@ -118,12 +118,13 @@ Physical block size of the journal device. Must be a multiple of
- Type: boolean
- Default: false
Do not issue fsyncs to the data device, i.e. do not flush its cache.
Safe ONLY if your data device has write-through cache. If you disable
the cache yourself using `hdparm` or `scsi_disk/cache_type` then make sure
that the cache disable command is run every time before starting Vitastor
OSD, for example, in the systemd unit. See also `immediate_commit` option
for the instructions to disable cache and how to benefit from it.
Do not issue fsyncs to the data device, i.e. do not force it to flush cache.
Safe ONLY if your data device has write-through cache or if write-back
cache is disabled. If you disable the drive cache manually with `hdparm` or by
writing to `/sys/.../scsi_disk/cache_type`, then make sure that you do it
every time before starting the Vitastor OSD (vitastor-disk does it automatically).
See also [immediate_commit](layout-cluster.en.md#immediate_commit)
for information about how to benefit from disabled cache.
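
For reference, a typical manual cache-disable step could look like this (illustrative commands, not part of this patch; `sdX` is a placeholder):

```
# ATA/SATA drives: turn off the write-back cache
hdparm -W 0 /dev/sdX
# or through sysfs for SCSI-like devices
echo "write through" > /sys/block/sdX/device/scsi_disk/*/cache_type
```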
## disable_meta_fsync
@@ -171,8 +172,7 @@ size, it actually has to write the whole 4 KB sector.
Because of this it can actually be beneficial to use SSDs which work well
with 512 byte sectors and use 512 byte disk_alignment, journal_block_size
and meta_block_size. But the only SSD that may fit into this category is
Intel Optane (probably, not tested yet).
and meta_block_size. But at the moment, no such SSDs are known...
Clients don't need to be aware of disk_alignment, so it's not required to
put a modified value into etcd key /vitastor/config/global.


@@ -122,13 +122,14 @@ SSD-диске, иначе производительность пострада
- Тип: булево (да/нет)
- Значение по умолчанию: false
Не отправлять fsync-и устройству данных, т.е. не сбрасывать его кэш.
Не отправлять fsync-и устройству данных, т.е. не заставлять его сбрасывать кэш.
Безопасно, ТОЛЬКО если ваше устройство данных имеет кэш со сквозной
записью (write-through). Если вы отключаете кэш через `hdparm` или
`scsi_disk/cache_type`, то удостоверьтесь, что команда отключения кэша
выполняется перед каждым запуском Vitastor OSD, например, в systemd unit-е.
Смотрите также опцию `immediate_commit` для инструкций по отключению кэша
и о том, как из этого извлечь выгоду.
записью (write-through) или если кэш с отложенной записью (write-back) отключён.
Если вы отключаете кэш вручную через `hdparm` или запись в `/sys/.../scsi_disk/cache_type`,
то удостоверьтесь, что вы делаете это каждый раз перед запуском Vitastor OSD
(vitastor-disk делает это автоматически). Смотрите также опцию
[immediate_commit](layout-cluster.ru.md#immediate_commit) для информации о том,
как извлечь выгоду из отключённого кэша.
## disable_meta_fsync
@@ -179,9 +180,8 @@ SSD и HDD диски используют 4 КБ физические сект
Поэтому, на самом деле, может быть выгодно найти SSD, хорошо работающие с
меньшими, 512-байтными, блоками и использовать 512-байтные disk_alignment,
journal_block_size и meta_block_size. Однако единственные SSD, которые
теоретически могут попасть в эту категорию - это Intel Optane (но и это
пока не проверялось автором).
journal_block_size и meta_block_size. Однако на данный момент такие SSD
не известны...
Клиентам не обязательно знать про disk_alignment, так что помещать значение
этого параметра в etcd в /vitastor/config/global не нужно.


@@ -110,20 +110,22 @@
type: bool
default: false
info: |
Do not issue fsyncs to the data device, i.e. do not flush its cache.
Safe ONLY if your data device has write-through cache. If you disable
the cache yourself using `hdparm` or `scsi_disk/cache_type` then make sure
that the cache disable command is run every time before starting Vitastor
OSD, for example, in the systemd unit. See also `immediate_commit` option
for the instructions to disable cache and how to benefit from it.
Do not issue fsyncs to the data device, i.e. do not force it to flush cache.
Safe ONLY if your data device has write-through cache or if write-back
cache is disabled. If you disable the drive cache manually with `hdparm` or by
writing to `/sys/.../scsi_disk/cache_type`, then make sure that you do it
every time before starting the Vitastor OSD (vitastor-disk does it automatically).
See also [immediate_commit](layout-cluster.en.md#immediate_commit)
for information about how to benefit from disabled cache.
info_ru: |
Не отправлять fsync-и устройству данных, т.е. не сбрасывать его кэш.
Не отправлять fsync-и устройству данных, т.е. не заставлять его сбрасывать кэш.
Безопасно, ТОЛЬКО если ваше устройство данных имеет кэш со сквозной
записью (write-through). Если вы отключаете кэш через `hdparm` или
`scsi_disk/cache_type`, то удостоверьтесь, что команда отключения кэша
выполняется перед каждым запуском Vitastor OSD, например, в systemd unit-е.
Смотрите также опцию `immediate_commit` для инструкций по отключению кэша
и о том, как из этого извлечь выгоду.
записью (write-through) или если кэш с отложенной записью (write-back) отключён.
Если вы отключаете кэш вручную через `hdparm` или запись в `/sys/.../scsi_disk/cache_type`,
то удостоверьтесь, что вы делаете это каждый раз перед запуском Vitastor OSD
(vitastor-disk делает это автоматически). Смотрите также опцию
[immediate_commit](layout-cluster.ru.md#immediate_commit) для информации о том,
как извлечь выгоду из отключённого кэша.
- name: disable_meta_fsync
type: bool
default: false
@@ -179,8 +181,7 @@
Because of this it can actually be beneficial to use SSDs which work well
with 512 byte sectors and use 512 byte disk_alignment, journal_block_size
and meta_block_size. But the only SSD that may fit into this category is
Intel Optane (probably, not tested yet).
and meta_block_size. But at the moment, no such SSDs are known...
Clients don't need to be aware of disk_alignment, so it's not required to
put a modified value into etcd key /vitastor/config/global.
@@ -198,9 +199,8 @@
Поэтому, на самом деле, может быть выгодно найти SSD, хорошо работающие с
меньшими, 512-байтными, блоками и использовать 512-байтные disk_alignment,
journal_block_size и meta_block_size. Однако единственные SSD, которые
теоретически могут попасть в эту категорию - это Intel Optane (но и это
пока не проверялось автором).
journal_block_size и meta_block_size. Однако на данный момент такие SSD
не известны...
Клиентам не обязательно знать про disk_alignment, так что помещать значение
этого параметра в etcd в /vitastor/config/global не нужно.


@@ -11,14 +11,140 @@
- [Differences from Ceph](#differences-from-ceph)
- [Implementation Principles](#implementation-principles)
## Server-side components
- **OSD** (Object Storage Daemon) is a process that directly works with the disk, stores data
and serves read/write requests. One OSD serves one disk (or one partition). OSDs talk to etcd
and to each other — they receive cluster state from etcd, and send read/write requests for
secondary copies of data to other OSDs.
- **etcd** — a clustered key/value database, used as reliable storage for configuration
and high-level cluster state. It is the component that prevents split-brain in the cluster.
Data blocks are not stored in etcd, and etcd doesn't participate in the data write or read path.
- **Monitor** — a separate node.js based daemon which monitors the cluster, calculates
required configuration changes and saves them to etcd, thus commanding OSDs to apply these
changes. The monitor also aggregates cluster statistics. OSDs don't talk to the monitor; the
monitor only sends and receives data from etcd.
## Basic concepts
- OSD (Object Storage Daemon) is a process that stores data and serves read/write requests.
- PG (Placement Group) is a "shard" of the cluster, group of data stored on one set of replicas.
- Pool is a container for data that has equal redundancy scheme and placement rules.
- Monitor is a separate daemon that watches cluster state and handles failures.
- Failure Domain is a group of OSDs that you allow to fail. It's "host" by default.
- Placement Tree groups OSDs in a hierarchy to later split them into Failure Domains.
- **Pool** is a container for data that has equal redundancy scheme and disk placement rules.
- **PG (Placement Group)** is a "shard" of the cluster, subdivision unit that has its own
set of OSDs for data storage.
- **Failure Domain** is a group of OSDs whose simultaneous failure Vitastor protects you
from. The default failure domain is "host" (server), but you can choose a
larger (for example, a rack of servers) or smaller (a single drive) failure domain
for every pool.
- **Placement Tree** (similar to Ceph CRUSH Tree) groups OSDs in a hierarchy to later
split them into Failure Domains.
## Client-side components
- **Client library** encapsulates the client I/O logic. The client library connects to etcd and to all OSDs,
receives cluster state from etcd, and sends read and write requests directly to the OSDs. Due
to the symmetric distributed architecture, all data blocks (each 128 KB by default) are placed
on different OSDs, but clients always know where each data block is stored and connect directly
to the right OSD.
All other client-side components are based on the client library:
- **[vitastor-cli](../usage/cli.en.md)** — command-line utility for cluster management.
It lets you view cluster state and manage pools and images, i.e. create, modify and remove
virtual disks, their snapshots and clones.
- **[QEMU driver](../usage/qemu.en.md)** — pluggable QEMU module allowing QEMU/KVM virtual
machines to work with virtual Vitastor disks directly from userspace through the client library,
without the need to attach disks as kernel block devices. However, if you want to attach
disks, you can also do that with the same driver and [VDUSE](../usage/qemu.en.md#vduse).
- **[vitastor-nbd](../usage/nbd.en.md)** — utility that attaches Vitastor disks as
kernel block devices using NBD (Network Block Device), which works more like "BUSE"
(Block Device in Userspace). Vitastor doesn't have a Linux kernel module for the same task
(at least for now). NBD is an older, non-recommended way to attach disks — you should use
VDUSE whenever you can.
- **[CSI driver](../installation/kubernetes.en.md)** — driver for attaching Vitastor images
as Kubernetes persistent volumes. Works through VDUSE (when available) or NBD — images are
attached as kernel block devices and mounted into containers.
- **Drivers for Proxmox, OpenStack and so on** — pluggable modules for the corresponding systems,
allowing Vitastor to be used as storage in them.
- **[vitastor-nfs](../usage/nfs.en.md)** — NFS 3.0 server that exports two file system variants:
the first is a simplified pseudo-FS for file-based access to Vitastor block images (for non-QEMU
hypervisors with NFS support), the second is **VitastorFS**, a full-featured clustered POSIX FS.
Both variants support parallel access from multiple vitastor-nfs servers. In fact, you are
not required to set up separate NFS servers at all — you can run the vitastor-nfs mount command
on every client node; it starts the NFS server and mounts the FS locally.
- **[fio driver](../usage/fio.en.md)** — pluggable module for the fio disk benchmarking tool,
for running performance tests on your Vitastor cluster.
- **vitastor-kv** — client for a key-value DB working over shared block volumes (regular
vitastor images). VitastorFS metadata is stored in vitastor-kv.
## Additional utilities
- **vitastor-disk** — a tool for preparing disks for Vitastor OSDs. It can create, remove,
resize and move OSD partitions.
## Overall read/write process
- Vitastor stores virtual disks, also named "images" or "inodes".
- Each image is stored in some pool. Pool specifies storage parameters such as redundancy
scheme (replication or EC — erasure codes, i.e. error correction codes), failure domain
and restrictions on OSD selection for image data placement. See [Pool configuration](../config/pool.en.md) for details.
- Each image is split into objects/blocks of fixed size, equal to [block_size](../config/layout-cluster.en.md#block_size)
(128 KB by default), multiplied by data part count for EC or 1 for replicas. That is,
if a pool uses EC 4+2 coding scheme (4 data parts + 2 parity parts), then, with the
default block_size, images are split into 512 KB objects.
- Client read/write requests are split into parts at object boundaries.
- Each object is mapped to the number of the PG it belongs to by taking the remainder of
dividing its offset by the PG count of the image's pool (see the sketch after this list).
- The client reads the primary OSD of every PG from etcd. The primary OSD of each PG is assigned
by the monitor during cluster operation, along with the full PG OSD set.
- If not already connected, the client connects to the primary OSDs of all PGs involved in a
read/write request and sends the parts of the request to them.
- If a primary OSD is unavailable, the client retries connection attempts indefinitely, either
until it becomes available or until the monitor assigns another OSD as primary
for that PG.
- The client also retries requests if the primary OSD replies with the error code EPIPE, meaning
that the PG is inactive on this OSD at the moment - for example, when the primary OSD
is being switched, or if the primary OSD itself loses connection to replicas during request
handling.
- The primary OSD determines where the parts of the object are stored. By default, all objects
are assumed to be stored at the target OSD set of a PG, but some of them may be present
at a different OSD set if they are degraded or moved, or if the data rebalancing process
is active. The OSD doesn't send any network requests for this: it calculates the locations
of all objects during PG activation and keeps them in memory.
- The primary OSD handles the request locally when it can - for example, when it's a read
from a replicated pool, or a read from an EC pool involving only one data part
stored on the OSD's local disk.
- When a request requires reads or writes to additional OSDs, the primary OSD uses already
established connections to the secondary OSDs of the PG to execute these requests. This happens
in parallel to local disk operations. All such connections are guaranteed to be already
established when the PG is active, and if any of them is dropped, the PG is restarted and
all current read/write operations to it fail with an EPIPE error and are retried by clients.
- After completing all secondary read/write requests, the primary OSD sends the response to
the client.
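
The offset-to-PG mapping mentioned in this list can be sketched as follows (illustrative Go with assumed names, not the actual Vitastor client API):

```go
package main

import "fmt"

// pgForOffset maps a byte offset inside an image to a 1-based PG number,
// following the rule above: the object index modulo the pool's PG count.
func pgForOffset(offset, objectSize, pgCount uint64) uint64 {
	return (offset/objectSize)%pgCount + 1
}

func main() {
	// An EC 4+2 pool with the default 128 KB block_size stores 512 KB objects.
	const objectSize = 512 * 1024
	fmt.Println(pgForOffset(3*objectSize+4096, objectSize, 256)) // prints 4
}
```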
### Nuances of request handling
- If a pool uses erasure codes and some of the OSDs are unavailable, primary OSDs recover
data from the remaining parts during read.
- Each object has a version number. During a write, the primary OSD first determines the current
version of the object. As the primary OSD usually stores the object or one of its parts itself,
most of the time the version is read from the memory of the OSD itself. However, if the primary
OSD doesn't contain any part of the object, it requests the version number from a secondary OSD
which has that part. Such a request still doesn't involve reading from the disk, though,
because object metadata, including the version number, is always kept in OSD memory.
- If a pool uses erasure codes, partial writes of an object require reading its other parts
from secondary OSDs or from the local disk of the primary OSD itself. This is called the
"read-modify-write" process.
- If a pool uses erasure codes, a two-phase write process is used to avoid the Write Hole
problem: first, a new version of the object parts is written to all secondary OSDs without
removing the previous version, and then, after receiving successful write confirmations
from all OSDs, the new version is committed and the old one is allowed to be removed.
- If a pool doesn't use the immediate_commit mode, write requests sent by clients aren't
treated as committed to physical media instantly. Clients have to send a separate type of
request (SYNC) to commit changes, and until it is sent, new versions of data may be lost
if some OSDs die. Thus, when immediate_commit is disabled, clients keep copies of all write
requests in memory and replay them from there when the connection to the primary OSD is lost.
The in-memory copy is removed after a successful SYNC, and to prevent excessive memory usage,
clients also perform an automatic SYNC after every
[client_dirty_limit](../config/network.en.md#client_dirty_limit) written bytes, as sketched below.
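
A minimal sketch of this dirty-write bookkeeping (assumed names and structure, not the real client library):

```go
package client

// dirtyBuffer keeps copies of unsynced writes for replay after reconnect
// and forces an automatic SYNC once dirtyLimit (client_dirty_limit) bytes
// accumulate.
type dirtyBuffer struct {
	writes     [][]byte // copies of unsynced write payloads
	dirtyBytes uint64
	dirtyLimit uint64
}

func (b *dirtyBuffer) recordWrite(payload []byte, sync func() error) error {
	cp := make([]byte, len(payload))
	copy(cp, payload) // keep an in-memory copy until a successful SYNC
	b.writes = append(b.writes, cp)
	b.dirtyBytes += uint64(len(cp))
	if b.dirtyBytes >= b.dirtyLimit {
		if err := sync(); err != nil {
			return err // copies stay buffered and can be replayed
		}
		// SYNC succeeded: drop the in-memory copies.
		b.writes, b.dirtyBytes = nil, 0
	}
	return nil
}
```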
## Similarities to Ceph


@@ -11,6 +11,7 @@
- [Серверные компоненты](#серверные-компоненты)
- [Базовые понятия](#базовые-понятия)
- [Клиентские компоненты](#клиентские-компоненты)
- [Дополнительные утилиты](#дополнительные-утилиты)
- [Общий процесс записи и чтения](#общий-процесс-записи-и-чтения)
- [Особенности обработки запросов](#особенности-обработки-запросов)
- [Схожесть с Ceph](#схожесть-с-ceph)
@@ -34,8 +35,9 @@
- **Пул (Pool)** — контейнер для данных, имеющих одну и ту же схему избыточности и правила распределения по OSD.
- **PG (Placement Group)** — "шард", единица деления пулов в кластере, которой назначается свой набор
OSD для хранения данных (копий или частей объектов).
- **Домен отказа (Failure Domain)** — группа OSD, одновременное падение которых рассматривается
как вероятное. По умолчанию это "host" (сервер).
- **Домен отказа (Failure Domain)** — группа OSD, от одновременного падения которых должен защищать
Vitastor. По умолчанию домен отказа — "host" (сервер), но вы можете установить для пула как больший
домен отказа (например, стойку серверов), так и меньший (например, отдельный диск).
- **Дерево распределения** (Placement Tree, в Ceph CRUSH Tree) — иерархическая группировка OSD
в узлы, которые далее можно использовать как домены отказа.
@@ -49,25 +51,39 @@
All other clients are implemented on top of the client library:
- **vitastor-cli** — a command-line tool for cluster management. At the moment it allows
you to view the overall state of the cluster and to manage images, i.e. create, modify and
delete virtual disks, their snapshots and clones.
- **QEMU driver** — a pluggable QEMU module that lets QEMU/KVM virtual machines work
with Vitastor virtual disks directly from userspace through the client library,
without having to expose the disks as block devices. The same driver
can also attach disks to the system via [VDUSE](../usage/qemu.ru.md#vduse).
- **vitastor-nbd** — a tool for mounting Vitastor images as block devices
using NBD (Network Block Device), which in fact acts more like "BUSE"
(Block Device In Userspace). Vitastor has no Linux kernel module for the same task
(at least, not yet).
- **CSI driver** — a driver for attaching Vitastor images as Kubernetes persistent volumes (PV).
Works through vitastor-nbd: images are exposed as block devices and mounted
into containers.
- **[vitastor-cli](../usage/cli.ru.md)** — a command-line tool for cluster management.
It allows you to view the overall state of the cluster and to manage pools and images, i.e.
create, modify and delete virtual disks, their snapshots and clones.
- **[QEMU driver](../usage/qemu.ru.md)** — a pluggable QEMU module that lets QEMU/KVM
virtual machines work with Vitastor virtual disks directly from userspace through the
client library, without having to attach the disks as Linux block devices. If you do
want to attach disks as block devices, however, you can also do that with the same
driver and [VDUSE](../usage/qemu.ru.md#vduse).
- **[vitastor-nbd](../usage/nbd.ru.md)** — a tool for mounting Vitastor images
as block devices using NBD (Network Block Device), which in fact acts more like
"BUSE" (Block Device In Userspace). Vitastor has no Linux kernel module for the
same task (at least, not yet). NBD is the older and non-recommended way to attach
disks; you should use VDUSE whenever possible.
- **[CSI driver](../installation/kubernetes.ru.md)** — a driver for attaching Vitastor images
as Kubernetes persistent volumes (PV). Works through VDUSE (when available) or through
NBD: images are exposed as block devices and mounted into containers.
- **Proxmox, OpenStack and similar drivers** — pluggable modules for the corresponding
systems that allow Vitastor to be used as storage in them.
- **vitastor-nfs** — a tool providing file access to images in a Vitastor cluster
over the NFS 3.0 protocol. Intended for hypervisors that are not based on QEMU and
Linux but still support NFS.
- **[vitastor-nfs](../usage/nfs.ru.md)** — an NFS 3.0 server offering two file system variants:
the first is a simplified one for file access to block images (for non-QEMU hypervisors that support NFS),
the second is VitastorFS, a full-featured clustered POSIX FS. Both variants support parallel
access from multiple vitastor-nfs servers. In fact, you do not have to dedicate
separate NFS servers at all: you can instead use the vitastor-nfs mount command, which starts
an NFS server right on the client machine and mounts the FS locally.
- **[fio driver](../usage/fio.ru.md)** — a pluggable module for the fio disk benchmarking
tool that makes it possible to benchmark Vitastor clusters.
- **vitastor-kv** — a client for a key-value database working on top of a shared block
image (a regular Vitastor block image). VitastorFS metadata is stored in vitastor-kv.
## Additional utilities
- **vitastor-disk** — a tool for preparing disks for Vitastor OSDs. With it you can
create, delete, resize or move OSD partitions.
## General read and write process
@@ -98,16 +114,22 @@
reside on other OSDs if those objects are degraded or moved, or if a rebalance is
in progress. No network requests are sent for this check; the location of all
objects is computed by the primary OSD when the PG is activated and kept in memory.
- The primary OSD connects (if not already connected) to the secondary OSDs that hold
the parts of the object and sends them read/write requests, and also reads/writes
from/to its local storage if it belongs to the set itself.
- When possible, the primary OSD handles the request locally. For example, this happens
when reading objects from replicated pools, or when a read from an EC pool only touches
the part stored on the primary OSD's own disk.
- When the request requires writes to or reads from secondary OSDs, the primary OSD uses
the pre-established connections to them to perform these requests. This happens in parallel
with local read/write operations on the OSD's own disk. Since connections to the PG's
secondary OSDs are established when it starts, they are guaranteed to be up while the PG
is active, and if any of these connections drops, the PG is restarted and all in-flight
read and write requests to it fail with EPIPE, after which clients retry them.
- After all secondary read/write operations complete, the primary OSD sends the response to the client.
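As a conceptual illustration of this flow (plain JS pseudocode, not actual OSD code; every helper name here is invented):

async function handle_primary_op(pg, op)
{
    // Split the request into per-OSD sub-operations (replicas or EC parts)
    const subops = make_subops(pg, op);
    // Local parts go to this OSD's own disk, remote parts go over the
    // connections that were established when the PG started
    await Promise.all(subops.map(sub => sub.osd_num == pg.primary_osd
        ? do_local_io(sub)
        : send_subop(pg.connections[sub.osd_num], sub)));
    // The client only gets its reply after every part completes
    reply_to_client(op);
}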
### Request processing details
- If the pool uses erasure codes and some of the OSDs are unavailable, the primary
OSD reconstructs the data from the remaining parts when reading.
- Each object has a version number. When writing an object, the primary OSD first reads the
- Each object has a version number. When writing an object, the primary OSD first obtains the
version number of the object. Since the primary OSD usually stores a copy or a part of the object
itself, the version number is usually read from the OSD's own memory. However, if no part of the
object being updated resides on the primary OSD, it queries one of the secondary
@@ -115,20 +137,20 @@
since all OSDs keep object metadata, including the version number, in memory.
- If the pool uses erasure codes, then before a partial object write, computing
parity often requires reading parts of the object from secondary OSDs or from the local disk of
the primary OSD itself.
- Also, if the pool uses erasure codes, to close the Write Hole it applies
the primary OSD itself. This is called the "read-modify-write" process (see the XOR sketch after this list).
- If the pool uses erasure codes, to close the Write Hole it applies
a two-phase write algorithm: first the new version of the object's parts is written
to all secondary OSDs without deleting the old version, and then, after confirmation of
successful writes from all secondary OSDs, the new version is committed and deletion of the old one is allowed.
- If immediate_commit mode is not enabled in the cluster, write requests sent by clients
- If immediate_commit mode is not enabled in the pool, write requests sent by clients
are not considered committed to physical media immediately. To commit the data, clients
must separately send SYNC requests (a request type distinct from reads and writes),
and until such a request is sent, the written data is assumed to possibly disappear
if the corresponding OSD fails. Therefore, when immediate_commit mode is off, clients
keep a copy of all write requests in memory and, on losing and re-establishing a
connection to an OSD, replay them from memory. The in-memory copies are freed on a successful fsync,
connection to an OSD, replay them from memory. The in-memory copies are freed on a successful SYNC,
and so that keeping this data does not cause excessive memory consumption, clients
automatically issue an fsync every [client_dirty_limit](../config/network.ru.md#client_dirty_limit)
automatically issue a SYNC every [client_dirty_limit](../config/network.ru.md#client_dirty_limit)
bytes written (see the sketch after this list).
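Two of the points above lend themselves to small sketches. First, the XOR (N+1) case of read-modify-write: reading the old data and the old parity lets the primary OSD recompute parity without touching the other data parts (illustrative JS, not Vitastor code):

function update_parity(old_data, new_data, old_parity)
{
    // new_parity = old_parity XOR old_data XOR new_data, byte by byte
    const new_parity = Buffer.alloc(old_parity.length);
    for (let i = 0; i < old_parity.length; i++)
        new_parity[i] = old_parity[i] ^ old_data[i] ^ new_data[i];
    return new_parity;
}

Second, the client-side bookkeeping without immediate_commit, sketched with invented helper names and an arbitrary example limit:

const client_dirty_limit = 32*1024*1024; // example value only
let dirty_ops = [], dirty_bytes = 0;

async function client_write(op)
{
    dirty_ops.push(op);        // keep a copy to replay after a reconnect
    dirty_bytes += op.len;
    await send_to_primary(op); // hypothetical transport helper
    if (dirty_bytes >= client_dirty_limit)
        await client_sync();   // cap the memory used by the copies
}

async function client_sync()
{
    await send_to_primary({ opcode: 'SYNC' });
    dirty_ops = []; dirty_bytes = 0; // committed, copies can be freed
}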
## Similarity to Ceph


@@ -32,7 +32,7 @@
- SATA SSD: Micron 5100/5200/5300/5400, Samsung PM863/PM883/PM893, Intel D3-S4510/4520/4610/4620, Kingston DC500M
- NVMe: Micron 9100/9200/9300/9400, Micron 7300/7450, Samsung PM983/PM9A3, Samsung PM1723/1735/1743,
Intel DC-P3700/P4500/P4600, Intel D7-P5500/P5600, Intel Optane, Kingston DC1000B/DC1500M
Intel DC-P3700/P4500/P4600, Intel D5-P4320, Intel D7-P5500/P5600, Intel Optane, Kingston DC1000B/DC1500M
- HDD: HGST Ultrastar, Toshiba MG, Seagate EXOS
## Configure monitors


@@ -32,7 +32,7 @@
- SATA SSD: Micron 5100/5200/5300/5400, Samsung PM863/PM883/PM893, Intel D3-S4510/4520/4610/4620, Kingston DC500M
- NVMe: Micron 9100/9200/9300/9400, Micron 7300/7450, Samsung PM983/PM9A3, Samsung PM1723/1735/1743,
Intel DC-P3700/P4500/P4600, Intel D7-P5500/P5600, Intel Optane, Kingston DC1000B/DC1500M
Intel DC-P3700/P4500/P4600, Intel D5-P4320, Intel D7-P5500/P5600, Intel Optane, Kingston DC1000B/DC1500M
- HDD: HGST Ultrastar, Toshiba MG, Seagate EXOS
## Configure monitors


@@ -51,12 +51,16 @@ Options (automatic mode):
--osd_per_disk <N>
Create <N> OSDs on each disk (default 1)
--hybrid
Prepare hybrid (HDD+SSD) OSDs using provided devices. SSDs will be used for
journals and metadata, HDDs will be used for data. Partitions for journals and
metadata will be created automatically. Whether disks are SSD or HDD is decided
by the `/sys/block/.../queue/rotational` flag. In hybrid mode, default object
size is 1 MB instead of 128 KB, default journal size is 1 GB instead of 32 MB,
and throttle_small_writes is enabled by default.
Prepare hybrid (HDD+SSD, NVMe+SATA, etc.) OSDs using the provided devices. By default,
any passed SSDs will be used for journals and metadata, HDDs will be used for data,
but you can override this behaviour with the --fast-devices option. Journal and metadata
partitions will be created automatically. In the default mode, SSD and HDD disks
are distinguished by the `/sys/block/.../queue/rotational` flag. When HDDs are used
for data in hybrid mode, default block_size is 1 MB instead of 128 KB, default journal
size is 1 GB instead of 32 MB, and throttle_small_writes is enabled by default.
--fast-devices /dev/nvmeX,/dev/nvmeY
In --hybrid mode, use these devices for journal and metadata instead of auto-detecting
and extracting them from the main [devices...] list.
--disable_data_fsync auto
Disable data device cache and fsync (1/yes/true = on, default auto)
--disable_meta_fsync auto


@@ -51,12 +51,17 @@ vitastor-disk - a command-line tool for managing
--osd_per_disk <N>
Create multiple (<N>) OSDs on each disk (default 1)
--hybrid
Initialize hybrid (HDD+SSD) OSDs on the specified disks. SSDs will be
used for journals and metadata, and HDDs for data. Journal and metadata
partitions will be created automatically. Whether a disk is an SSD or an HDD
is determined by the `/sys/block/.../queue/rotational` flag. In hybrid mode,
the default object size is 1 MB instead of 128 KB, the journal size 1 GB instead of 32 MB,
and throttle_small_writes is enabled.
Initialize hybrid (HDD+SSD, NVMe+SATA, etc.) OSDs on the specified disks.
By default, SSDs will be used for journals and metadata, and HDDs for data,
but you can change this behaviour with the --fast-devices option. Journal and metadata
partitions will be created automatically. In the default mode, SSD and HDD disks
are distinguished by the `/sys/block/.../queue/rotational` flag. When HDDs are used
for data in hybrid mode, the default block size is set to 1 MB instead of
128 KB, the journal size to 1 GB instead of 32 MB, and throttle_small_writes is enabled by
default.
--fast-devices /dev/nvmeX,/dev/nvmeY
Use these disks for journals and metadata in hybrid mode instead of
auto-detecting them and extracting them from the main [devices...] list.
--disable_data_fsync auto
Disable cache and fsyncs for the data device (1/yes/true = yes, default: auto-detect)
--disable_meta_fsync auto


@@ -567,6 +567,7 @@ class Mon
async apply_pool_pgs(results, up_osds, osd_tree, tree_hash)
{
const etcd_request = { compare: [], success: [] };
for (const pool_id in (this.state.pg.config||{}).items||{})
{
// We should stop all PGs when deleting a pool or changing its PG count
@@ -579,9 +580,24 @@ class Mon
return false;
}
}
if (!this.state.config.pools[pool_id])
{
// Delete PG history and stats of the deleted pool
etcd_request.success.push({ requestDeleteRange: {
key: b64(this.config.etcd_prefix+'/pg/history/'+pool_id+'/'),
range_end: b64(this.config.etcd_prefix+'/pg/history/'+pool_id+'0'),
} });
etcd_request.success.push({ requestDeleteRange: {
key: b64(this.config.etcd_prefix+'/pg/stats/'+pool_id+'/'),
range_end: b64(this.config.etcd_prefix+'/pg/stats/'+pool_id+'0'),
} });
etcd_request.success.push({ requestDeleteRange: {
key: b64(this.config.etcd_prefix+'/pgstats/'+pool_id+'/'),
range_end: b64(this.config.etcd_prefix+'/pgstats/'+pool_id+'0'),
} });
}
}
const new_pg_config = JSON.parse(JSON.stringify(this.state.pg.config));
const etcd_request = { compare: [], success: [] };
for (const pool_id in (new_pg_config||{}).items||{})
{
if (!this.state.config.pools[pool_id])
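A note on the requestDeleteRange keys above: etcd deletes the half-open range [key, range_end), and since '0' (0x30) is the ASCII character directly after '/' (0x2F), the pair prefix+'/' and prefix+'0' covers exactly the keys nested under that prefix. A standalone sketch (illustrative pool id; Buffer stands in for the monitor's b64() helper):

// '/' is 0x2F and '0' is 0x30, so every key "<prefix>/<anything>"
// sorts at or after "<prefix>/" and strictly before "<prefix>0".
const prefix = '/vitastor/pg/history/3';
const del_history = { requestDeleteRange: {
    key: Buffer.from(prefix+'/').toString('base64'),
    range_end: Buffer.from(prefix+'0').toString('base64'),
} };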


@@ -1,6 +1,6 @@
{
"name": "vitastor-mon",
"version": "1.9.2",
"version": "1.9.3",
"description": "Vitastor SDS monitor service",
"main": "mon-main.js",
"scripts": {


@@ -50,7 +50,7 @@ from cinder.volume import configuration
from cinder.volume import driver
from cinder.volume import volume_utils
VITASTOR_VERSION = '1.9.2'
VITASTOR_VERSION = '1.9.3'
LOG = logging.getLogger(__name__)


@@ -1,11 +1,11 @@
Name: vitastor
Version: 1.9.2
Version: 1.9.3
Release: 1%{?dist}
Summary: Vitastor, a fast software-defined clustered block storage
License: Vitastor Network Public License 1.1
URL: https://vitastor.io/
Source0: vitastor-1.9.2.el7.tar.gz
Source0: vitastor-1.9.3.el7.tar.gz
BuildRequires: liburing-devel >= 0.6
BuildRequires: gperftools-devel


@@ -1,11 +1,11 @@
Name: vitastor
Version: 1.9.2
Version: 1.9.3
Release: 1%{?dist}
Summary: Vitastor, a fast software-defined clustered block storage
License: Vitastor Network Public License 1.1
URL: https://vitastor.io/
Source0: vitastor-1.9.2.el8.tar.gz
Source0: vitastor-1.9.3.el8.tar.gz
BuildRequires: liburing-devel >= 0.6
BuildRequires: gperftools-devel


@@ -1,11 +1,11 @@
Name: vitastor
Version: 1.9.2
Version: 1.9.3
Release: 1%{?dist}
Summary: Vitastor, a fast software-defined clustered block storage
License: Vitastor Network Public License 1.1
URL: https://vitastor.io/
Source0: vitastor-1.9.2.el9.tar.gz
Source0: vitastor-1.9.3.el9.tar.gz
BuildRequires: liburing-devel >= 0.6
BuildRequires: gperftools-devel
@@ -74,7 +74,7 @@ Vitastor library headers for development.
Summary: Vitastor - fio drivers
Group: Development/Libraries
Requires: vitastor-client = %{version}-%{release}
Requires: fio = 3.27-8.el9
Requires: fio = 3.35-1.el9
%description -n vitastor-fio


@@ -19,7 +19,7 @@ if("${CMAKE_INSTALL_PREFIX}" MATCHES "^/usr/local/?$")
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}")
endif()
add_definitions(-DVITASTOR_VERSION="1.9.2")
add_definitions(-DVITASTOR_VERSION="1.9.3")
add_definitions(-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -Wall -Wno-sign-compare -Wno-comment -Wno-parentheses -Wno-pointer-arith -fdiagnostics-color=always -fno-omit-frame-pointer -I ${CMAKE_SOURCE_DIR}/src)
add_link_options(-fno-omit-frame-pointer)
if (${WITH_ASAN})


@@ -6,7 +6,7 @@ includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@
Name: Vitastor
Description: Vitastor client library
Version: 1.9.2
Version: 1.9.3
Libs: -L${libdir} -lvitastor_client
Cflags: -I${includedir}


@@ -431,7 +431,7 @@ struct cli_dd_t
if (read_op->retval < 0)
{
fprintf(
stderr, "Failed to read bitmap for %lu bytes from image %s at offset %lu: %s (code %d)\n",
stderr, "Failed to read bitmap for %ju bytes from image %s at offset %ju: %s (code %d)\n",
read_op->len, iinfo.iimg.c_str(), read_op->offset,
strerror(read_op->retval < 0 ? -read_op->retval : EIO), read_op->retval
);
@@ -476,7 +476,7 @@ struct cli_dd_t
if (read_op->retval != read_op->len)
{
fprintf(
stderr, "Failed to read %lu bytes from image %s at offset %lu: %s (code %d)\n",
stderr, "Failed to read %ju bytes from image %s at offset %ju: %s (code %d)\n",
read_op->len, iinfo.iimg.c_str(), read_op->offset,
strerror(read_op->retval < 0 ? -read_op->retval : EIO), read_op->retval
);
@@ -547,7 +547,7 @@ struct cli_dd_t
if (data->res < 0)
{
fprintf(
stderr, "Failed to read %lu bytes from %s at offset %lu: %s (code %d)\n",
stderr, "Failed to read %ju bytes from %s at offset %ju: %s (code %d)\n",
data->iov.iov_len, iinfo.ifile == "" ? "stdin" : iinfo.ifile.c_str(), cur_read->offset,
strerror(-data->res), data->res
);
@@ -644,7 +644,7 @@ struct cli_dd_t
if (write_op->retval != write_op->len)
{
fprintf(
stderr, "Failed to write %lu bytes to image %s at offset %lu: %s (code %d)\n",
stderr, "Failed to write %ju bytes to image %s at offset %ju: %s (code %d)\n",
write_op->len, oinfo.oimg.c_str(), write_op->offset,
strerror(write_op->retval < 0 ? -write_op->retval : EIO), write_op->retval
);
@@ -680,7 +680,7 @@ struct cli_dd_t
if (data->res < 0)
{
fprintf(
stderr, "Failed to write %lu bytes to %s at offset %lu: %s (code %d)\n",
stderr, "Failed to write %ju bytes to %s at offset %ju: %s (code %d)\n",
data->iov.iov_len, oinfo.ofile == "" ? "stdout" : oinfo.ofile.c_str(),
oinfo.out_seekable ? cur_read->offset+cur_read->len+oseek : 0,
strerror(-data->res), data->res
@@ -727,7 +727,7 @@ struct cli_dd_t
{
char buf[256];
snprintf(
buf, sizeof(buf), "%lu bytes (%s) copied, %.1f s, %sB/s",
buf, sizeof(buf), "%ju bytes (%s) copied, %.1f s, %sB/s",
written_size, format_size(written_size).c_str(), sec_total,
format_size((uint64_t)(written_size/sec_total), true).c_str()
);
@@ -749,7 +749,7 @@ struct cli_dd_t
else
{
fprintf(
stderr, "\r%lu bytes (%s) copied, %.1f s, %sB/s, avg %sB/s\033[K",
stderr, "\r%ju bytes (%s) copied, %.1f s, %sB/s, avg %sB/s\033[K",
written_size, format_size(written_size).c_str(), sec_total,
format_size((uint64_t)(delta/sec_delta), true).c_str(),
format_size((uint64_t)(written_size/sec_total), true).c_str()


@@ -27,12 +27,16 @@ static const char *help_text =
" --osd_per_disk <N>\n"
" Create <N> OSDs on each disk (default 1)\n"
" --hybrid\n"
" Prepare hybrid (HDD+SSD) OSDs using provided devices. SSDs will be used for\n"
" journals and metadata, HDDs will be used for data. Partitions for journals and\n"
" metadata will be created automatically. Whether disks are SSD or HDD is decided\n"
" by the `/sys/block/.../queue/rotational` flag. In hybrid mode, default object\n"
" size is 1 MB instead of 128 KB, default journal size is 1 GB instead of 32 MB,\n"
" and throttle_small_writes is enabled by default.\n"
" Prepare hybrid (HDD+SSD, NVMe+SATA or etc) OSDs using provided devices. By default,\n"
" any passed SSDs will be used for journals and metadata, HDDs will be used for data,\n"
" but you can override this behaviour with --fast-devices option. Journal and metadata\n"
" partitions will be created automatically. In the default mode, SSD and HDD disks\n"
" are distinguished by the `/sys/block/.../queue/rotational` flag. When HDDs are used\n"
" for data in hybrid mode, default block_size is 1 MB instead of 128 KB, default journal\n"
" size is 1 GB instead of 32 MB, and throttle_small_writes is enabled by default.\n"
" --fast-devices /dev/nvmeX,/dev/nvmeY\n"
" In --hybrid mode, use these devices for journal and metadata instead of auto-detecting\n"
" and extracting them from the main [devices...] list.\n"
" --disable_data_fsync auto\n"
" Disable data device cache and fsync (1/yes/true = on, default auto)\n"
" --disable_meta_fsync auto\n"
@@ -196,6 +200,7 @@ static const char *help_text =
" --device_size 0 Set device size\n"
" --format text Result format: json, options, env, or text\n"
"\n"
"Default I/O mode for commands involving disk I/O is O_DIRECT. If you don't want it, add --io cached.\n"
"Use vitastor-disk --help <command> for command details or vitastor-disk --help --all for all details.\n"
;
@@ -220,6 +225,10 @@ int main(int argc, char *argv[])
cmd.push_back((char*)"dump-journal");
aliased = true;
}
else if (!strcmp(exe_name, "vitastor-disk-test"))
{
self.test_mode = true;
}
for (int i = 1; i < argc; i++)
{
if (!strcmp(argv[i], "--all"))
@@ -314,6 +323,7 @@ int main(int argc, char *argv[])
// First argument is an OSD device - take metadata layout parameters from it
if (self.dump_load_check_superblock(self.new_journal_device))
return 1;
self.new_journal_device = self.dsk.journal_device;
self.new_journal_offset = self.dsk.journal_offset;
self.new_journal_len = self.dsk.journal_len;
}
@@ -379,6 +389,7 @@ int main(int argc, char *argv[])
// First argument is an OSD device - take metadata layout parameters from it
if (self.dump_load_check_superblock(self.new_meta_device))
return 1;
self.new_meta_device = self.dsk.meta_device;
self.new_meta_offset = self.dsk.meta_offset;
self.new_meta_len = self.dsk.meta_len;
}


@@ -22,6 +22,7 @@
#define VITASTOR_DISK_MAX_SB_SIZE 128*1024
#define VITASTOR_PART_TYPE "e7009fac-a5a1-4d72-af72-53de13059903"
#define DEFAULT_HYBRID_JOURNAL "1G"
#define DEFAULT_HYBRID_SSD_JOURNAL "128M"
struct resizer_data_moving_t;
@@ -40,6 +41,7 @@ struct disk_tool_t
/**** Parameters ****/
std::map<std::string, std::string> options;
bool test_mode = false;
bool all, json, now;
bool dump_with_blocks, dump_with_data;
blockstore_disk_t dsk;
@@ -126,7 +128,8 @@ struct disk_tool_t
uint32_t write_osd_superblock(std::string device, json11::Json params);
int prepare_one(std::map<std::string, std::string> options, int is_hdd = -1);
int check_existing_partition(const std::string & dev);
int check_existing_partition(std::string & dev_by_uuid);
int fix_partition_type(std::string & dev_by_uuid);
int prepare(std::vector<std::string> devices);
std::vector<vitastor_dev_info_t> collect_devices(const std::vector<std::string> & devices);
json11::Json add_partitions(vitastor_dev_info_t & devinfo, std::vector<std::string> sizes);
@@ -148,6 +151,6 @@ int write_zero(int fd, uint64_t offset, uint64_t size);
json11::Json read_parttable(std::string dev);
uint64_t dev_size_from_parttable(json11::Json pt);
uint64_t free_from_parttable(json11::Json pt);
int fix_partition_type(std::string dev_by_uuid);
int fix_partition_type_uuid(std::string & dev_by_uuid, const std::string & type_uuid);
std::string csum_type_str(uint32_t data_csum_type);
uint32_t csum_type_from_str(std::string data_csum_type);


@@ -18,7 +18,7 @@ int disk_tool_t::dump_journal()
printf("[\n");
if (all)
{
dsk.journal_fd = open(dsk.journal_device.c_str(), O_DIRECT|O_RDONLY);
dsk.journal_fd = open(dsk.journal_device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDONLY);
if (dsk.journal_fd < 0)
{
fprintf(stderr, "Failed to open journal device %s: %s\n", dsk.journal_device.c_str(), strerror(errno));
@@ -121,7 +121,7 @@ int disk_tool_t::dump_journal()
int disk_tool_t::process_journal(std::function<int(void*)> block_fn)
{
dsk.journal_fd = open(dsk.journal_device.c_str(), O_DIRECT|O_RDONLY);
dsk.journal_fd = open(dsk.journal_device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDONLY);
if (dsk.journal_fd < 0)
{
fprintf(stderr, "Failed to open journal device %s: %s\n", dsk.journal_device.c_str(), strerror(errno));


@@ -14,7 +14,7 @@ int disk_tool_t::process_meta(std::function<void(blockstore_meta_header_v2_t *)>
fprintf(stderr, "Invalid metadata block size: is not a multiple of %d\n", DIRECT_IO_ALIGNMENT);
return 1;
}
dsk.meta_fd = open(dsk.meta_device.c_str(), O_DIRECT|O_RDONLY);
dsk.meta_fd = open(dsk.meta_device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDONLY);
if (dsk.meta_fd < 0)
{
fprintf(stderr, "Failed to open metadata device %s: %s\n", dsk.meta_device.c_str(), strerror(errno));
@@ -159,7 +159,7 @@ int disk_tool_t::dump_load_check_superblock(const std::string & device)
{
auto cfg = json_to_string_map(sb["params"].object_items());
dsk.parse_config(cfg);
dsk.data_io = dsk.meta_io = dsk.journal_io = "direct";
dsk.data_io = dsk.meta_io = dsk.journal_io = "cached";
dsk.open_data();
dsk.open_meta();
dsk.open_journal();
@@ -315,8 +315,7 @@ int disk_tool_t::write_json_meta(json11::Json meta)
fromhexstr(e["data_csum"].string_value(), new_data_csum_size,
((uint8_t*)new_entry) + sizeof(clean_disk_entry) + 2*new_clean_entry_bitmap_size);
}
uint32_t *new_entry_csum = (uint32_t*)(((uint8_t*)new_entry) + sizeof(clean_disk_entry) +
2*new_clean_entry_bitmap_size + new_data_csum_size);
uint32_t *new_entry_csum = (uint32_t*)(((uint8_t*)new_entry) + new_clean_entry_size - 4);
*new_entry_csum = crc32c(0, new_entry, new_clean_entry_size - 4);
}
}


@@ -29,18 +29,12 @@ int disk_tool_t::prepare_one(std::map<std::string, std::string> options, int is_
};
if (options.find("force") == options.end())
{
std::vector<std::string> all_devs = { options["data_device"], options["meta_device"], options["journal_device"] };
for (int i = 0; i < all_devs.size(); i++)
std::string* all_devs[] = { &options["data_device"], &options["meta_device"], &options["journal_device"] };
for (int i = 0; i < 3; i++)
{
const auto & dev = all_devs[i];
auto & dev = *all_devs[i];
if (dev == "")
continue;
if (dev.substr(0, 22) != "/dev/disk/by-partuuid/")
{
// Partitions should be identified by GPT partition UUID
fprintf(stderr, "%s does not start with /dev/disk/by-partuuid/. Partitions should be identified by GPT partition UUIDs\n", dev.c_str());
return 1;
}
std::string real_dev = realpath_str(dev, false);
if (real_dev == "")
return 1;
@@ -114,7 +108,7 @@ int disk_tool_t::prepare_one(std::map<std::string, std::string> options, int is_
try
{
dsk.parse_config(options);
dsk.data_io = dsk.meta_io = dsk.journal_io = "direct";
dsk.data_io = dsk.meta_io = dsk.journal_io = (options["io"] == "cached" ? "cached" : "direct");
dsk.open_data();
dsk.open_meta();
dsk.open_journal();
@@ -159,7 +153,11 @@ int disk_tool_t::prepare_one(std::map<std::string, std::string> options, int is_
return 1;
}
std::string osd_num_str;
if (shell_exec({ "vitastor-cli", "alloc-osd" }, "", &osd_num_str, NULL) != 0)
if (test_mode && options.find("osd_num") != options.end())
{
osd_num_str = options["osd_num"];
}
else if (shell_exec({ "vitastor-cli", "alloc-osd" }, "", &osd_num_str, NULL) != 0)
{
dsk.close_all();
return 1;
@@ -199,15 +197,18 @@ int disk_tool_t::prepare_one(std::map<std::string, std::string> options, int is_
if (sep_j)
desc += (sep_m ? " and journal on " : " with journal on ") + realpath_str(options["journal_device"]);
fprintf(stderr, "Initialized OSD %ju on %s\n", osd_num, desc.c_str());
if (shell_exec({ "systemctl", "enable", "--now", "vitastor-osd@"+std::to_string(osd_num) }, "", NULL, NULL) != 0)
if (!test_mode || options.find("no_init") == options.end())
{
fprintf(stderr, "Failed to enable systemd unit vitastor-osd@%ju\n", osd_num);
return 1;
if (shell_exec({ "systemctl", "enable", "--now", "vitastor-osd@"+std::to_string(osd_num) }, "", NULL, NULL) != 0)
{
fprintf(stderr, "Failed to enable systemd unit vitastor-osd@%ju\n", osd_num);
return 1;
}
}
return 0;
}
int disk_tool_t::check_existing_partition(const std::string & dev)
int disk_tool_t::check_existing_partition(std::string & dev)
{
std::string out;
if (shell_exec({ "wipefs", dev }, "", &out, NULL) != 0 || out != "")
@@ -229,11 +230,27 @@ int disk_tool_t::check_existing_partition(const std::string & dev)
return 0;
}
int disk_tool_t::fix_partition_type(std::string & dev)
{
std::string type_uuid = VITASTOR_PART_TYPE;
if (test_mode && options.find("part_type_uuid") != options.end())
{
type_uuid = options["part_type_uuid"];
}
return fix_partition_type_uuid(dev, type_uuid);
}
std::vector<vitastor_dev_info_t> disk_tool_t::collect_devices(const std::vector<std::string> & devices)
{
std::vector<vitastor_dev_info_t> devinfo;
std::set<std::string> seen;
for (auto & dev: devices)
{
if (seen.find(dev) != seen.end())
{
fprintf(stderr, "%s is specified multiple times, ignoring\n", dev.c_str());
continue;
}
// Check if the device is a whole disk
if (dev.substr(0, 5) != "/dev/")
{
@@ -294,10 +311,6 @@ std::vector<vitastor_dev_info_t> disk_tool_t::collect_devices(const std::vector<
.free = !pt.is_null() ? free_from_parttable(pt) : dev_size,
});
}
if (!devinfo.size())
{
fprintf(stderr, "No suitable devices found\n");
}
return devinfo;
}
@@ -348,47 +361,12 @@ json11::Json disk_tool_t::add_partitions(vitastor_dev_info_t & devinfo, std::vec
fprintf(stderr, "Failed to add %zu partition(s) with sfdisk: new partitions not found in table\n", sizes.size());
return {};
}
// Check if new nodes exist and run partprobe if not
// Check if new devices exist, run partprobe if not, then wait until they appear
// FIXME: We could use parted instead of sfdisk because partprobe is already a part of parted
int iter = 0, r;
while (true)
{
for (const auto & part: new_parts)
{
std::string link_path = "/dev/disk/by-partuuid/"+strtolower(part["uuid"].string_value());
struct stat st;
if (lstat(link_path.c_str(), &st) < 0)
{
if (errno == ENOENT)
{
iter++;
// Run partprobe
std::string out;
if (iter > 1 || (r = shell_exec({ "partprobe", devinfo.path }, "", &out, NULL)) != 0)
{
fprintf(
stderr, iter == 1 && r == 255
? "partprobe utility is required to reread partition table while disk %s is in use\n"
: "partprobe failed to re-read partition table while disk %s is in use\n",
devinfo.path.c_str()
);
return {};
}
break;
}
else
{
fprintf(stderr, "Failed to lstat %s: %s\n", link_path.c_str(), strerror(errno));
return {};
}
}
}
break;
}
// Wait until device symlinks in /dev/disk/by-partuuid/ appear
bool exists = false;
const int max_iter = 300; // max 30 sec
iter = 0;
int iter = 0;
int r = 0;
while (!exists && iter < max_iter)
{
exists = true;
@@ -396,28 +374,48 @@ json11::Json disk_tool_t::add_partitions(vitastor_dev_info_t & devinfo, std::vec
{
std::string link_path = "/dev/disk/by-partuuid/"+strtolower(part["uuid"].string_value());
struct stat st;
if (lstat(link_path.c_str(), &st) < 0)
if (stat(part["node"].string_value().c_str(), &st) < 0 ||
lstat(link_path.c_str(), &st) < 0)
{
if (errno == ENOENT)
{
exists = false;
if (iter == 4)
{
// Print message after 400ms
fprintf(stderr, "Waiting for %s to appear for up to %d sec...\n", link_path.c_str(), max_iter/10);
}
}
else
{
fprintf(stderr, "Failed to lstat %s: %s\n", link_path.c_str(), strerror(errno));
fprintf(stderr, "Failed to stat %s or lstat %s: %s\n", part["node"].string_value().c_str(),
link_path.c_str(), strerror(errno));
return {};
}
}
}
if (!exists)
if (exists)
{
struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000000 }; // 100ms
iter += (nanosleep(&ts, NULL) == 0);
break;
}
if (!exists && iter == 0)
{
// Run partprobe
std::string out;
r = shell_exec({ "partprobe", devinfo.path }, "", &out, NULL);
if (r != 0)
{
fprintf(
stderr, r == 255
? "partprobe utility is required to reread partition table while disk %s is in use\n"
: "partprobe failed to re-read partition table while disk %s is in use\n",
devinfo.path.c_str()
);
return {};
}
}
struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000000 }; // 100ms
iter += (nanosleep(&ts, NULL) == 0 || !iter);
}
devinfo.pt = newpt;
devinfo.osd_part_count += sizes.size();
@@ -500,7 +498,7 @@ int disk_tool_t::get_meta_partition(std::vector<vitastor_dev_info_t> & ssds, std
{
blockstore_disk_t dsk;
dsk.parse_config(options);
dsk.data_io = dsk.meta_io = dsk.journal_io = "direct";
dsk.data_io = dsk.meta_io = dsk.journal_io = "cached";
dsk.open_data();
dsk.open_meta();
dsk.open_journal();
@@ -510,6 +508,7 @@ int disk_tool_t::get_meta_partition(std::vector<vitastor_dev_info_t> & ssds, std
}
catch (std::exception & e)
{
dsk.close_all();
fprintf(stderr, "%s\n", e.what());
return 1;
}
@@ -564,9 +563,12 @@ int disk_tool_t::prepare(std::vector<std::string> devices)
{
if (options.find("data_device") != options.end() && options["data_device"] != "")
{
if (options.find("hybrid") != options.end() || options.find("osd_per_disk") != options.end() || devices.size())
if (options.find("hybrid") != options.end() ||
options.find("fast_devices") != options.end() ||
options.find("osd_per_disk") != options.end() ||
devices.size())
{
fprintf(stderr, "Device list (positional arguments) and --hybrid are incompatible with --data_device\n");
fprintf(stderr, "Device list (positional arguments), --osd_per_disk, --hybrid and --fast-devices are incompatible with --data_device\n");
return 1;
}
return prepare_one(options, options.find("hdd") != options.end() ? 1 : 0);
@@ -583,8 +585,10 @@ int disk_tool_t::prepare(std::vector<std::string> devices)
auto devinfo = collect_devices(devices);
if (!devinfo.size())
{
fprintf(stderr, "No suitable devices found\n");
return 1;
}
bool explicit_fast = options.find("fast_devices") != options.end();
uint64_t osd_per_disk = stoull_full(options["osd_per_disk"]);
if (!osd_per_disk)
osd_per_disk = 1;
@@ -603,21 +607,55 @@ int disk_tool_t::prepare(std::vector<std::string> devices)
if (options.find("disable_meta_fsync") == options.end())
options["disable_meta_fsync"] = "auto";
options["disable_journal_fsync"] = options["disable_meta_fsync"];
for (auto & dev: devinfo)
if (!dev.is_hdd)
ssds.push_back(dev);
if (!ssds.size())
if (explicit_fast)
{
fprintf(stderr, "No SSDs found\n");
return 1;
auto fast = explode(",", options["fast_devices"], true);
ssds = collect_devices(fast);
if (!ssds.size())
{
fprintf(stderr, "No fast devices found\n");
return 1;
}
if (options["journal_size"] == "")
{
auto auto_journal_size = DEFAULT_HYBRID_SSD_JOURNAL;
for (auto & dev: devinfo)
{
if (dev.is_hdd)
{
auto_journal_size = DEFAULT_HYBRID_JOURNAL;
break;
}
}
options["journal_size"] = auto_journal_size;
}
}
else if (ssds.size() == devinfo.size())
else
{
fprintf(stderr, "No HDDs found\n");
return 1;
std::vector<vitastor_dev_info_t> hdds;
for (auto & dev: devinfo)
{
if (!dev.is_hdd)
ssds.push_back(dev);
else
hdds.push_back(dev);
}
if (!ssds.size())
{
fprintf(stderr, "No SSDs found\n");
return 1;
}
if (!hdds.size())
{
fprintf(stderr, "No HDDs found\n");
return 1;
}
devinfo = hdds;
if (options["journal_size"] == "")
{
options["journal_size"] = DEFAULT_HYBRID_JOURNAL;
}
}
if (options["journal_size"] == "")
options["journal_size"] = DEFAULT_HYBRID_JOURNAL;
}
else
{
@@ -627,31 +665,28 @@ int disk_tool_t::prepare(std::vector<std::string> devices)
auto journal_size = options["journal_size"];
for (auto & dev: devinfo)
{
if (!hybrid || dev.is_hdd)
// Select new partitions and create an OSD on each of them
for (const auto & uuid: get_new_data_parts(dev, osd_per_disk, max_other_percent))
{
// Select new partitions and create an OSD on each of them
for (const auto & uuid: get_new_data_parts(dev, osd_per_disk, max_other_percent))
options["force"] = true;
options["data_device"] = "/dev/disk/by-partuuid/"+strtolower(uuid);
if (hybrid)
{
options["force"] = true;
options["data_device"] = "/dev/disk/by-partuuid/"+strtolower(uuid);
if (hybrid)
// Select/create journal and metadata partitions
int r = get_meta_partition(ssds, options);
if (r != 0)
{
// Select/create journal and metadata partitions
int r = get_meta_partition(ssds, options);
if (r != 0)
{
return 1;
}
options.erase("journal_size");
}
// Treat all disks as SSDs if not in the hybrid mode
prepare_one(options, dev.is_hdd ? 1 : 0);
if (hybrid)
{
options["journal_size"] = journal_size;
options.erase("journal_device");
options.erase("meta_device");
return 1;
}
options.erase("journal_size");
}
// Treat all disks as SSDs if not in the hybrid mode
prepare_one(options, dev.is_hdd ? 1 : 0);
if (hybrid)
{
options["journal_size"] = journal_size;
options.erase("journal_device");
options.erase("meta_device");
}
}
}


@@ -91,7 +91,7 @@ int disk_tool_t::resize_parse_params()
try
{
dsk.parse_config(options);
dsk.data_io = dsk.meta_io = dsk.journal_io = "direct";
dsk.data_io = dsk.meta_io = dsk.journal_io = "cached";
dsk.open_data();
dsk.open_meta();
dsk.open_journal();
@@ -114,7 +114,10 @@ int disk_tool_t::resize_parse_params()
new_data_offset = options.find("new_data_offset") != options.end()
? parse_size(options["new_data_offset"]) : dsk.data_offset;
new_data_len = options.find("new_data_len") != options.end()
? parse_size(options["new_data_len"]) : dsk.data_len;
? parse_size(options["new_data_len"])
: (options.find("new_data_offset") != options.end()
? dsk.data_device_size-new_data_offset
: dsk.data_len);
new_meta_offset = options.find("new_meta_offset") != options.end()
? parse_size(options["new_meta_offset"]) : dsk.meta_offset;
new_meta_len = options.find("new_meta_len") != options.end()
@@ -123,6 +126,14 @@ int disk_tool_t::resize_parse_params()
? parse_size(options["new_journal_offset"]) : dsk.journal_offset;
new_journal_len = options.find("new_journal_len") != options.end()
? parse_size(options["new_journal_len"]) : dsk.journal_len;
if (new_data_len+new_data_offset > dsk.data_device_size)
new_data_len = dsk.data_device_size-new_data_offset;
if (new_meta_device == dsk.data_device && new_data_offset < new_meta_offset &&
new_data_len+new_data_offset > new_meta_offset)
new_data_len = new_meta_offset-new_data_offset;
if (new_journal_device == dsk.data_device && new_data_offset < new_journal_offset &&
new_data_len+new_data_offset > new_journal_offset)
new_data_len = new_journal_offset-new_data_offset;
if (new_meta_device == dsk.meta_device &&
new_journal_device == dsk.journal_device &&
new_data_offset == dsk.data_offset &&
@@ -159,10 +170,10 @@ void disk_tool_t::resize_init(blockstore_meta_header_v2_t *hdr)
dsk.data_csum_type = hdr->data_csum_type;
dsk.csum_block_size = hdr->csum_block_size;
}
if (((new_data_len-dsk.data_len) % dsk.data_block_size) ||
((new_data_offset-dsk.data_offset) % dsk.data_block_size))
if (((new_data_offset-dsk.data_offset) % dsk.data_block_size))
{
fprintf(stderr, "Data alignment mismatch\n");
fprintf(stderr, "Data alignment mismatch: old data offset is 0x%jx, new is 0x%jx, but alignment on %x should be equal\n",
dsk.data_offset, new_data_offset, dsk.data_block_size);
exit(1);
}
data_idx_diff = ((int64_t)(dsk.data_offset-new_data_offset)) / dsk.data_block_size;
@@ -220,10 +231,10 @@ int disk_tool_t::resize_remap_blocks()
}
for (uint64_t i = 0; i < free_last; i++)
{
if (data_alloc->get(total_blocks-i))
data_remap[total_blocks-i] = 0;
if (data_alloc->get(total_blocks-i-1))
data_remap[total_blocks-i-1] = 0;
else
data_alloc->set(total_blocks-i, true);
data_alloc->set(total_blocks-i-1, true);
}
for (auto & p: data_remap)
{
@@ -246,7 +257,7 @@ int disk_tool_t::resize_copy_data()
iodepth = 32;
}
ringloop = new ring_loop_t(iodepth < RINGLOOP_DEFAULT_SIZE ? RINGLOOP_DEFAULT_SIZE : iodepth);
dsk.data_fd = open(dsk.data_device.c_str(), O_DIRECT|O_RDWR);
dsk.data_fd = open(dsk.data_device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDWR);
if (dsk.data_fd < 0)
{
fprintf(stderr, "Failed to open data device %s: %s\n", dsk.data_device.c_str(), strerror(errno));
@@ -441,7 +452,7 @@ int disk_tool_t::resize_rewrite_journal()
int disk_tool_t::resize_write_new_journal()
{
new_journal_fd = open(new_journal_device.c_str(), O_DIRECT|O_RDWR);
new_journal_fd = open(new_journal_device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDWR);
if (new_journal_fd < 0)
{
fprintf(stderr, "Failed to open new journal device %s: %s\n", new_journal_device.c_str(), strerror(errno));
@@ -467,12 +478,13 @@ int disk_tool_t::resize_rewrite_meta()
blockstore_meta_header_v2_t *new_hdr = (blockstore_meta_header_v2_t *)new_meta_buf;
new_hdr->zero = 0;
new_hdr->magic = BLOCKSTORE_META_MAGIC_V1;
new_hdr->version = BLOCKSTORE_META_FORMAT_V1;
new_hdr->version = BLOCKSTORE_META_FORMAT_V2;
new_hdr->meta_block_size = dsk.meta_block_size;
new_hdr->data_block_size = dsk.data_block_size;
new_hdr->bitmap_granularity = dsk.bitmap_granularity ? dsk.bitmap_granularity : 4096;
new_hdr->data_csum_type = dsk.data_csum_type;
new_hdr->csum_block_size = dsk.csum_block_size;
new_hdr->header_csum = crc32c(0, new_hdr, sizeof(*new_hdr));
},
[this](uint64_t block_num, clean_disk_entry *entry, uint8_t *bitmap)
{
@@ -481,7 +493,7 @@ int disk_tool_t::resize_rewrite_meta()
block_num = remap_it->second;
if (block_num < free_first || block_num >= total_blocks-free_last)
{
fprintf(stderr, "BUG: remapped block not in range\n");
fprintf(stderr, "BUG: remapped block %ju not in range %ju..%ju\n", block_num, free_first, total_blocks-free_last);
exit(1);
}
block_num += data_idx_diff;
@@ -494,6 +506,8 @@ int disk_tool_t::resize_rewrite_meta()
memcpy(new_entry->bitmap, bitmap, 2*new_clean_entry_bitmap_size + new_data_csum_size);
else
memset(new_entry->bitmap, 0xff, 2*new_clean_entry_bitmap_size);
uint32_t *new_entry_csum = (uint32_t*)(((uint8_t*)new_entry) + new_clean_entry_size - 4);
*new_entry_csum = crc32c(0, new_entry, new_clean_entry_size - 4);
}
);
if (r != 0)
@@ -507,7 +521,7 @@ int disk_tool_t::resize_rewrite_meta()
int disk_tool_t::resize_write_new_meta()
{
new_meta_fd = open(new_meta_device.c_str(), O_DIRECT|O_RDWR);
new_meta_fd = open(new_meta_device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDWR);
if (new_meta_fd < 0)
{
fprintf(stderr, "Failed to open new metadata device %s: %s\n", new_meta_device.c_str(), strerror(errno));


@@ -37,6 +37,8 @@ int disk_tool_t::resize_data(std::string device)
fprintf(stderr, "%s\n", e.what());
return 1;
}
// Save FD numbers because calc_lengths() relies on them
int old_journal_fd = dsk.journal_fd, old_meta_fd = dsk.meta_fd, old_data_fd = dsk.data_fd;
dsk.close_all();
bool dry_run = options.find("dry_run") != options.end();
auto old_journal_device = dsk.journal_device;
@@ -48,6 +50,22 @@ int disk_tool_t::resize_data(std::string device)
if (options.find("move_journal") == options.end())
options["move_journal"] = dsk.journal_device == dsk.data_device ? "" : dsk.journal_device;
}
uint64_t new_data_dev_size = 0;
if (options.find("data_size") != options.end())
{
new_data_dev_size = parse_size(options["data_size"]);
new_data_dev_size = options["data_size"] == "max" || new_data_dev_size > dsk.data_device_size
? dsk.data_device_size : new_data_dev_size;
dsk.data_device_size = new_data_dev_size;
dsk.cfg_data_size = 0;
dsk.journal_fd = old_journal_fd;
dsk.meta_fd = old_meta_fd;
dsk.data_fd = old_data_fd;
dsk.calc_lengths(true);
dsk.journal_fd = -1;
dsk.meta_fd = -1;
dsk.data_fd = -1;
}
std::map<std::string, std::string> move_options;
if (options.find("move_journal") != options.end())
{
@@ -69,14 +87,8 @@ int disk_tool_t::resize_data(std::string device)
new_data_offset += ((dsk.data_offset-new_data_offset) % dsk.data_block_size);
if (new_data_offset != dsk.data_offset)
move_options["new_data_offset"] = std::to_string(new_data_offset);
if (options.find("data_size") != options.end())
{
auto new_data_dev_size = parse_size(options["data_size"]);
new_data_dev_size = options["data_size"] == "max" || new_data_dev_size > dsk.data_device_size
? dsk.data_device_size : new_data_dev_size;
if (new_data_dev_size-dsk.data_offset != dsk.data_len)
move_options["new_data_len"] = std::to_string(new_data_dev_size-new_data_offset);
}
if (new_data_dev_size != 0)
move_options["new_data_len"] = std::to_string(new_data_dev_size-new_data_offset);
new_meta_offset = 4096 + (new_meta_device == new_journal_device ? new_journal_len : 0);
if (new_meta_offset != dsk.meta_offset)
move_options["new_meta_offset"] = std::to_string(new_meta_offset);
@@ -188,17 +200,12 @@ int disk_tool_t::resize_parse_move_journal(std::map<std::string, std::string> &
else
options["move_journal"] = "<new journal partition on "+parent_dev+">";
}
else if (options["move_journal"].substr(0, 22) != "/dev/disk/by-partuuid/")
{
// Partitions should be identified by GPT partition UUID
fprintf(stderr, "%s does not start with /dev/disk/by-partuuid/. Partitions should be identified by GPT partition UUIDs\n", options["move_journal"].c_str());
return 1;
}
else
{
// already a partition - check that it's a GPT partition with correct type
if (options.find("force") == options.end() &&
check_existing_partition(real_dev) != 0)
if ((options.find("force") == options.end()
? check_existing_partition(options["move_journal"])
: fix_partition_type(options["move_journal"])) != 0)
{
return 1;
}
@@ -269,17 +276,12 @@ int disk_tool_t::resize_parse_move_meta(std::map<std::string, std::string> & mov
else
options["move_meta"] = "<new metadata partition on "+parent_dev+">";
}
else if (options["move_meta"].substr(0, 22) != "/dev/disk/by-partuuid/")
{
// Partitions should be identified by GPT partition UUID
fprintf(stderr, "%s does not start with /dev/disk/by-partuuid/. Partitions should be identified by GPT partition UUIDs\n", options["move_meta"].c_str());
return 1;
}
else
{
// already a partition - check that it's a GPT partition with correct type
if (options.find("force") == options.end() &&
check_existing_partition(real_dev) != 0)
if ((options.find("force") == options.end()
? check_existing_partition(options["move_meta"])
: fix_partition_type(options["move_meta"])) != 0)
{
return 1;
}


@@ -122,7 +122,7 @@ uint32_t disk_tool_t::write_osd_superblock(std::string device, json11::Json para
sb->size = sb_size;
memcpy(sb->json_data, json_data.c_str(), json_data.size());
sb->crc32c = crc32c(0, &sb->size, sb->size - ((uint8_t*)&sb->size - buf));
int fd = open(device.c_str(), O_DIRECT|O_RDWR);
int fd = open(device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDWR);
if (fd < 0)
{
fprintf(stderr, "Failed to open device %s: %s\n", device.c_str(), strerror(errno));
@@ -150,7 +150,7 @@ json11::Json disk_tool_t::read_osd_superblock(std::string device, bool expect_ex
json11::Json osd_params;
std::string json_err;
std::string real_device, device_type, real_data, real_meta, real_journal;
int r, fd = open(device.c_str(), O_DIRECT|O_RDWR);
int r, fd = open(device.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDWR);
if (fd < 0)
{
fprintf(stderr, "Failed to open device %s: %s\n", device.c_str(), strerror(errno));
@@ -385,7 +385,7 @@ int disk_tool_t::pre_exec_osd(std::string device)
int disk_tool_t::clear_osd_superblock(const std::string & dev)
{
uint8_t *buf = (uint8_t*)memalign_or_die(MEM_ALIGNMENT, 4096);
int fd = -1, r = open(dev.c_str(), O_DIRECT|O_RDWR);
int fd = -1, r = open(dev.c_str(), (options["io"] == "cached" ? 0 : O_DIRECT) | O_RDWR);
if (r >= 0)
{
fd = r;


@@ -343,23 +343,42 @@ uint64_t free_from_parttable(json11::Json pt)
return free;
}
int fix_partition_type(std::string dev_by_uuid)
int fix_partition_type_uuid(std::string & dev_by_uuid, const std::string & type_uuid)
{
auto uuid = strtolower(dev_by_uuid.substr(dev_by_uuid.rfind('/')+1));
std::string parent_dev = get_parent_device(realpath_str(dev_by_uuid, false));
bool is_partuuid = dev_by_uuid.substr(0, 22) == "/dev/disk/by-partuuid/";
auto uuid = is_partuuid ? strtolower(dev_by_uuid.substr(22)) : "";
auto node = realpath_str(dev_by_uuid, false);
std::string parent_dev = get_parent_device(node);
if (parent_dev == "")
return 1;
auto pt = read_parttable(parent_dev);
if (pt.is_null() || pt.is_bool())
return 1;
bool found = false;
std::string script = "label: gpt\n\n";
for (const auto & part: pt["partitions"].array_items())
{
bool this_part = (strtolower(part["uuid"].string_value()) == uuid);
if (this_part && strtolower(part["type"].string_value()) == "e7009fac-a5a1-4d72-af72-53de13059903")
bool this_part = (part["node"].string_value() == node) &&
(!is_partuuid || strtolower(part["uuid"].string_value()) == uuid);
if (this_part)
{
// Already correct type
return 0;
found = true;
if (!is_partuuid)
{
if (part["uuid"] == "")
{
fprintf(stderr, "Could not determine partition UUID for %s. Please use GPT partitions\n", dev_by_uuid.c_str());
return 1;
}
auto new_dev = "/dev/disk/by-partuuid/"+strtolower(part["uuid"].string_value());
fprintf(stderr, "Using %s instead of %s\n", new_dev.c_str(), dev_by_uuid.c_str());
dev_by_uuid = new_dev;
}
if (strtolower(part["type"].string_value()) == type_uuid)
{
// Already correct type
return 0;
}
}
script += part["node"].string_value()+": ";
bool first = true;
@@ -369,13 +388,18 @@ int fix_partition_type(std::string dev_by_uuid)
{
script += (first ? "" : ", ")+kv.first+"="+
(kv.first == "type" && this_part
? "e7009fac-a5a1-4d72-af72-53de13059903"
? type_uuid
: (kv.second.is_string() ? kv.second.string_value() : kv.second.dump()));
first = false;
}
}
script += "\n";
}
if (!found)
{
fprintf(stderr, "Could not find partition table entry for %s\n", dev_by_uuid.c_str());
return 1;
}
std::string out;
return shell_exec({ "sfdisk", "--no-reread", "--no-tell-kernel", "--force", parent_dev }, script, &out, NULL);
}
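For reference, the script piped into sfdisk above uses sfdisk's dump format: a `label: gpt` header followed by one `node: key=value, ...` line per partition, with the type of the target partition replaced by the requested type UUID. A hedged example with made-up partitions (the second type GUID is the standard Linux filesystem one):

label: gpt

/dev/loop0p1: start=2048, size=2097152, type=e7009fac-a5a1-4d72-af72-53de13059903, uuid=11111111-2222-3333-4444-555555555555
/dev/loop0p2: start=2099200, size=2097152, type=0fc63daf-8483-4772-8e79-3d69d8477de4, uuid=66666666-7777-8888-9999-aaaaaaaaaaaa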


@@ -68,6 +68,9 @@ TEST_NAME=csum_4k_dmj OSD_ARGS="--data_csum_type crc32c --inmemory_metadata fal
TEST_NAME=csum_4k_dj OSD_ARGS="--data_csum_type crc32c --inmemory_journal false" OFFSET_ARGS=$OSD_ARGS ./test_heal.sh
TEST_NAME=csum_4k OSD_ARGS="--data_csum_type crc32c" OFFSET_ARGS=$OSD_ARGS ./test_heal.sh
./test_resize.sh
./test_resize_auto.sh
./test_snapshot_pool2.sh
./test_osd_tags.sh


@@ -3,6 +3,7 @@
PG_COUNT=${PG_COUNT:-32}
. `dirname $0`/run_3osds.sh
check_qemu
LD_PRELOAD="build/src/client/libfio_vitastor.so" \
fio -thread -name=test -ioengine=build/src/client/libfio_vitastor.so -bs=4M -direct=1 -iodepth=4 \
@@ -26,22 +27,22 @@ for i in $(seq 1 $OSD_COUNT); do
offsets=$(build/src/disk_tool/vitastor-disk simple-offsets --format json ./testdata/bin/test_osd$i.bin)
meta_offset=$(echo $offsets | jq -r .meta_offset)
data_offset=$(echo $offsets | jq -r .data_offset)
build/src/disk_tool/vitastor-disk dump-journal --json ./testdata/bin/test_osd$i.bin 4096 0 $meta_offset >./testdata/journal_before_resize.json
build/src/disk_tool/vitastor-disk dump-meta ./testdata/bin/test_osd$i.bin 4096 $meta_offset $((data_offset-meta_offset)) >./testdata/meta_before_resize.json
build/src/disk_tool/vitastor-disk resize \
build/src/disk_tool/vitastor-disk dump-journal --io cached --json ./testdata/bin/test_osd$i.bin 4096 0 $meta_offset >./testdata/journal_before_resize.json
build/src/disk_tool/vitastor-disk dump-meta --io cached ./testdata/bin/test_osd$i.bin 4096 $meta_offset $((data_offset-meta_offset)) >./testdata/meta_before_resize.json
build/src/disk_tool/vitastor-disk raw-resize --io cached \
$(build/src/disk_tool/vitastor-disk simple-offsets --format options ./testdata/bin/test_osd$i.bin 2>/dev/null) \
--new_meta_offset 0 \
--new_meta_len $((1024*1024)) \
--new_journal_offset $((1024*1024)) \
--new_data_offset $((128*1024*1024))
build/src/disk_tool/vitastor-disk dump-journal --json ./testdata/bin/test_osd$i.bin 4096 $((1024*1024)) $((127*1024*1024)) >./testdata/journal_after_resize.json
build/src/disk_tool/vitastor-disk dump-meta ./testdata/bin/test_osd$i.bin 4096 0 $((1024*1024)) >./testdata/meta_after_resize.json
--new_data_offset $((128*1024*1024+32768))
build/src/disk_tool/vitastor-disk dump-journal --io cached --json ./testdata/bin/test_osd$i.bin 4096 $((1024*1024)) $((127*1024*1024)) >./testdata/journal_after_resize.json
build/src/disk_tool/vitastor-disk dump-meta --io cached ./testdata/bin/test_osd$i.bin 4096 0 $((1024*1024)) >./testdata/meta_after_resize.json
if ! (cat ./testdata/meta_before_resize.json ./testdata/meta_after_resize.json | \
jq -e -s 'map([ .entries[] | del(.block) ] | sort_by(.pool, .inode, .stripe)) | .[0] == .[1] and (.[0] | length) > 1000'); then
format_error "OSD $i metadata corrupted after resizing"
fi
if ! (cat ./testdata/journal_before_resize.json ./testdata/journal_after_resize.json | \
jq -e -s 'map([ .[].entries[] | del(.crc32, .crc32_prev, .valid, .loc, .start) ]) | .[0] == .[1] and (.[0] | length) > 1'); then
jq -e -s 'map([ .[] | del(.crc32, .crc32_prev, .valid, .loc, .start) ]) | .[0] == .[1] and (.[0] | length) > 1'); then
format_error "OSD $i journal corrupted after resizing"
fi
done
@@ -53,7 +54,7 @@ for i in $(seq 1 $OSD_COUNT); do
--data_device ./testdata/bin/test_osd$i.bin \
--meta_offset 0 \
--journal_offset $((1024*1024)) \
--data_offset $((128*1024*1024)) >>./testdata/osd$i.log 2>&1 &
--data_offset $((128*1024*1024+32768)) >>./testdata/osd$i.log 2>&1 &
eval OSD${i}_PID=$!
done

tests/test_resize_auto.sh (new executable file)

@@ -0,0 +1,94 @@
#!/bin/bash -ex
ANTIETCD=1
. `dirname $0`/common.sh
[[ -e build/src/disk_tool/vitastor-disk-test ]] || ln -s vitastor-disk build/src/disk_tool/vitastor-disk-test
dd if=/dev/zero of=./testdata/bin/test_osd1.bin bs=1 count=1 seek=$((100*1024*1024*1024-1))
LOOP1=$(sudo losetup --show -f ./testdata/bin/test_osd1.bin)
trap "kill -9 $(jobs -p) || true; sudo losetup -d $LOOP1"' || true' EXIT
dd if=/dev/zero of=./testdata/bin/test_meta.bin bs=1 count=1 seek=$((1024*1024*1024-1))
LOOP2=$(sudo losetup --show -f ./testdata/bin/test_meta.bin)
trap "kill -9 $(jobs -p) || true; sudo losetup -d $LOOP1 $LOOP2"' || true' EXIT
# also test prepare --hybrid :)
# non-vitastor random type UUID to prevent udev activation
mount | grep '/dev type devtmpfs' || sudo mount udev /dev/ -t devtmpfs
sudo build/src/disk_tool/vitastor-disk-test prepare --no_init 1 --meta_reserve 1x,1M \
--block_size 131072 --osd_num 987654 --part_type_uuid 0df42ae0-3695-4395-a957-7d5ff3645c56 \
--hybrid --fast-devices $LOOP2 $LOOP1
# write almost empty journal
node <<EOF > ./testdata/journal.json
console.log(JSON.stringify([
{"type":"start","start":"0x1000"},
{"type":"big_write_instant","inode":"0x1000000000001","stripe":"0xc60000","ver":"10","offset":0,"len":131072,"loc":"0x18ffdc0000","bitmap":"ffffffff"}
]));
EOF
sudo build/src/disk_tool/vitastor-disk write-journal ${LOOP1}p1 < ./testdata/journal.json
sudo build/src/disk_tool/vitastor-disk dump-journal --json --format data ${LOOP1}p1 | jq -S '[ .[] | del(.crc32, .crc32_prev) ]' > ./testdata/j2.json
jq -S '[ .[] + {"valid":true} ]' < ./testdata/journal.json > ./testdata/j1.json
diff ./testdata/j1.json ./testdata/j2.json
# write fake metadata items in the end
DATA_DEV_SIZE=$(sudo blockdev --getsize64 ${LOOP1}p1)
BLOCK_COUNT=$(((DATA_DEV_SIZE-4096)/128/1024))
node <<EOF > ./testdata/meta.json
console.log(JSON.stringify({
version: "0.9",
meta_block_size: 4096,
data_block_size: 131072,
bitmap_granularity: 4096,
data_csum_type: "none",
csum_block_size: 0,
entries: [ ...new Array(100).keys() ].map(i => ({
block: ($BLOCK_COUNT-100)+i,
pool: 1,
inode: "0x1",
stripe: "0x"+Number(i*0x20000).toString(16),
version: 10,
bitmap: "ffffffff",
ext_bitmap: "ffffffff",
})),
}));
EOF
# also test write & dump
sudo build/src/disk_tool/vitastor-disk write-meta ${LOOP1}p1 < ./testdata/meta.json
sudo build/src/disk_tool/vitastor-disk dump-meta ${LOOP1}p1 > ./testdata/compare.json
jq -S < ./testdata/meta.json > ./testdata/1.json
jq -S < ./testdata/compare.json > ./testdata/2.json
diff ./testdata/1.json ./testdata/2.json
# move journal & meta back, data will become smaller; end indexes should be shifted by -1251
sudo build/src/disk_tool/vitastor-disk-test resize --move-journal '' --move-meta '' ${LOOP1}p1
sudo build/src/disk_tool/vitastor-disk dump-meta ${LOOP1}p1 | jq -S > ./testdata/2.json
jq -S '. + {"entries": [ .entries[] | (. + { "block": (.block-1251) }) ]}' < ./testdata/meta.json > ./testdata/1.json
diff ./testdata/1.json ./testdata/2.json
sudo build/src/disk_tool/vitastor-disk dump-journal --json --format data ${LOOP1}p1 | jq -S '[ .[] | del(.crc32, .crc32_prev) ]' > ./testdata/j2.json
jq -S '[ (.[] + {"valid":true}) | (if .type == "big_write_instant" then . + {"loc":"0x18f6160000"} else . end) ]' < ./testdata/journal.json > ./testdata/j1.json
diff ./testdata/j1.json ./testdata/j2.json
# move journal & meta out, data will become larger; end indexes should be shifted back by +1251
sudo build/src/disk_tool/vitastor-disk-test resize --move-journal ${LOOP2}p1 --move-meta ${LOOP2}p2 ${LOOP1}p1
sudo build/src/disk_tool/vitastor-disk dump-meta ${LOOP1}p1 | jq -S > ./testdata/2.json
jq -S < ./testdata/meta.json > ./testdata/1.json
diff ./testdata/1.json ./testdata/2.json
jq -S '[ .[] + {"valid":true} ]' < ./testdata/journal.json > ./testdata/j1.json
sudo build/src/disk_tool/vitastor-disk dump-journal --json --format data ${LOOP1}p1 | jq -S '[ .[] | del(.crc32, .crc32_prev) ]' > ./testdata/j2.json
# reduce data device size by exactly 128k * 99 (occupied blocks); exactly 1 should be left in place :)
sudo build/src/disk_tool/vitastor-disk-test resize --data-size $((DATA_DEV_SIZE-128*1024*99)) ${LOOP1}p1
sudo build/src/disk_tool/vitastor-disk dump-meta ${LOOP1}p1 | jq -S > ./testdata/2.json
jq -S '. + {"entries": ([ .entries[] | (. + { "block": (.block | if . > '$BLOCK_COUNT'-100 then .-('$BLOCK_COUNT'-100+1) else '$BLOCK_COUNT'-100 end) }) ] | .[1:] + [ .[0] ])}' < ./testdata/meta.json > ./testdata/1.json
diff ./testdata/1.json ./testdata/2.json
jq -S '[ .[] + {"valid":true} ]' < ./testdata/journal.json > ./testdata/j1.json
sudo build/src/disk_tool/vitastor-disk dump-journal --json --format data ${LOOP1}p1 | jq -S '[ .[] | del(.crc32, .crc32_prev) ]' > ./testdata/j2.json
# extend data device size to maximum
sudo build/src/disk_tool/vitastor-disk-test resize --data-size max ${LOOP1}p1
sudo build/src/disk_tool/vitastor-disk dump-meta ${LOOP1}p1 | jq -S > ./testdata/2.json
diff ./testdata/1.json ./testdata/2.json
format_green OK