- Prevent _split types of new blocks
- Stop updating new blocks only after the whole update, otherwise pointers
may become invalid
- Use recheck_none for updates initially
- Use UINT64_MAX as initial block version when postponing ops, otherwise the
check fails when the block is initially empty. For example, this leads to
writing both leaf items & block pointers (which is incorrect) into the root
block when starting stress-test with --parallelism 32
- Fix -EINTR comparison
- track block versions correctly - per inode block (128kb) instead of tree block (4kb)
- prevent multiple parallel CAS writes of the same inode block
- add logging for EILSEQ which means invalid data in the tree
- fix get_block updated flag which was true for blocks already in cache and was leading to infinite loops on "unrelated block" errors
- apply changes to blocks in cache only after successful writes (using "virtual changes")
- do not replace cached block with an older version from disk
- recheck "unrelated blocks" (read/update collisions) until data stops changing
- track tree path correctly - do not treat split block as parent of its right half
- correctly move blocks when finding new empty place on disk
- restart updates from the beginning when one of blocks is changed by a parallel update
- fix delete using SET opcode and setting key to the empty value instead
- prevent changing the same key more than once in parallel
- fix listing verification
- resume continue_updates in update_find (required because it uses continue_update itself)
- add allow_old_cached parameter to get()
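
The UINT64_MAX item above can be read as a sentinel pattern. The following is a minimal sketch of that reading, assuming an empty block reports version 0 so that 0 cannot double as "not seen yet". The names postponed_op_t, VERSION_UNKNOWN and needs_recheck are hypothetical and not taken from the source tree.

```cpp
#include <cstdint>

// Hypothetical names for illustration only.
// Sentinel meaning "no block version has been observed yet".
// 0 cannot be used for this if an initially empty block also has version 0.
constexpr uint64_t VERSION_UNKNOWN = UINT64_MAX;

struct postponed_op_t
{
    // Version of the block seen when the operation was postponed
    uint64_t seen_version = VERSION_UNKNOWN;
};

// Returns true when the block must be re-read before the postponed
// operation may continue: either nothing was observed yet, or the
// block has changed since it was observed.
inline bool needs_recheck(const postponed_op_t & op, uint64_t current_version)
{
    return op.seen_version == VERSION_UNKNOWN || op.seen_version != current_version;
}
```

With 0 as the initial value, a fresh postponed op would compare equal to an empty block and skip the recheck; with the sentinel it never does.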
- Do not use \r if output is not a terminal (should fix unexpected job output in proxmox)
- Fix rm/rm-data error return code, add --down-ok option to bypass the error
- Add EIO retry timeout and allow disabling these retries, rename up_wait_retry_interval to client_retry_interval
- Add ubuntu jammy build
- Wait for blockstore initialisation before starting OSD (prevent timeouts when init takes time)
- Fix a rare use-after-free in automatic sync after delete in blockstore
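
The "\r" item above describes a common pattern: overwrite the progress line in place only when stdout is a terminal, and print ordinary lines otherwise. A minimal sketch using POSIX isatty() follows; format_progress and print_progress are hypothetical helper names, not the CLI's actual functions.

```cpp
#include <cstdio>
#include <string>
#include <unistd.h>

// Hypothetical helper: builds one progress line.
// For a terminal the line starts with "\r" so it overwrites the previous one,
// otherwise it ends with "\n" so that logs and job output stay readable.
inline std::string format_progress(const std::string & text, bool is_terminal)
{
    return is_terminal ? "\r" + text : text + "\n";
}

// Hypothetical helper: prints progress depending on where stdout points.
inline void print_progress(const std::string & text)
{
    // isatty() is POSIX and returns 1 when the descriptor refers to a terminal
    const bool tty = isatty(STDOUT_FILENO) == 1;
    std::fputs(format_progress(text, tty).c_str(), stdout);
    std::fflush(stdout);
}
```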
ASan report: [0] READ of size 16 at operator() /root/vitastor/src/blockstore_write.cpp:100
...[5] blockstore_impl_t::ack_sync(blockstore_op_t*) /root/vitastor/src/blockstore_sync.cpp:232
- Fix another old "BUG: Attempt to overwrite used offset" in a very simple
case: bs=4k rw=write iodepth=16 from OSD start; add this case to tests
- Fix a rare crash with "unexpected state during flush: 0x51" possible with
EC since 1.4.2 during rebalance and OSD outages
- Fix a rare write stall with EC & immediate_commit=none caused by sync
operations reserving unneeded space in the journal
- Fix 32-bit build warnings, most in printf/scanf format strings
Unwavering stabilization of 1.4.x, continued :-)
- Include the accidentally lost part of 1.4.5 journal trimming fix
- Fix a possible OSD crash with "BUG: Attempt to overwrite used offset"
which was probably present for a long time, but became apparent after
fixing flapping tests in CI
- Fix remaining flapping tests in CI. This is the first time the tests
actually passed without retries :-)
- Fix a write stall caused by incorrect journal trimming introduced in 1.4.4 :)
- Fix PGs sometimes hanging in "starting" state on mass OSD restarts
- Fix a rare crash with "map::at" during OSD pings
- Use new defaults for non-capacitor (desktop) SSDs - improves T1Q256 random write from ~6k iops to ~45k iops
- Make journal_trim_interval configurable
A couple of fixes for EC pools
- Fix a segfault possible on partial EC overwrite in 1234 -> 5030 rebalance scenario
- Fix two problems leading to EC pools stalling on rebalance & parallel sudden stops
of OSDs, for example during a sudden poweroff of a host:
  - Recovery auto-tuning (1.4.0 feature) could apply too large delays and stall
  the EC journal - fixed by limiting delays with a new recovery_tune_sleep_cutoff_us
  parameter (10 seconds by default) and applying recovery pauses before write
  operations, not after them, to not occupy space in the journal for a long time
  - Dynamic journal space reservation (1.3.0 feature) wasn't accounting for new writes
  when checking the limit so OSDs could still fill the journal fully and stall -
  fixed by including new writes into the limit
- Print etcd dbSize instead of dbSizeInUse in status
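
The two EC stall fixes above boil down to two small checks. The sketch below is only an illustration under stated assumptions: it treats recovery_tune_sleep_cutoff_us as a plain upper bound (the real auto-tuning may handle out-of-range values differently), and the function names and byte-based accounting are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Assumption: the cutoff acts as a simple upper bound on the tuned delay.
inline uint64_t limit_recovery_sleep_us(uint64_t tuned_sleep_us, uint64_t cutoff_us)
{
    return std::min(tuned_sleep_us, cutoff_us);
}

// Hypothetical admission check: the size of the write being admitted is
// counted together with the space already used, so the limit cannot be
// exceeded by the write that is about to start. Written without addition
// to avoid unsigned overflow.
inline bool journal_has_room(uint64_t used_bytes, uint64_t new_write_bytes, uint64_t limit_bytes)
{
    return used_bytes <= limit_bytes && new_write_bytes <= limit_bytes - used_bytes;
}
```

With the default cutoff of 10 seconds (10000000 us), a tuned delay of 25 seconds would be reduced to 10 seconds, while a delay of 500 us would pass unchanged.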
Hotfix for hotfix O:-)
- "Write stall fix" was incomplete and EC write stalls could
continue even on 1.4.2. Now they're finally fixed O:-)
- Make monitor ignore statistics of stopped OSDs. Previously if you stopped all
OSDs the last total I/O numbers would remain the same indefinitely
- Log to systemd by default
- Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0)
- Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes
- Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit
- Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled
- Fix OSDs ignoring syncs & autosyncs for delete operations
- Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)
- Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds
- Speed up operation retries - change default up_wait_retry_interval to 50 ms
- Add patch for libvirt 9.10
This should be the last remaining fix for EC + non-capacitor (non-immediate-commit) write hangs :).
At first it broke non-EC ("instantly stable") writes because they sometimes
complete out of order, which led to the following error:
terminate called after throwing an instance of 'std::runtime_error'
what(): BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151
This is easily fixed by scanning the previous and next dirty_entries in mark_stable.
- Fix a monitor crash on primary OSD switching introduced in 1.4.0
- Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree
- Fix a realloc memory leak in theory possible with too large listings (OSD_OP_LIST)
New features:
- Intelligent recovery/rebalance speed auto-tuning to reduce its impact on clients (see README -> Features)
- Auto-restoration of dead VDUSE daemons in CSI plugin
- Add vitastor-disk update-sb command
- Update QEMU for Debian Bookworm to 8.1 and use it for CSI plugin
Bug fixes:
- Fix pools SOMETIMES staying inactive after stopping a node due to OSDs not reacting
to PG state changes caused by incorrect full reload of state from etcd on reconnection
- Make monitors retry pool configuration changes more quickly, which fixes them being unable
to apply changes when an ongoing rebalance is quickly making a lot of PGs clean
- Fix CSI plugin not accepting array of strings as etcd address in /etc/vitastor/vitastor.conf
- Allow multiple interfaces with the same IP address, for "simple routed" full mesh network
- Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible)
- Fix a rare client crash during OSD reconnections
- Only treat data partitions as existing OSDs in vitastor-disk prepare
- Remove etcd parameter from default command examples
- Fix reported free space sometimes not updating immediately after deletion of data from OSDs
- Fix a possible OSD crash on print_slow when bs_op is NULL
- Use the same etcd_ws_keepalive_interval in mon as in OSD
- Fix mon not using values from config when /config/global is not present
- Remove pve-storage-portal-dns-list format for vitastor_etcd_address
- Parse log_level in cluster_client
- Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields
- Do not warn on EPIPE in client unless log_level is raised explicitly
- Fix incorrect error in CSI when searching for the device in /sys
- Remove 2 last prints to stdout in etcd_state_client
- Fix a possible OSD crash when checking corrupted journal entries
Fixes a possible use-after-free in case of continue_ops() calling try_send(),
then connect_peer() -> set_timer() -> trigger_nearest() -> handle_op_part() -> continue_ops() again
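
The call chain above is a re-entrancy problem: a function that walks a queue ends up calling itself through a callback. One common guard, shown here as a generic sketch and not as the actual client code, is a flag that turns a nested call into a "run again" request. The type op_queue_t and its fields are hypothetical.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical queue type illustrating a re-entrancy guard.
struct op_queue_t
{
    std::deque<std::function<void()>> ops;
    int continuing = 0; // 0 = idle, 1 = running, 2 = re-run requested by a nested call
    int passes = 0;     // number of passes over the queue, kept for illustration

    void continue_ops()
    {
        if (continuing)
        {
            // Nested call: do not touch the queue, just ask the outer call to repeat
            continuing = 2;
            return;
        }
        do
        {
            continuing = 1;
            passes++;
            while (!ops.empty())
            {
                // Move the callback out first so that it stays valid
                // even if it modifies the queue while running
                std::function<void()> cb = std::move(ops.front());
                ops.pop_front();
                cb();
            }
        } while (continuing == 2);
        continuing = 0;
    }
};
```

The outer call is the only one that iterates, so no iterator or element is invalidated by a nested invocation.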
Also add protection from etcd watcher messages being split into multiple websocket
messages - I'm not sure if etcd actually does that, but it's better to have extra
protection anyway.
Also check that all etcd watchers are started in the keepalive routine, otherwise
it sometimes tries to revive etcd watchers starting with revision=1 which obviously
always fails because this revision is nearly always compacted.
All these changes should fix an old rarely reproduced bug where SOMETIMES OSDs
didn't react to PG config changes which was leading to offline pools on node reboot.
It happened on the full reload of state from etcd.
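
Protection from a message being split across several websocket messages can be sketched as a small reassembly buffer. This is an illustration only, assuming each watcher message is a single top-level JSON object; it is not the actual fix, and json_reassembler_t is a hypothetical name. It tracks brace depth and skips braces inside string literals.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects text fragments and returns every
// complete top-level JSON object found so far.
struct json_reassembler_t
{
    std::string buf;

    std::vector<std::string> feed(const std::string & fragment)
    {
        buf += fragment;
        std::vector<std::string> out;
        std::size_t start = 0;
        int depth = 0;
        bool in_string = false, escaped = false;
        for (std::size_t i = 0; i < buf.size(); i++)
        {
            const char c = buf[i];
            if (in_string)
            {
                // Inside a string literal braces do not count
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') in_string = false;
            }
            else if (c == '"') in_string = true;
            else if (c == '{')
            {
                if (depth++ == 0) start = i;
            }
            else if (c == '}' && depth > 0)
            {
                if (--depth == 0)
                {
                    out.push_back(buf.substr(start, i - start + 1));
                    start = i + 1;
                }
            }
        }
        // Keep only the unfinished tail for the next fragment
        buf.erase(0, start);
        return out;
    }
};
```

The buffer is rescanned from its beginning on every call, which is simple but quadratic for very long partial messages; a production parser would resume from the saved state instead.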
New features:
- RDMA without ODP - much faster and all cards are now supported, not just Mellanox
- VDUSE in CSI - faster, more stable and can even recover after CSI pod restart!
- Reserve journal space for stabilize requests dynamically to prevent stalls under load with EC
- Raise default NBD timeout from 30 to 300 seconds and allow taking it from /etc/vitastor/vitastor.conf
- Remove explicit etcdUrl/etcdPrefix K8S storage class parameter support to prevent
etcd migration issues for volumes created with these parameters
- Support QEMU 8.1 and pve-qemu 8.1
Bug fixes:
- Fix RDMA connection (and thus memory) leak
- Fix rare crashes under load due to incorrect io_uring queue size tracking
- Fix monitor statistics aggregation in case of empty /osd/stats keys
- Fix crash on unknown long argument to vitastor-disk
- Allow trailing comma in JSONs again
- Fix crash on attempts to dump a long listing of objects "to stabilize" or "to rollback" in a slow op