- Do not try to allocate more DB blocks in an inode block until it's "confirmed" and "locked" by the first write
- Do not recheck for new zero DB blocks on first write into an inode block - a CAS failure means someone else is already writing into it
- Throw new allocation blocks away regardless of whether the known_version is 0 on a CAS failure
- do not overwrite a block with older version if known version is newer
(read may start before update and end after update)
- invalidated block versions can't be remembered and trusted
- right boundary for split blocks is right_half when diving down, not key_lt
- restart update also when block is "invalidated", not just on version mismatch
- copy callback in listings to avoid closure destruction bugs too
- Prevent _split types of new blocks
- Stop updating new blocks only after the whole update, otherwise pointers
may become invalid
- Use recheck_none for updates initially
- Use UINT64_MAX as the initial block version when postponing ops, otherwise the
check fails when the block is initially empty. For example, this leads to
writing both leaf items & block pointers into the root block (which is
incorrect) when starting the stress test with --parallelism 32
- Fix -EINTR comparison
- track block versions correctly - per inode block (128kb) instead of tree block (4kb)
- prevent multiple parallel CAS writes of the same inode block
- add logging for EILSEQ which means invalid data in the tree
- fix get_block updated flag which was true for blocks already in cache and was leading to infinite loops on "unrelated block" errors
- apply changes to blocks in cache only after successful writes (using "virtual changes")
- do not replace cached block with an older version from disk
- recheck "unrelated blocks" (read/update collisions) until data stops changing
- track tree path correctly - do not treat split block as parent of its right half
- correctly move blocks when finding new empty place on disk
- restart updates from the beginning when one of blocks is changed by a parallel update
- fix delete using SET opcode and setting key to the empty value instead
- prevent changing the same key more than 1 time in parallel
- fix listing verification
- resume continue_updates in update_find (required because it uses continue_update itself)
- add allow_old_cached parameter to get()
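The version rules in the list above (do not replace a cached block with an older version from disk, and do not trust the version of an invalidated block) can be sketched as follows. This is a minimal illustration only: `cached_block_t`, `put_from_disk` and the map-based cache are hypothetical names, not the actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical cached block; field names are illustrative only
struct cached_block_t
{
    uint64_t version = 0;
    bool invalidated = false;
    std::string data;
};

// Puts a block read from disk into the cache.
// Returns true if the cache entry was created or replaced.
bool put_from_disk(std::map<uint64_t, cached_block_t> & cache, uint64_t offset,
    uint64_t disk_version, const std::string & disk_data)
{
    auto it = cache.find(offset);
    if (it != cache.end() && !it->second.invalidated && it->second.version > disk_version)
    {
        // A read may start before an update and end after it, so the data
        // we just read may be older than what is already known: keep the cache
        return false;
    }
    // Either there is no entry, or it is invalidated (its version can't be
    // trusted), or the disk version is not older: accept the disk copy
    cached_block_t & blk = cache[offset];
    blk.version = disk_version;
    blk.invalidated = false;
    blk.data = disk_data;
    return true;
}
```

An invalidated entry accepts any version from disk here, because its remembered version is treated as unreliable.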
- Do not use \r if output is not a terminal (should fix unexpected job output in proxmox)
- Fix rm/rm-data error return code, add --down-ok option to bypass the error
- Add EIO retry timeout and allow disabling these retries, rename up_wait_retry_interval to client_retry_interval
- Add ubuntu jammy build
- Wait for blockstore initialisation before starting OSD (prevent timeouts when init takes time)
- Fix a rare use-after-free in automatic sync after delete in blockstore
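The first item above (no `\r` when the output is not a terminal) can be illustrated with a small sketch. It assumes a POSIX system for `isatty()`. The function names `format_progress` and `print_progress` are hypothetical and not taken from the CLI source.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unistd.h>

// Builds one progress line. On a terminal the line is redrawn in place
// with \r; otherwise each update goes on its own line.
std::string format_progress(bool is_terminal, int done, int total)
{
    std::string line = std::to_string(done) + "/" + std::to_string(total) + " done";
    return is_terminal ? "\r" + line : line + "\n";
}

// Prints progress to stdout, choosing the format by checking for a terminal
void print_progress(int done, int total)
{
    bool is_terminal = isatty(STDOUT_FILENO) != 0;
    fputs(format_progress(is_terminal, done, total).c_str(), stdout);
    fflush(stdout);
}
```

When the output is captured by a job runner or redirected to a file, every update ends with a newline, so the log contains no stray carriage returns.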
ASan report: [0] READ of size 16 at operator() /root/vitastor/src/blockstore_write.cpp:100
...[5] blockstore_impl_t::ack_sync(blockstore_op_t*) /root/vitastor/src/blockstore_sync.cpp:232
- Fix another old "BUG: Attempt to overwrite used offset" in a very simple
case: bs=4k rw=write iodepth=16 from OSD start; add this case to tests
- Fix a rare crash with "unexpected state during flush: 0x51" possible with
EC since 1.4.2 during rebalance and OSD outages
- Fix a rare write stall with EC & immediate_commit=none caused by sync
operations reserving unneeded space in the journal
- Fix 32-bit build warnings, most in printf/scanf format strings
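For the last item, a typical source of such warnings is printing `uint64_t` with `%lu`, which only matches on platforms where `long` is 64 bits wide. A minimal sketch of the portable form is shown below. `format_offset` is a hypothetical helper, not a function from the code base.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit offset and a size_t length using format macros that
// are correct on both 32-bit and 64-bit builds
std::string format_offset(uint64_t offset, size_t len)
{
    char buf[64];
    // PRIu64 expands to the right conversion for uint64_t, %zu is for size_t
    snprintf(buf, sizeof(buf), "offset=%" PRIu64 " len=%zu", offset, len);
    return std::string(buf);
}
```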
Unwavering stabilization of 1.4.x, continued :-)
- Include the accidentally lost part of 1.4.5 journal trimming fix
- Fix a possible OSD crash with "BUG: Attempt to overwrite used offset"
which was probably present for a long time, but became apparent after
fixing flapping tests in CI
- Fix remaining flapping tests in CI. This is the first time the tests
actually passed without retries :-)
- Fix a write stall caused by incorrect journal trimming introduced in 1.4.4 :)
- Fix PGs sometimes hanging in "starting" state on mass OSD restarts
- Fix a rare crash with "map::at" during OSD pings
- Use new defaults for non-capacitor (desktop) SSDs - improves T1Q256 random write from ~6k iops to ~45k iops
- Make journal_trim_interval configurable
A couple of fixes for EC pools
- Fix a segfault possible on partial EC overwrite in 1234 -> 5030 rebalance scenario
- Fix two problems leading to EC pools stalling on rebalance & parallel sudden stops
  of OSDs, for example during a sudden poweroff of a host:
  - Recovery auto-tuning (1.4.0 feature) could apply too large delays and stall
    the EC journal - fixed by limiting delays with a new recovery_tune_sleep_cutoff_us
    parameter (10 seconds by default) and applying recovery pauses before write
    operations, not after them, to avoid occupying space in the journal for a long time
  - Dynamic journal space reservation (1.3.0 feature) wasn't accounting for new writes
    when checking the limit, so OSDs could still fill the journal fully and stall -
    fixed by including new writes in the limit
- Print etcd dbSize instead of dbSizeInUse in status
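The two EC stall fixes described above can be sketched roughly as follows. This is only an illustration under assumptions: both function names are hypothetical, and the delay is simply clamped to the cutoff here, while the real OSD may treat values above the cutoff differently (for example by ignoring them).

```cpp
#include <cassert>
#include <cstdint>

// Limits an auto-tuned recovery delay (in microseconds) to the cutoff.
// The default mirrors the 10 second default of recovery_tune_sleep_cutoff_us.
uint64_t limit_recovery_sleep(uint64_t tuned_sleep_us,
    uint64_t recovery_tune_sleep_cutoff_us = 10000000)
{
    return tuned_sleep_us > recovery_tune_sleep_cutoff_us
        ? recovery_tune_sleep_cutoff_us : tuned_sleep_us;
}

// Journal reservation check that includes the new write in the limit,
// written so that the subtraction can't underflow
bool fits_in_journal(uint64_t used_bytes, uint64_t new_write_bytes, uint64_t limit_bytes)
{
    return used_bytes <= limit_bytes && new_write_bytes <= limit_bytes - used_bytes;
}
```

Checking only `used_bytes` against the limit would admit a write that pushes the journal over it, which is the kind of gap the second fix closes.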
Hotfix for hotfix O:-)
- "Write stall fix" was incomplete and EC write stalls could
continue even on 1.4.2. Now they're finally fixed O:-)
- Make monitor ignore statistics of stopped OSDs. Previously if you stopped all
OSDs the last total I/O numbers would remain the same indefinitely
- Log to systemd by default
- Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0)
- Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes
- Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit
- Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled
- Fix OSDs ignoring syncs & autosyncs for delete operations
- Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)
- Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds
- Speed up operation retries - change default up_wait_retry_interval to 50 ms
- Add patch for libvirt 9.10
This should be the last remaining fix for EC + non-capacitor (non-immediate-commit) write hangs :).
At first it broke non-EC ("instantly stable") writes, because they sometimes
complete out of order, which led to the following error:
terminate called after throwing an instance of 'std::runtime_error'
what(): BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151
This is easily fixed by scanning the previous and next dirty_entries in mark_stable.
- Fix a monitor crash on primary OSD switching introduced in 1.4.0
- Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree
- Fix a realloc memory leak theoretically possible with very large listings (OSD_OP_LIST)
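A realloc leak usually comes from assigning the result of `realloc()` directly to the only pointer that refers to the old buffer: if the call fails, that pointer becomes NULL and the old memory is lost. A minimal sketch of the safe pattern is shown below. `grow_buffer` is a hypothetical helper, not the actual listing code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Grows *buf to new_size bytes. On failure returns false and leaves *buf
// untouched, so the caller still owns the old buffer and can free it.
bool grow_buffer(void **buf, size_t new_size)
{
    if (new_size == 0)
        return false; // realloc(ptr, 0) is implementation-defined, refuse it
    void *new_buf = realloc(*buf, new_size);
    if (!new_buf)
        return false; // the old buffer is still valid here
    *buf = new_buf;
    return true;
}
```

On failure the caller can either free the old buffer and report an error, or keep using the data collected so far.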