1
0
Fork 0

master #3

Merged
antilles merged 33 commits from vitalif/vitastor:master into master 2024-02-13 14:44:09 +03:00

33 Commits (master)

Author SHA1 Message Date
Vitaliy Filippov c777a0041a Release 1.4.4
A couple of fixes for EC pools

- Fix a segfault possible on partial EC overwrite in 1234 -> 5030 rebalance scenario
- Fix two problems leading to EC pools stalling on rebalance & parallel sudden stops
  of OSDs, for example during a sudden poweroff of a host:
  - Recovery auto-tuning (1.4.0 feature) could apply too large delays and stall
    the EC journal - fixed by limiting delays with a new recovery_tune_sleep_cutoff_us
    parameter (10 seconds by default) and applying recovery pauses before write
    operations, not after them, to not occupy space in the journal for long time
  - Dynamic journal space reservation (1.3.0 feature) wasn't accounting new writes
    when checking the limit so OSDs could still fill the journal fully and stall -
    fixed by including new writes into the limit
- Print etcd dbSize instead of dbSizeInUse in status
2024-02-11 16:23:08 +03:00
Vitaliy Filippov 2947ea93e8 Raise test_snapshot_chain_ec timeout to 6 minutes 2024-02-11 16:13:52 +03:00
Vitaliy Filippov 978bdc128a Apply recovery pause before writes, after commits, and do not apply it to syncs to not block EC pools from functioning 2024-02-11 16:13:52 +03:00
Vitaliy Filippov bb2f395f1e Add cutoff threshold for recovery auto-tuning 2024-02-11 16:13:52 +03:00
Vitaliy Filippov b127da40f7 Add a FIXME about incomplete PGs 2024-02-11 13:42:51 +03:00
Vitaliy Filippov ca34a6047a Fix dynamic journal space reservation: include the new write itself, too 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 38ba76e893 Fix flusher sometimes being unable to trim journal when the flush queue is empty 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 1e3c4edea0 Print etcd dbSize instead of dbSizeInUse in status 2024-02-11 13:42:51 +03:00
Vitaliy Filippov e7ac855b07 Fix that EC segfault (1234 -> 5030 partial overwrite) 2024-02-11 13:42:51 +03:00
Vitaliy Filippov c53357ac45 Add a test for EC segfault with partial overwrite in 1234 -> 5030 rebalance scenario 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 27e9f244ec Release 1.4.3
Hotfix for hotfix O:-)

- "Write stall fix" was incomplete and EC write stalls could
  continue even on 1.4.2. Now they're finally fixed O:-)
- Make monitor ignore statistics of stopped OSDs. Previously if you stopped all
  OSDs the last total I/O numbers would remain the same indefinitely
2024-02-09 00:29:31 +03:00
Vitaliy Filippov 8e25a28a08 Ignore down OSDs in monitor statistics aggregation 2024-02-09 00:22:36 +03:00
Vitaliy Filippov 5d3317e4f2 Followup to 1.4.2 write stall fix - sadly, the previous version was not working correctly :) 2024-02-08 19:34:29 +03:00
Vitaliy Filippov 016115c0d4 Release 1.4.2
- Log to systemd by default
- Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0)
- Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes
- Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit
- Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled
- Fix OSDs ignoring syncs & autosyncs for delete operations
- Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)
- Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds
- Speed up operation retries - change default up_wait_retry_interval to 50 ms
- Add patch for libvirt 9.10
2024-02-04 02:23:49 +03:00
Vitaliy Filippov e026de95d5 Log to systemd by default 2024-02-04 01:21:31 +03:00
Vitaliy Filippov 77c10fd1f8 In fact, do not autosync blockstore when autosync_writes=0 2024-02-03 20:37:36 +03:00
Vitaliy Filippov 581d02e581 Mark secondary OSDs with deletions as dirty to not forget to sync & autosync them 2024-02-03 20:31:08 +03:00
Vitaliy Filippov f03a9db4d9 Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools) 2024-02-03 20:31:08 +03:00
Vitaliy Filippov cb9c30bc31 Sync after sending all deletes to each PG in cli rm-data 2024-02-03 20:31:08 +03:00
Vitaliy Filippov a86a380d20 Fix invalid parsing of autosync_writes in blockstore leading to autosyncs after every operation with disabled immediate_commit :D 2024-02-03 20:31:08 +03:00
Vitaliy Filippov d2b43cb118 Change default etcd_mon_ttl 2024-01-29 23:45:19 +03:00
Vitaliy Filippov cc76e6876b Fix flapping "scrub" test 2024-01-28 14:59:33 +03:00
Vitaliy Filippov 1cec62d25d Sync only completed writes
Should be a final remaining fix to EC + non-capacitor (non-immediate-commit) write hangs :).

First it was breaking non-EC ("instantly stable") writes because they sometimes
complete out of order which was leading to the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151

But it is easily fixed by scanning previous and next dirty_entries in mark_stable.
2024-01-27 15:17:22 +03:00
Vitaliy Filippov 1c322b33ed Change default up_wait_retry_interval to 50 ms 2024-01-26 01:51:08 +03:00
Vitaliy Filippov d27524f441 Add patch for libvirt 9.10 2024-01-25 01:09:12 +03:00
Vitaliy Filippov ba55f91409 Release 1.4.1
- Fix a monitor crash on primary OSD switching introduced in 1.4.0
- Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree
- Fix a realloc memory leak in theory possible with too large listings (OSD_OP_LIST)
2024-01-18 02:31:42 +03:00
Vitaliy Filippov 80aac39513 Add detailed formula for theoretical EC N+K random write performance 2024-01-18 00:36:32 +03:00
Vitaliy Filippov 2aa5aa7ab6 Add a test for simple master switching without PG reconfiguration
Also use osd_out_time:1 only in select tests and restart mon in tests only on connection errors
2024-01-17 00:19:01 +03:00
Vitaliy Filippov 3ca3b8a8d8 Fix recheck_pgs bug introduced in 1.4.0 2024-01-16 23:49:21 +03:00
Vitaliy Filippov 2cf649eba6 Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree 2024-01-15 03:04:33 +03:00
Vitaliy Filippov 5935640a4a Add CLA PR form 2024-01-14 16:48:24 +03:00
Vitaliy Filippov d00d4dbac0 Initialize mod_revision field in etcd_state_client 2024-01-13 01:30:28 +03:00
Vitaliy Filippov 5d9d6f32a0 Fix common realloc memory leak mistakes found by cppcheck 2024-01-13 01:30:28 +03:00