1
0
Fork 0
Commit Graph

1644 Commits (master)

Author SHA1 Message Date
Zibort Cloud f20e71b521 Add ubuntu jammy build 2024-02-22 11:03:06 +03:00
Vitaliy Filippov b3c15db331 32M journal by default in simple-offsets 2024-02-21 15:25:02 +03:00
Vitaliy Filippov 685bcd6ef9 Do not reserve extra space for big_writes during sync - sync itself is needed to commit and clear them 2024-02-21 13:00:14 +03:00
Vitaliy Filippov 3eb389b321 Supposed fix for "unexpected state during flush: 0x51" with EC 2024-02-21 01:32:06 +03:00
Vitaliy Filippov 3d16cde23c Fix assertions, add small sequential write test 2024-02-20 19:41:48 +03:00
Vitaliy Filippov c6406d67fc Fix journal space_check incorrectly checking for space at the beginning 2024-02-20 19:40:56 +03:00
Vitaliy Filippov f87964861d Release 1.4.6
Unwavering stabilization of 1.4.x, continued :-)

- Include the accidentally lost part of 1.4.5 journal trimming fix
- Fix a possible OSD crash with "BUG: Attempt to overwrite used offset"
  which was probably present for long time, but became apparent after
  fixing flapping tests in CI
- Fix remaining flapping tests in CI. It was the first time when tests
  actually passed without retries :-)
2024-02-20 17:01:26 +03:00
Vitaliy Filippov 62a4f45160 Raise test_scrub waiting timeout 2024-02-20 16:26:09 +03:00
Vitaliy Filippov 7048228678 Supposed fix for "BUG: Attempt to overwrite used offset" 2024-02-20 15:56:48 +03:00
Vitaliy Filippov ea73857450 Add asserts to catch "BUG: Attempt to overwrite used offset" 2024-02-20 15:56:48 +03:00
Vitaliy Filippov 6cfe38ec04 Followup to empty cur.oid as stop condition for forced trim fix 2024-02-20 15:56:38 +03:00
Vitaliy Filippov 7ae5766fdb Wait to clear has_degraded in test_heal - should fix flaps of test_heal_* in CI 2024-02-20 15:56:27 +03:00
Vitaliy Filippov f882c7dd87 Release 1.4.5
- Fix a write stall caused by incorrect journal trimming introduced in 1.4.4 :)
- Fix PGs sometimes hanging in "starting" state on mass OSD restarts
- Fix a rare crash with "map::at" during OSD pings
- Use new defaults for non-capacitor (desktop) SSDs - improves T1Q256 random write from ~6k iops to ~45k iops
- Make journal_trim_interval configurable
2024-02-16 10:13:33 +03:00
Vitaliy Filippov 26dd863c8d Fix sometimes possible crash on clients.at() during pings 2024-02-16 10:13:33 +03:00
Vitaliy Filippov 2ae859fbc6 Use min/max_flusher_count=32/256, 128M journal and autosync_writes=512 for non-capacitor SSDs by default 2024-02-16 10:13:33 +03:00
Vitaliy Filippov f6cd9f9153 Add a note about pg_minsize 2024-02-15 23:38:52 +03:00
Vitaliy Filippov 8389c0f33b Fix PGs sometimes hanging in "starting" state on mass OSD restarts 2024-02-15 23:38:52 +03:00
Vitaliy Filippov 9db2196aef Make journal_trim_interval configurable 2024-02-15 23:38:51 +03:00
Vitaliy Filippov 8d6ae662fe Use empty cur.oid as stop condition for forced trim, not journal_trim_counter 2024-02-15 23:27:17 +03:00
Vitaliy Filippov c777a0041a Release 1.4.4
A couple of fixes for EC pools

- Fix a segfault possible on partial EC overwrite in 1234 -> 5030 rebalance scenario
- Fix two problems leading to EC pools stalling on rebalance & parallel sudden stops
  of OSDs, for example during a sudden poweroff of a host:
  - Recovery auto-tuning (1.4.0 feature) could apply too large delays and stall
    the EC journal - fixed by limiting delays with a new recovery_tune_sleep_cutoff_us
    parameter (10 seconds by default) and applying recovery pauses before write
    operations, not after them, to not occupy space in the journal for long time
  - Dynamic journal space reservation (1.3.0 feature) wasn't accounting new writes
    when checking the limit so OSDs could still fill the journal fully and stall -
    fixed by including new writes into the limit
- Print etcd dbSize instead of dbSizeInUse in status
2024-02-11 16:23:08 +03:00
Vitaliy Filippov 2947ea93e8 Raise test_snapshot_chain_ec timeout to 6 minutes 2024-02-11 16:13:52 +03:00
Vitaliy Filippov 978bdc128a Apply recovery pause before writes, after commits, and do not apply it to syncs to not block EC pools from functioning 2024-02-11 16:13:52 +03:00
Vitaliy Filippov bb2f395f1e Add cutoff threshold for recovery auto-tuning 2024-02-11 16:13:52 +03:00
Vitaliy Filippov b127da40f7 Add a FIXME about incomplete PGs 2024-02-11 13:42:51 +03:00
Vitaliy Filippov ca34a6047a Fix dynamic journal space reservation: include the new write itself, too 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 38ba76e893 Fix flusher sometimes being unable to trim journal when the flush queue is empty 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 1e3c4edea0 Print etcd dbSize instead of dbSizeInUse in status 2024-02-11 13:42:51 +03:00
Vitaliy Filippov e7ac855b07 Fix that EC segfault (1234 -> 5030 partial overwrite) 2024-02-11 13:42:51 +03:00
Vitaliy Filippov c53357ac45 Add a test for EC segfault with partial overwrite in 1234 -> 5030 rebalance scenario 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 27e9f244ec Release 1.4.3
Hotfix for hotfix O:-)

- "Write stall fix" was incomplete and EC write stalls could
  continue even on 1.4.2. Now they're finally fixed O:-)
- Make monitor ignore statistics of stopped OSDs. Previously if you stopped all
  OSDs the last total I/O numbers would remain the same indefinitely
2024-02-09 00:29:31 +03:00
Vitaliy Filippov 8e25a28a08 Ignore down OSDs in monitor statistics aggregation 2024-02-09 00:22:36 +03:00
Vitaliy Filippov 5d3317e4f2 Followup to 1.4.2 write stall fix - sadly, the previous version was not working correctly :) 2024-02-08 19:34:29 +03:00
Vitaliy Filippov 016115c0d4 Release 1.4.2
- Log to systemd by default
- Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0)
- Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes
- Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit
- Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled
- Fix OSDs ignoring syncs & autosyncs for delete operations
- Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)
- Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds
- Speed up operation retries - change default up_wait_retry_interval to 50 ms
- Add patch for libvirt 9.10
2024-02-04 02:23:49 +03:00
Vitaliy Filippov e026de95d5 Log to systemd by default 2024-02-04 01:21:31 +03:00
Vitaliy Filippov 77c10fd1f8 In fact, do not autosync blockstore when autosync_writes=0 2024-02-03 20:37:36 +03:00
Vitaliy Filippov 581d02e581 Mark secondary OSDs with deletions as dirty to not forget to sync & autosync them 2024-02-03 20:31:08 +03:00
Vitaliy Filippov f03a9db4d9 Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools) 2024-02-03 20:31:08 +03:00
Vitaliy Filippov cb9c30bc31 Sync after sending all deletes to each PG in cli rm-data 2024-02-03 20:31:08 +03:00
Vitaliy Filippov a86a380d20 Fix invalid parsing of autosync_writes in blockstore leading to autosyncs after every operation with disabled immediate_commit :D 2024-02-03 20:31:08 +03:00
Vitaliy Filippov d2b43cb118 Change default etcd_mon_ttl 2024-01-29 23:45:19 +03:00
Vitaliy Filippov cc76e6876b Fix flapping "scrub" test 2024-01-28 14:59:33 +03:00
Vitaliy Filippov 1cec62d25d Sync only completed writes
Should be a final remaining fix to EC + non-capacitor (non-immediate-commit) write hangs :).

First it was breaking non-EC ("instantly stable") writes because they sometimes
complete out of order which was leading to the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151

But it is easily fixed by scanning previous and next dirty_entries in mark_stable.
2024-01-27 15:17:22 +03:00
Vitaliy Filippov 1c322b33ed Change default up_wait_retry_interval to 50 ms 2024-01-26 01:51:08 +03:00
Vitaliy Filippov d27524f441 Add patch for libvirt 9.10 2024-01-25 01:09:12 +03:00
Vitaliy Filippov ba55f91409 Release 1.4.1
- Fix a monitor crash on primary OSD switching introduced in 1.4.0
- Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree
- Fix a realloc memory leak in theory possible with too large listings (OSD_OP_LIST)
2024-01-18 02:31:42 +03:00
Vitaliy Filippov 80aac39513 Add detailed formula for theoretical EC N+K random write performance 2024-01-18 00:36:32 +03:00
Vitaliy Filippov 2aa5aa7ab6 Add a test for simple master switching without PG reconfiguration
Also use osd_out_time:1 only in select tests and restart mon in tests only on connection errors
2024-01-17 00:19:01 +03:00
Vitaliy Filippov 3ca3b8a8d8 Fix recheck_pgs bug introduced in 1.4.0 2024-01-16 23:49:21 +03:00
Vitaliy Filippov 2cf649eba6 Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree 2024-01-15 03:04:33 +03:00
Vitaliy Filippov 5935640a4a Add CLA PR form 2024-01-14 16:48:24 +03:00