master #3

Merged

antilles merged 33 commits from vitalif/vitastor:master into master

2024-02-13 14:44:09 +03:00

Author	SHA1	Message	Date
Vitaliy Filippov	c777a0041a	Release 1.4.4: a couple of fixes for EC pools. - Fix a segfault possible on partial EC overwrite in 1234 -> 5030 rebalance scenario. - Fix two problems leading to EC pools stalling on rebalance & parallel sudden stops of OSDs, for example during a sudden poweroff of a host: (a) Recovery auto-tuning (1.4.0 feature) could apply too large delays and stall the EC journal; fixed by limiting delays with a new recovery_tune_sleep_cutoff_us parameter (10 seconds by default) and applying recovery pauses before write operations, not after them, to not occupy space in the journal for a long time. (b) Dynamic journal space reservation (1.3.0 feature) wasn't accounting for new writes when checking the limit, so OSDs could still fill the journal fully and stall; fixed by including new writes in the limit. - Print etcd dbSize instead of dbSizeInUse in status.	2024-02-11 16:23:08 +03:00
Vitaliy Filippov	2947ea93e8	Raise test_snapshot_chain_ec timeout to 6 minutes	2024-02-11 16:13:52 +03:00
Vitaliy Filippov	978bdc128a	Apply recovery pause before writes, after commits, and do not apply it to syncs, so as not to block EC pools from functioning	2024-02-11 16:13:52 +03:00
Vitaliy Filippov	bb2f395f1e	Add cutoff threshold for recovery auto-tuning	2024-02-11 16:13:52 +03:00
Vitaliy Filippov	b127da40f7	Add a FIXME about incomplete PGs	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	ca34a6047a	Fix dynamic journal space reservation: include the new write itself, too	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	38ba76e893	Fix flusher sometimes being unable to trim journal when the flush queue is empty	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	1e3c4edea0	Print etcd dbSize instead of dbSizeInUse in status	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	e7ac855b07	Fix that EC segfault (1234 -> 5030 partial overwrite)	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	c53357ac45	Add a test for EC segfault with partial overwrite in 1234 -> 5030 rebalance scenario	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	27e9f244ec	Release 1.4.3. Hotfix for hotfix O:-) - "Write stall fix" was incomplete and EC write stalls could continue even on 1.4.2. Now they're finally fixed O:-) - Make monitor ignore statistics of stopped OSDs. Previously, if you stopped all OSDs, the last total I/O numbers would remain the same indefinitely.	2024-02-09 00:29:31 +03:00
Vitaliy Filippov	8e25a28a08	Ignore down OSDs in monitor statistics aggregation	2024-02-09 00:22:36 +03:00
Vitaliy Filippov	5d3317e4f2	Followup to 1.4.2 write stall fix - sadly, the previous version was not working correctly :)	2024-02-08 19:34:29 +03:00
Vitaliy Filippov	016115c0d4	Release 1.4.2 - Log to systemd by default - Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0) - Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes - Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit - Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled - Fix OSDs ignoring syncs & autosyncs for delete operations - Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools) - Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds - Speed up operation retries - change default up_wait_retry_interval to 50 ms - Add patch for libvirt 9.10	2024-02-04 02:23:49 +03:00
Vitaliy Filippov	e026de95d5	Log to systemd by default	2024-02-04 01:21:31 +03:00
Vitaliy Filippov	77c10fd1f8	In fact, do not autosync blockstore when autosync_writes=0	2024-02-03 20:37:36 +03:00
Vitaliy Filippov	581d02e581	Mark secondary OSDs with deletions as dirty to not forget to sync & autosync them	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	f03a9db4d9	Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	cb9c30bc31	Sync after sending all deletes to each PG in cli rm-data	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	a86a380d20	Fix invalid parsing of autosync_writes in blockstore leading to autosyncs after every operation with disabled immediate_commit :D	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	d2b43cb118	Change default etcd_mon_ttl	2024-01-29 23:45:19 +03:00
Vitaliy Filippov	cc76e6876b	Fix flapping "scrub" test	2024-01-28 14:59:33 +03:00
Vitaliy Filippov	1cec62d25d	Sync only completed writes. Should be a final remaining fix to EC + non-capacitor (non-immediate-commit) write hangs :). At first it was breaking non-EC ("instantly stable") writes because they sometimes complete out of order, which was leading to the following error: terminate called after throwing an instance of 'std::runtime_error' what(): BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151. But it is easily fixed by scanning previous and next dirty_entries in mark_stable.	2024-01-27 15:17:22 +03:00
Vitaliy Filippov	1c322b33ed	Change default up_wait_retry_interval to 50 ms	2024-01-26 01:51:08 +03:00
Vitaliy Filippov	d27524f441	Add patch for libvirt 9.10	2024-01-25 01:09:12 +03:00
Vitaliy Filippov	ba55f91409	Release 1.4.1 - Fix a monitor crash on primary OSD switching introduced in 1.4.0 - Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree - Fix a realloc memory leak theoretically possible with too large listings (OSD_OP_LIST)	2024-01-18 02:31:42 +03:00
Vitaliy Filippov	80aac39513	Add detailed formula for theoretical EC N+K random write performance	2024-01-18 00:36:32 +03:00
Vitaliy Filippov	2aa5aa7ab6	Add a test for simple master switching without PG reconfiguration. Also use osd_out_time:1 only in select tests and restart mon in tests only on connection errors.	2024-01-17 00:19:01 +03:00
Vitaliy Filippov	3ca3b8a8d8	Fix recheck_pgs bug introduced in 1.4.0	2024-01-16 23:49:21 +03:00
Vitaliy Filippov	2cf649eba6	Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree	2024-01-15 03:04:33 +03:00
Vitaliy Filippov	5935640a4a	Add CLA PR form	2024-01-14 16:48:24 +03:00
Vitaliy Filippov	d00d4dbac0	Initialize mod_revision field in etcd_state_client	2024-01-13 01:30:28 +03:00
Vitaliy Filippov	5d9d6f32a0	Fix common realloc memory leak mistakes found by cppcheck	2024-01-13 01:30:28 +03:00
