vitastor

Commit Graph

Author	SHA1	Message	Date
Vitaliy Filippov	2947ea93e8	Raise test_snapshot_chain_ec timeout to 6 minutes	2024-02-11 16:13:52 +03:00
Vitaliy Filippov	978bdc128a	Apply recovery pause before writes, after commits, and do not apply it to syncs to not block EC pools from functioning	2024-02-11 16:13:52 +03:00
Vitaliy Filippov	bb2f395f1e	Add cutoff threshold for recovery auto-tuning	2024-02-11 16:13:52 +03:00
Vitaliy Filippov	b127da40f7	Add a FIXME about incomplete PGs	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	ca34a6047a	Fix dynamic journal space reservation: include the new write itself, too	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	38ba76e893	Fix flusher sometimes being unable to trim journal when the flush queue is empty	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	1e3c4edea0	Print etcd dbSize instead of dbSizeInUse in status	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	e7ac855b07	Fix that EC segfault (1234 -> 5030 partial overwrite)	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	c53357ac45	Add a test for EC segfault with partial overwrite in 1234 -> 5030 rebalance scenario	2024-02-11 13:42:51 +03:00
Vitaliy Filippov	27e9f244ec	Release 1.4.3. Hotfix for hotfix O:-) - "Write stall fix" was incomplete and EC write stalls could continue even on 1.4.2. Now they're finally fixed O:-) - Make monitor ignore statistics of stopped OSDs. Previously if you stopped all OSDs the last total I/O numbers would remain the same indefinitely	2024-02-09 00:29:31 +03:00
Vitaliy Filippov	8e25a28a08	Ignore down OSDs in monitor statistics aggregation	2024-02-09 00:22:36 +03:00
Vitaliy Filippov	5d3317e4f2	Followup to 1.4.2 write stall fix - sadly, the previous version was not working correctly :)	2024-02-08 19:34:29 +03:00
Vitaliy Filippov	016115c0d4	Release 1.4.2 - Log to systemd by default - Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0) - Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes - Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit - Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled - Fix OSDs ignoring syncs & autosyncs for delete operations - Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools) - Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds - Speed up operation retries - change default up_wait_retry_interval to 50 ms - Add patch for libvirt 9.10	2024-02-04 02:23:49 +03:00
Vitaliy Filippov	e026de95d5	Log to systemd by default	2024-02-04 01:21:31 +03:00
Vitaliy Filippov	77c10fd1f8	In fact, do not autosync blockstore when autosync_writes=0	2024-02-03 20:37:36 +03:00
Vitaliy Filippov	581d02e581	Mark secondary OSDs with deletions as dirty to not forget to sync & autosync them	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	f03a9db4d9	Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	cb9c30bc31	Sync after sending all deletes to each PG in cli rm-data	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	a86a380d20	Fix invalid parsing of autosync_writes in blockstore leading to autosyncs after every operation with disabled immediate_commit :D	2024-02-03 20:31:08 +03:00
Vitaliy Filippov	d2b43cb118	Change default etcd_mon_ttl	2024-01-29 23:45:19 +03:00
Vitaliy Filippov	cc76e6876b	Fix flapping "scrub" test	2024-01-28 14:59:33 +03:00
Vitaliy Filippov	1cec62d25d	Sync only completed writes. Should be a final remaining fix to EC + non-capacitor (non-immediate-commit) write hangs :). First it was breaking non-EC ("instantly stable") writes because they sometimes complete out of order, which was leading to the following error: terminate called after throwing an instance of 'std::runtime_error' what(): BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151 But it is easily fixed by scanning previous and next dirty_entries in mark_stable.	2024-01-27 15:17:22 +03:00
Vitaliy Filippov	1c322b33ed	Change default up_wait_retry_interval to 50 ms	2024-01-26 01:51:08 +03:00
Vitaliy Filippov	d27524f441	Add patch for libvirt 9.10	2024-01-25 01:09:12 +03:00
Vitaliy Filippov	ba55f91409	Release 1.4.1 - Fix a monitor crash on primary OSD switching introduced in 1.4.0 - Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree - Fix a realloc memory leak in theory possible with too large listings (OSD_OP_LIST)	2024-01-18 02:31:42 +03:00
Vitaliy Filippov	80aac39513	Add detailed formula for theoretical EC N+K random write performance	2024-01-18 00:36:32 +03:00
Vitaliy Filippov	2aa5aa7ab6	Add a test for simple master switching without PG reconfiguration. Also use osd_out_time:1 only in select tests and restart mon in tests only on connection errors	2024-01-17 00:19:01 +03:00
Vitaliy Filippov	3ca3b8a8d8	Fix recheck_pgs bug introduced in 1.4.0	2024-01-16 23:49:21 +03:00
Vitaliy Filippov	2cf649eba6	Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree	2024-01-15 03:04:33 +03:00
Vitaliy Filippov	5935640a4a	Add CLA PR form	2024-01-14 16:48:24 +03:00
Vitaliy Filippov	d00d4dbac0	Initialize mod_revision field in etcd_state_client	2024-01-13 01:30:28 +03:00
Vitaliy Filippov	5d9d6f32a0	Fix common realloc memory leak mistakes found by cppcheck	2024-01-13 01:30:28 +03:00
Vitaliy Filippov	5280d1d561	Release 1.4.0. New features: - Intelligent recovery/rebalance speed auto-tuning to reduce its impact on clients (see README -> Features) - Auto-restoration of dead VDUSE daemons in CSI plugin - Add vitastor-disk update-sb command - Update QEMU for Debian Bookworm to 8.1 and use it for CSI plugin Bug fixes: - Fix pools SOMETIMES staying inactive after stopping a node due to OSDs not reacting to PG state changes caused by incorrect full reload of state from etcd on reconnection - Make monitors retry pool configuration changes quicker which fixes them being unable to apply changes when an ongoing rebalance is quickly making a lot of PGs clean - Fix CSI plugin not accepting array of strings as etcd address in /etc/vitastor/vitastor.conf - Allow multiple interfaces with the same IP address, for "simple routed" full mesh network - Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible) - Fix a rare client crash during OSD reconnections - Only treat data partitions as existing OSDs in vitastor-disk prepare - Remove etcd parameter from default command examples - Fix reported free space sometimes changing non-immediately after deletion of data from OSDs - Fix a possible OSD crash on print_slow when bs_op is NULL - Use the same etcd_ws_keepalive_interval in mon as in OSD - Fix mon not using values from config when /config/global is not present - Remove pve-storage-portal-dns-list format for vitastor_etcd_address - Parse log_level in cluster_client - Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields - Do not warn on EPIPE in client unless log_level is raised explicitly - Fix incorrect error in CSI when searching for the device in /sys - Remove 2 last prints to stdout in etcd_state_client - Fix a possible OSD crash when checking corrupted journal entries	2024-01-12 01:28:33 +03:00
Vitaliy Filippov	317b0feb0a	Add a note about VDUSE daemon auto-restart	2024-01-12 01:27:36 +03:00
Vitaliy Filippov	247f0552db	Fix debug log "killing..." in CSI	2024-01-10 01:19:34 +03:00
Vitaliy Filippov	2f228fa96a	Only treat data partitions as existing OSDs in vitastor-disk prepare	2023-12-31 11:46:47 +03:00
Vitaliy Filippov	2f6b9c0306	Remove etcd parameter from default command examples	2023-12-31 02:50:41 +03:00
Vitaliy Filippov	48b5f871e0	Add Contributor License Agreement in Russian and English	2023-12-31 01:23:52 +03:00
Vitaliy Filippov	c17f76a3e4	Add documentation for recovery auto-tuning	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	a6ab54b1ba	Do not allow negative util_low/high	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	99ee8596ea	Rename min/max_util to util_low/high	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	c4928e6ecd	Protect from try_send completing the operation immediately. Fixes a possible use-after-free in case of continue_ops() calling try_send(), then connect_peer() -> set_timer() -> trigger_nearest() -> handle_op_part() -> continue_ops() again	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	ec7dcd1be5	Do not apply very large recovery pauses during tests	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	e600bbc151	Fix flapping move_reappear test by adding an fsync before stopping PG	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	8b8c1179a7	Use a separate used_blocks counter for free space stats to hide possibly delayed on-flush deallocation	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	d5a6fa6dd7	Fix possible crash on print_slow when bs_op is NULL	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	f757a35a8d	Retry PG changes without re-running lpsolve when pool configuration and OSD tree don't change. OSDs often change their /pg/history keys during rebalance, so monitor receives additional transaction failures from etcd if it re-runs lpsolve, which sometimes may even lead to monitor being unable to apply PG changes at all until rebalance completes	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	1edf86ed26	Aggregate recovery delay using simple mean over last 10 observations (EWMA is shit)	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	5ca7cde612	Experiment/WIP: Try to track "secondary" recovery ops separately	2023-12-31 01:23:17 +03:00
Vitaliy Filippov	751935ddd8	WIP Auto-tune recovery speed	2023-12-31 01:23:17 +03:00

1624 Commits (2947ea93e851c0af74592c4ce36f7b9444f1746d)