Vitaliy Filippov
2f228fa96a
Only treat data partitions as existing OSDs in vitastor-disk prepare
2023-12-31 11:46:47 +03:00
Vitaliy Filippov
2f6b9c0306
Remove etcd parameter from default command examples
2023-12-31 02:50:41 +03:00
Vitaliy Filippov
48b5f871e0
Add Contributor License Agreement in Russian and English
2023-12-31 01:23:52 +03:00
Vitaliy Filippov
c17f76a3e4
Add documentation for recovery auto-tuning
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
a6ab54b1ba
Do not allow negative util_low/high
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
99ee8596ea
Rename min/max_util to util_low/high
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
c4928e6ecd
Protect from try_send completing the operation immediately
Fixes a possible use-after-free in case of continue_ops() calling try_send(),
then connect_peer() -> set_timer() -> trigger_nearest() -> handle_op_part() -> continue_ops() again
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
ec7dcd1be5
Do not apply very large recovery pauses during tests
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
e600bbc151
Fix flapping move_reappear test by adding an fsync before stopping PG
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
8b8c1179a7
Use a separate used_blocks counter for free space stats to hide possibly delayed on-flush deallocation
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
d5a6fa6dd7
Fix possible crash on print_slow when bs_op is NULL
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
f757a35a8d
Retry PG changes without re-running lpsolve when pool configuration and OSD tree don't change
OSDs often change their /pg/history keys during rebalance, so the monitor receives additional
transaction failures from etcd if it re-runs lpsolve. Sometimes this even leaves the monitor
unable to apply PG changes at all until the rebalance completes.
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
1edf86ed26
Aggregate recovery delay using simple mean over last 10 observations (EWMA is shit)
2023-12-31 01:23:17 +03:00
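The "simple mean over last 10 observations" from commit 1edf86ed26 can be illustrated with a small sliding-window average. This is only a sketch under assumptions: the class name `WindowMean`, its methods, and the use of `std::deque` are hypothetical and not taken from the Vitastor source; only the window size of 10 comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper (not Vitastor code): keeps the last `capacity`
// samples and reports their arithmetic mean.
class WindowMean
{
    std::deque<double> samples;
    double sum = 0;
    size_t capacity;
public:
    // The default of 10 mirrors the "last 10 observations" in the commit subject
    explicit WindowMean(size_t capacity = 10): capacity(capacity)
    {
        assert(capacity > 0);
    }
    // Add a new observation and drop the oldest one once the window is full
    void add(double value)
    {
        samples.push_back(value);
        sum += value;
        if (samples.size() > capacity)
        {
            sum -= samples.front();
            samples.pop_front();
        }
    }
    // Mean of the samples currently in the window, 0 if there are none
    double mean() const
    {
        return samples.empty() ? 0 : sum / samples.size();
    }
};
```

Unlike an exponentially weighted average, each observation in such a window has equal weight and is forgotten completely once it falls out of the window.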
Vitaliy Filippov
5ca7cde612
Experiment/WIP: Try to track "secondary" recovery ops separately
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
751935ddd8
WIP Auto-tune recovery speed
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
d84dee7098
Track recovery op latencies + refactor into a structure
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
dcc76eee15
Add a parity chunk count change test script
2023-12-26 23:48:41 +03:00
Vitaliy Filippov
2f38adeb3d
Restart dead VDUSE daemons at regular intervals
2023-12-24 12:58:50 +03:00
Vitaliy Filippov
f72f14e6a7
Clear old PG states, history, and OSD states on etcd state reload
Also add protection against etcd watcher messages being split across multiple websocket
messages. I'm not sure etcd actually does that, but it's better to have extra
protection anyway.
Also check in the keepalive routine that all etcd watchers are started. Otherwise
it sometimes tries to revive etcd watchers starting from revision=1, which almost
always fails because this revision is nearly always compacted.
Together, these changes should fix an old, rarely reproduced bug where OSDs sometimes
didn't react to PG config changes, which led to offline pools on node reboot.
It happened on a full reload of state from etcd.
2023-12-24 02:02:13 +03:00
Vitaliy Filippov
1299373988
Use the same etcd_ws_keepalive_interval in OSD and mon
2023-12-23 20:07:29 +03:00
Vitaliy Filippov
178bb0e701
Prevent re-entry into timerfd set_nearest
2023-12-22 02:32:40 +03:00
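One common way to prevent re-entry into a function like set_nearest is a guard flag that turns a nested call into a deferred re-run. The sketch below illustrates that general technique only; the `TimerManager` struct, its fields, and the `on_fire` callback are hypothetical, and the commit itself may solve the problem differently.

```cpp
#include <functional>

// Hypothetical timer manager (not the Vitastor implementation)
struct TimerManager
{
    bool in_set_nearest = false;   // true while set_nearest() is on the call stack
    bool rerun_requested = false;  // set when a nested call was rejected
    int runs = 0;                  // number of real passes, for illustration
    std::function<void()> on_fire; // callback that may call set_nearest() again

    void set_nearest()
    {
        if (in_set_nearest)
        {
            // Nested call: do not recurse, ask the outer call to run once more
            rerun_requested = true;
            return;
        }
        in_set_nearest = true;
        do
        {
            rerun_requested = false;
            runs++;
            if (on_fire)
                on_fire();
        } while (rerun_requested);
        in_set_nearest = false;
    }
};
```

With this shape, a callback that calls set_nearest() from inside a timer handler never recurses; the outer call simply makes one more pass after the callback returns.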
Vitaliy Filippov
4ece4dfdd0
Fix mon not using values from config when /config/global is not present
2023-12-22 02:25:09 +03:00
Vitaliy Filippov
95631773b6
Remove pve-storage-portal-dns-list format for vitastor_etcd_address
2023-12-20 02:22:06 +03:00
Vitaliy Filippov
7239cfb91a
Parse log_level in cluster_client
2023-12-20 02:21:23 +03:00
Vitaliy Filippov
7cea642f4a
Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields
2023-12-19 01:11:37 +03:00
Vitaliy Filippov
dc615403d9
Do not warn on EPIPE in client unless log_level is raised explicitly
2023-12-17 13:42:26 +03:00
Vitaliy Filippov
1a704e06ab
Allow multiple interfaces with the same IP address, for "simple routed" full mesh network
2023-12-17 13:25:56 +03:00
Vitaliy Filippov
575475de71
Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible)
2023-12-17 11:55:13 +03:00