Vitaliy Filippov
d2b43cb118
Change default etcd_mon_ttl
2024-01-29 23:45:19 +03:00
Vitaliy Filippov
1c322b33ed
Change default up_wait_retry_interval to 50 ms
2024-01-26 01:51:08 +03:00
Vitaliy Filippov
2aa5aa7ab6
Add a test for simple master switching without PG reconfiguration
Also use osd_out_time:1 only in select tests and restart mon in tests only on connection errors
2024-01-17 00:19:01 +03:00
Vitaliy Filippov
3ca3b8a8d8
Fix recheck_pgs bug introduced in 1.4.0
2024-01-16 23:49:21 +03:00
Vitaliy Filippov
99ee8596ea
Rename min/max_util to util_low/high
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
f757a35a8d
Retry PG changes without re-running lpsolve when pool configuration and OSD tree don't change
OSDs often change their /pg/history keys during rebalance, so the monitor receives additional
transaction failures from etcd if it re-runs lpsolve. This can sometimes even leave the monitor
unable to apply PG changes at all until the rebalance completes.
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
1edf86ed26
Aggregate recovery delay using simple mean over last 10 observations (EWMA is shit)
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
751935ddd8
WIP Auto-tune recovery speed
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
1299373988
Use the same etcd_ws_keepalive_interval in OSD and mon
2023-12-23 20:07:29 +03:00
Vitaliy Filippov
4ece4dfdd0
Fix mon not using values from config when /config/global is not present
2023-12-22 02:25:09 +03:00
Vitaliy Filippov
7da4868b37
Fix monitor statistics aggregation in case of empty /osd/stats keys
2023-11-24 01:05:21 +03:00
Vitaliy Filippov
0e888e6c60
Prevent spamming etcd with last_clean_pgs update requests
2023-11-05 00:12:00 +03:00
Vitaliy Filippov
408c21d8f0
Scale last_clean_pgs PG count even if current PGs already contain the new number of PGs
2023-11-04 23:45:59 +03:00
Vitaliy Filippov
43cb9ae212
Prevent multiple parallel recheck_pgs in case of timeouts
2023-11-04 20:59:56 +03:00
Vitaliy Filippov
2e592a2f22
Fix undefined variable "timeout"
2023-10-29 01:30:55 +03:00
Vitaliy Filippov
b92f644e3a
Fix statistics aggregation, calculate inode stats by first deriving per-OSD stats, too
2023-10-29 01:30:55 +03:00
Vitaliy Filippov
be7e76f849
Split etcd_stats_interval out of etcd_report_interval
2023-10-27 01:26:26 +03:00
Vitaliy Filippov
38db53f5ee
Implement client writeback cache
- Disabled by default, enable with client_enable_writeback=true
- Even then, only enabled in FIO when -direct is disabled and in QEMU when
  block device cache is enabled in settings
- Can also be enabled in other clients like vitastor-cli using the parameter
  client_writeback_allowed=true, but this is not recommended
2023-09-16 17:52:17 +03:00
Vitaliy Filippov
ff479a102d
Make MON filter OSDs by block layout to prevent "refusing to start PGs of this pool" errors on misconfiguration
2023-09-16 17:52:17 +03:00
Vitaliy Filippov
ab8627c9fa
Fix monitor retrying failed etcd connection in an infinite loop without pauses
2023-08-09 00:57:08 +03:00
Vitaliy Filippov
25a15d24cf
Fix incorrect EC space statistics in `vitastor-cli status`
2023-07-27 02:26:17 +00:00
Vitaliy Filippov
d007a374f2
Delete extra /pool/stats/ keys for non-existing pools
2023-07-06 00:40:13 +03:00
Vitaliy Filippov
a4186e20aa
First derive, then sum per-OSD statistics instead of first summing and then deriving
This makes statistics reported by vitastor-cli status much smoother
2023-06-18 01:32:24 +03:00
Vitaliy Filippov
aea567cfbd
Slightly improve scrub docs
2023-05-21 12:52:30 +03:00
Vitaliy Filippov
ce02f47de6
Allow disabling scrub_find_best
2023-05-21 12:33:38 +03:00
Vitaliy Filippov
8d40ad99a6
Add scrub documentation
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
3475772b07
Add configuration online update documentation
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
6648f6bb6e
Implement ambiguity detection during scrub
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
3c924397e7
Store next scrub timestamp instead of last scrub timestamp
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
c3bd26193d
Implement PG scrub runner
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
0538a484b3
Add corrupted object state
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
022176aa98
Fix NaN during PG optimisation if there are nonexisting OSDs in node_placement
2023-05-17 01:20:30 +03:00
Vitaliy Filippov
120e3fa7bc
Fix pool deletion
2023-05-17 00:45:59 +03:00
Vitaliy Filippov
6f4dc16c59
Handle etcd connection errors correctly in mon (unhandled error events)
2023-05-11 11:02:44 +03:00
Vitaliy Filippov
321cb435a6
Fix monitor incorrectly changing PG count when last_clean_pgs contains fewer PGs than the new number
2023-05-08 20:39:20 +03:00
Vitaliy Filippov
5b9031fecc
Fix monitor possibly applying incorrect PG history under heavy load
The monitor could deceive itself by immediately saving in memory PG configuration
changes that had not yet been applied to etcd, and then apply incorrect PG history
changes next time if the first update failed.
This usually only happened under heavy load and was caught in CI. :-)
2023-05-07 23:23:00 +03:00
Vitaliy Filippov
d06ed2b0e7
Implement online config update
2023-03-26 19:21:50 +03:00
Vitaliy Filippov
14d6acbcba
Set default rdma_max_recv/send to 16/8, fix documentation
2023-02-28 11:00:56 +03:00
Vitaliy Filippov
c3e80abad7
Allow sending more than one operation at a time
2023-02-26 02:01:04 +03:00
Vitaliy Filippov
2c8241b7db
Remove PG "peered" state
2023-02-21 01:30:42 +03:00
Vitaliy Filippov
0f6b946add
Time changes with every stat change, so do not schedule checks based on it
2023-01-05 13:54:16 +03:00
Vitaliy Filippov
465cbf0b2f
Do not re-schedule recheck indefinitely, run it after mon_change_timeout in any case
2023-01-05 13:48:06 +03:00
Vitaliy Filippov
41add50e4e
Track last_clean_pgs on a per-pool basis
2023-01-03 02:20:50 +03:00
Vitaliy Filippov
3de57e87b1
Recheck OSD tree in monitor on /osd/stats changes
2022-12-26 02:48:48 +03:00
Vitaliy Filippov
2f13f347b0
Fix space stats in mon
2022-09-03 11:16:33 +03:00
Vitaliy Filippov
5a10d135f3
Allow configuring block_size, bitmap_granularity and immediate_commit per-pool
2022-08-11 01:56:33 +03:00
Vitaliy Filippov
36e851505a
Make monitor delete pool statistics when the pool is deleted
2022-06-04 13:27:06 +03:00
Vitaliy Filippov
1efbbb0c36
Make deleted inodes vanish from statistics after 60 seconds
2022-06-04 13:27:06 +03:00
Vitaliy Filippov
a0cae4c180
Rename "jerasure" to "ec" in pool configuration, function names, fix documentation and Debian build scripts
Old pool configurations with "jerasure" also remain supported as an alias for "ec"
2022-06-03 15:40:00 +03:00
Vitaliy Filippov
cf03b9c84d
Implement "primary affinity tags"
2022-05-09 22:37:23 +03:00