98 Commits (ff479a102dd340d109dd2c893d65cde3fb16039b)

Author SHA1 Message Date
Vitaliy Filippov ff479a102d Make MON filter OSDs by block layout to prevent "refusing to start PGs of this pool" errors on misconfiguration 2023-09-16 17:52:17 +03:00
Vitaliy Filippov ab8627c9fa Fix monitor retrying failed etcd connection in an infinite loop without pauses 2023-08-09 00:57:08 +03:00
Vitaliy Filippov 25a15d24cf Fix incorrect EC space statistics in `vitastor-cli status`
2023-07-27 02:26:17 +00:00
Vitaliy Filippov d007a374f2 Delete extra /pool/stats/ keys for non-existing pools
2023-07-06 00:40:13 +03:00
Vitaliy Filippov a4186e20aa First derive, then sum per-OSD statistics instead of first summing and then deriving
This makes statistics reported by vitastor-cli status much smoother
2023-06-18 01:32:24 +03:00
Vitaliy Filippov aea567cfbd Slightly improve scrub docs
2023-05-21 12:52:30 +03:00
Vitaliy Filippov ce02f47de6 Allow to disable scrub_find_best 2023-05-21 12:33:38 +03:00
Vitaliy Filippov 8d40ad99a6 Add scrub documentation 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 3475772b07 Add configuration online update documentation 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 6648f6bb6e Implement ambiguity detection during scrub 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 3c924397e7 Store next scrub timestamp instead of last scrub timestamp 2023-05-20 23:19:39 +03:00
Vitaliy Filippov c3bd26193d Implement PG scrub runner 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0538a484b3 Add corrupted object state 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 022176aa98 Fix NaN during PG optimisation if there are nonexisting OSDs in node_placement
2023-05-17 01:20:30 +03:00
Vitaliy Filippov 120e3fa7bc Fix pool deletion
2023-05-17 00:45:59 +03:00
Vitaliy Filippov 6f4dc16c59 Handle etcd connection errors correctly in mon (unhandled error events) 2023-05-11 11:02:44 +03:00
Vitaliy Filippov 321cb435a6 Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number 2023-05-08 20:39:20 +03:00
Vitaliy Filippov 5b9031fecc Fix monitor possibly applying incorrect PG history under heavy load
Monitor could deceive itself by immediately saving PG configuration changes
which weren't applied to etcd yet in memory, and apply incorrect PG history
changes next time if the first update fails.

This usually only happened under heavy load and was caught in CI. :-)
2023-05-07 23:23:00 +03:00
Vitaliy Filippov d06ed2b0e7 Implement online config update 2023-03-26 19:21:50 +03:00
Vitaliy Filippov 14d6acbcba Set default rdma_max_recv/send to 16/8, fix documentation 2023-02-28 11:00:56 +03:00
Vitaliy Filippov c3e80abad7 Allow to send more than 1 operation at a time 2023-02-26 02:01:04 +03:00
Vitaliy Filippov 2c8241b7db Remove PG "peered" state 2023-02-21 01:30:42 +03:00
Vitaliy Filippov 0f6b946add Time changes with every stat change, do not schedule checks based on it 2023-01-05 13:54:16 +03:00
Vitaliy Filippov 465cbf0b2f Do not re-schedule recheck indefinitely, run it after mon_change_timeout in any case 2023-01-05 13:48:06 +03:00
Vitaliy Filippov 41add50e4e Track last_clean_pgs on a per-pool basis 2023-01-03 02:20:50 +03:00
Vitaliy Filippov 3de57e87b1 Recheck OSD tree in monitor on /osd/stats changes 2022-12-26 02:48:48 +03:00
Vitaliy Filippov 2f13f347b0 Fix space stats in mon 2022-09-03 11:16:33 +03:00
Vitaliy Filippov 5a10d135f3 Allow to configure block_size, bitmap_granularity and immediate_commit per-pool 2022-08-11 01:56:33 +03:00
Vitaliy Filippov 36e851505a Make monitor delete pool statistics when the pool is deleted 2022-06-04 13:27:06 +03:00
Vitaliy Filippov 1efbbb0c36 Make deleted inodes vanish from statistics after 60 seconds 2022-06-04 13:27:06 +03:00
Vitaliy Filippov a0cae4c180 Rename "jerasure" to "ec" in pool configuration, function names, fix documentation and Debian build scripts
Old pool configurations with "jerasure" also remain supported as an alias for "ec"
2022-06-03 15:40:00 +03:00
Vitaliy Filippov cf03b9c84d Implement "primary affinity tags" 2022-05-09 22:37:23 +03:00
Vitaliy Filippov e718116f54 Fix incorrect reading of extra metadata block 2022-04-21 02:52:21 +03:00
Vitaliy Filippov c857272f44 Comment: epoch is uint64_t 2022-04-10 12:21:37 +03:00
Vitaliy Filippov d71cc174e3 Implement CLI status command 2022-04-09 00:25:51 +03:00
Vitaliy Filippov 3615e57879 Register standby monitors in etcd in /mon/member 2022-04-04 00:48:52 +03:00
Vitaliy Filippov 46d2bc100f Add some tolerance to stat calculation so it does not fail on a fresh DB 2022-02-11 16:37:16 +03:00
Vitaliy Filippov 73ae578981 Add osd_memlock option 2022-02-02 01:40:22 +03:00
Vitaliy Filippov d9869d8116 Add parameter documentation 2022-01-28 02:45:54 +03:00
Vitaliy Filippov 9a15b843ff Do not set pg_real_size to 0 2022-01-23 20:15:04 +03:00
Vitaliy Filippov a5cf06acd0 Remove etcd timeout and keepalive interval hardcode 2022-01-23 00:00:00 +03:00
Vitaliy Filippov 8f64fc61e7 Ignore empty events in mon 2022-01-08 11:41:00 +03:00
Vitaliy Filippov 4a9f001d9e Make mon also ping etcd websockets regularly 2022-01-05 17:28:51 +03:00
Vitaliy Filippov 68b6763ebe Add asserts for lp-optimizer tests, pass `ordered` from the monitor 2022-01-03 20:37:07 +03:00
Vitaliy Filippov 5473d5b4a2 Rework HTTP client to use keepalive, move getifaddr_list to addr_util 2022-01-03 14:52:01 +03:00
Vitaliy Filippov fa687d3878 Allow to configure OSD placement in node_placement 2021-12-12 01:25:45 +03:00
Vitaliy Filippov 32b1312abb Remove stale deleted inode statistics in monitor 2021-11-28 21:02:05 +03:00
Vitaliy Filippov 7a0b5212fe Exit if unable to restart watches
FIXME: It's probably not OK for the client to exit in this case
2021-11-28 01:43:31 +03:00
Vitaliy Filippov a8f5c71ae8 Use the same etcd address selection algorithm in the monitor 2021-11-28 01:19:42 +03:00
Vitaliy Filippov 6e0e172e15 Implement OSD address selection from a specified subnet 2021-11-23 21:59:26 +03:00