Commit Graph

87 Commits (72f0cff79df1a5b6653f2a03c6a8dcd73d8a7d64)

Author SHA1 Message Date
Vitaliy Filippov 72f0cff79d WIP Use random_hier_combinations
CI (push): all 30 checks successful
2023-05-18 17:44:00 +03:00
Vitaliy Filippov c1d470522c Replace flatten_tree with extract_tree_levels 2023-05-18 17:44:00 +03:00
Vitaliy Filippov 022176aa98 Fix NaN during PG optimisation if there are nonexisting OSDs in node_placement
CI (push): 28 of 30 checks successful; failing: test_rebalance_verify_ec_imm, test_rebalance_verify_ec
2023-05-17 01:20:30 +03:00
Vitaliy Filippov 120e3fa7bc Fix pool deletion
CI (push): 28 of 30 checks successful; failing: test_minsize_1, test_move_reappear
2023-05-17 00:45:59 +03:00
Vitaliy Filippov 6f4dc16c59 Handle etcd connection errors correctly in mon (unhandled error events) 2023-05-11 11:02:44 +03:00
Vitaliy Filippov 321cb435a6 Fix monitor incorrectly changing PG count when last_clean_pgs contains fewer PGs than the new number 2023-05-08 20:39:20 +03:00
Vitaliy Filippov 5b9031fecc Fix monitor possibly applying incorrect PG history under heavy load
The monitor could deceive itself by immediately saving in memory PG configuration
changes that had not yet been applied to etcd, and then apply incorrect PG history
changes the next time if the first update failed.

This usually only happened under heavy load and was caught in CI. :-)
2023-05-07 23:23:00 +03:00
Vitaliy Filippov d06ed2b0e7 Implement online config update 2023-03-26 19:21:50 +03:00
Vitaliy Filippov 14d6acbcba Set default rdma_max_recv/send to 16/8, fix documentation 2023-02-28 11:00:56 +03:00
Vitaliy Filippov c3e80abad7 Allow to send more than 1 operation at a time 2023-02-26 02:01:04 +03:00
Vitaliy Filippov 2c8241b7db Remove PG "peered" state 2023-02-21 01:30:42 +03:00
Vitaliy Filippov 0f6b946add Time changes with every stat change, do not schedule checks based on it 2023-01-05 13:54:16 +03:00
Vitaliy Filippov 465cbf0b2f Do not re-schedule recheck indefinitely, run it after mon_change_timeout in any case 2023-01-05 13:48:06 +03:00
Vitaliy Filippov 41add50e4e Track last_clean_pgs on a per-pool basis 2023-01-03 02:20:50 +03:00
Vitaliy Filippov 3de57e87b1 Recheck OSD tree in monitor on /osd/stats changes 2022-12-26 02:48:48 +03:00
Vitaliy Filippov 2f13f347b0 Fix space stats in mon 2022-09-03 11:16:33 +03:00
Vitaliy Filippov 5a10d135f3 Allow to configure block_size, bitmap_granularity and immediate_commit per-pool 2022-08-11 01:56:33 +03:00
Vitaliy Filippov 36e851505a Make monitor delete pool statistics when the pool is deleted 2022-06-04 13:27:06 +03:00
Vitaliy Filippov 1efbbb0c36 Make deleted inodes vanish from statistics after 60 seconds 2022-06-04 13:27:06 +03:00
Vitaliy Filippov a0cae4c180 Rename "jerasure" to "ec" in pool configuration, function names, fix documentation and Debian build scripts
Old pool configurations with "jerasure" also remain supported as an alias for "ec"
2022-06-03 15:40:00 +03:00
Vitaliy Filippov cf03b9c84d Implement "primary affinity tags" 2022-05-09 22:37:23 +03:00
Vitaliy Filippov e718116f54 Fix incorrect reading of extra metadata block 2022-04-21 02:52:21 +03:00
Vitaliy Filippov c857272f44 Comment: epoch is uint64_t 2022-04-10 12:21:37 +03:00
Vitaliy Filippov d71cc174e3 Implement CLI status command 2022-04-09 00:25:51 +03:00
Vitaliy Filippov 3615e57879 Register standby monitors in etcd in /mon/member 2022-04-04 00:48:52 +03:00
Vitaliy Filippov 46d2bc100f Add some tolerance to stat calculation so it does not fail on a fresh DB 2022-02-11 16:37:16 +03:00
Vitaliy Filippov 73ae578981 Add osd_memlock option 2022-02-02 01:40:22 +03:00
Vitaliy Filippov d9869d8116 Add parameter documentation 2022-01-28 02:45:54 +03:00
Vitaliy Filippov 9a15b843ff Do not set pg_real_size to 0 2022-01-23 20:15:04 +03:00
Vitaliy Filippov a5cf06acd0 Remove etcd timeout and keepalive interval hardcode 2022-01-23 00:00:00 +03:00
Vitaliy Filippov 8f64fc61e7 Ignore empty events in mon 2022-01-08 11:41:00 +03:00
Vitaliy Filippov 4a9f001d9e Make mon also ping etcd websockets regularly 2022-01-05 17:28:51 +03:00
Vitaliy Filippov 68b6763ebe Add asserts for lp-optimizer tests, pass `ordered` from the monitor 2022-01-03 20:37:07 +03:00
Vitaliy Filippov 5473d5b4a2 Rework HTTP client to use keepalive, move getifaddr_list to addr_util 2022-01-03 14:52:01 +03:00
Vitaliy Filippov fa687d3878 Allow to configure OSD placement in node_placement 2021-12-12 01:25:45 +03:00
Vitaliy Filippov 32b1312abb Remove stale deleted inode statistics in monitor 2021-11-28 21:02:05 +03:00
Vitaliy Filippov 7a0b5212fe Exit if unable to restart watches
FIXME: It's probably not OK for the client to exit in this case
2021-11-28 01:43:31 +03:00
Vitaliy Filippov a8f5c71ae8 Use the same etcd address selection algorithm in the monitor 2021-11-28 01:19:42 +03:00
Vitaliy Filippov 6e0e172e15 Implement OSD address selection from a specified subnet 2021-11-23 21:59:26 +03:00
Vitaliy Filippov 66fe1a469b Additionally balance parity chunks over OSDs using round-robin when generating initial distribution 2021-11-16 21:02:39 +03:00
Vitaliy Filippov aa436027c8 Report pg/history from OSD on every degraded activation
Required to prevent data loss due to activation of an OSD with older data
when PG OSD set change doesn't occur. I.e. fixes the simplest case:
- Run 2 OSDs with 1 PG
- Start writing into the PG
- Stop OSD 2
- Stop OSD 1
- Start OSD 2

After this change the PG will refuse to start after the last step.
2021-11-13 22:39:17 +03:00
Vitaliy Filippov 0f3f0a9d29 Calculate average statistics in mon, remove buggy "fix_stat_overflows" 2021-11-11 00:20:57 +03:00
Vitaliy Filippov 6e6f407df3 Simplify & fix monitor stats aggregation 2021-11-09 01:41:22 +03:00
Vitaliy Filippov 4d43774cbb Use 5s etcd_report_interval by default 2021-11-09 01:27:12 +03:00
Vitaliy Filippov ffb06536ff Revoke lease in mon on SIGINT & SIGTERM, fix raw_to_usable calculation 2021-11-06 13:54:35 +03:00
Vitaliy Filippov cfe8de9b84 Autosync based on number of unstable ops to prevent journal stalls 2021-10-30 14:26:48 +03:00
Vitaliy Filippov da99686a15 Correctly aggregate pool statistics for unknown pools 2021-10-21 18:58:56 +03:00
Vitaliy Filippov b66160a7ad Aggregate per-pool statistics in mon 2021-07-03 23:14:44 +03:00
Vitaliy Filippov dfdf5c1f9c Fix comments in mon.js 2021-06-20 00:23:56 +03:00
Vitaliy Filippov 6810e93c3f Add RDMA options to mon.js list 2021-04-30 01:23:22 +03:00