Vitaliy Filippov
978bdc128a
Apply recovery pause before writes, after commits, and do not apply it to syncs to not block EC pools from functioning
2024-02-11 16:13:52 +03:00
Vitaliy Filippov
bb2f395f1e
Add cutoff threshold for recovery auto-tuning
2024-02-11 16:13:52 +03:00
Vitaliy Filippov
99ee8596ea
Rename min/max_util to util_low/high
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
1edf86ed26
Aggregate recovery delay using simple mean over last 10 observations (EWMA is shit)
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
5ca7cde612
Experiment/WIP: Try to track "secondary" recovery ops separately
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
751935ddd8
WIP Auto-tune recovery speed
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
d84dee7098
Track recovery op latencies + refactor into a structure
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
be7e76f849
Split etcd_stats_interval out of etcd_report_interval
Test / test_interrupted_rebalance_ec (push) Successful in 1m46s
Details
Test / test_snapshot_ec (push) Successful in 36s
Details
Test / test_move_reappear (push) Successful in 19s
Details
Test / test_rm (push) Successful in 15s
Details
Test / test_snapshot_down (push) Successful in 29s
Details
Test / test_snapshot_down_ec (push) Successful in 30s
Details
Test / test_splitbrain (push) Successful in 26s
Details
Test / test_snapshot_chain (push) Successful in 2m15s
Details
Test / test_snapshot_chain_ec (push) Successful in 2m57s
Details
Test / test_rebalance_verify_imm (push) Successful in 2m29s
Details
Test / test_rebalance_verify (push) Successful in 3m40s
Details
Test / test_write (push) Successful in 1m0s
Details
Test / test_write_no_same (push) Successful in 13s
Details
Test / test_write_xor (push) Successful in 50s
Details
Test / test_rebalance_verify_ec (push) Successful in 4m58s
Details
Test / test_rebalance_verify_ec_imm (push) Successful in 4m14s
Details
Test / test_heal_pg_size_2 (push) Successful in 4m21s
Details
Test / test_heal_ec (push) Successful in 4m5s
Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m36s
Details
Test / test_heal_csum_32k_dj (push) Successful in 6m28s
Details
Test / test_heal_csum_32k (push) Successful in 6m38s
Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m46s
Details
Test / test_scrub_zero_osd_2 (push) Successful in 59s
Details
Test / test_scrub (push) Successful in 1m16s
Details
Test / test_scrub_xor (push) Successful in 53s
Details
Test / test_scrub_pg_size_3 (push) Successful in 1m57s
Details
Test / test_heal_csum_4k_dj (push) Successful in 6m18s
Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m7s
Details
Test / test_heal_csum_4k (push) Successful in 5m43s
Details
Test / test_scrub_ec (push) Successful in 32s
Details
2023-10-27 01:26:26 +03:00
Vitaliy Filippov
161a23c966
Support reloading state when etcd says "revisions were compacted"
...
Test / test_interrupted_rebalance (push) Successful in 3m9s
Details
Test / test_interrupted_rebalance_imm (push) Successful in 1m38s
Details
Test / test_interrupted_rebalance_ec (push) Successful in 1m54s
Details
Test / test_interrupted_rebalance_ec_imm (push) Successful in 1m36s
Details
Test / test_failure_domain (push) Successful in 9s
Details
Test / test_snapshot (push) Successful in 23s
Details
Test / test_snapshot_ec (push) Successful in 22s
Details
Test / test_minsize_1 (push) Successful in 14s
Details
Test / test_move_reappear (push) Successful in 19s
Details
Test / test_rm (push) Successful in 12s
Details
Test / test_snapshot_chain (push) Successful in 2m2s
Details
Test / test_snapshot_chain_ec (push) Successful in 2m38s
Details
Test / test_snapshot_down (push) Successful in 21s
Details
Test / test_snapshot_down_ec (push) Successful in 24s
Details
Test / test_splitbrain (push) Successful in 15s
Details
Test / test_rebalance_verify (push) Successful in 3m10s
Details
Test / test_rebalance_verify_imm (push) Successful in 3m10s
Details
Test / test_rebalance_verify_ec (push) Successful in 3m27s
Details
Test / test_rebalance_verify_ec_imm (push) Successful in 6m2s
Details
Test / test_write (push) Successful in 35s
Details
Test / test_write_xor (push) Successful in 45s
Details
Test / test_write_no_same (push) Successful in 22s
Details
Test / test_heal_pg_size_2 (push) Successful in 4m0s
Details
Test / test_heal_ec (push) Successful in 3m52s
Details
Test / test_scrub (push) Successful in 1m1s
Details
Test / test_scrub_zero_osd_2 (push) Successful in 42s
Details
Test / test_scrub_xor (push) Successful in 34s
Details
Test / test_scrub_pg_size_3 (push) Successful in 53s
Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 45s
Details
Test / test_scrub_ec (push) Successful in 26s
Details
Before this change, OSDs almost always died when one of the etcds was restarted,
even though the rest of them was still in quorum and the lease was still active
2023-07-07 01:33:48 +03:00
Vitaliy Filippov
ce02f47de6
Allow to disable scrub_find_best
2023-05-21 12:33:38 +03:00
Vitaliy Filippov
fa90b5a4e7
Schedule automatic scrubs correctly (not just after previous scrub)
2023-05-20 23:20:09 +03:00
Vitaliy Filippov
0439981a66
Implement "describe object(s)" operation
...
Required to implement fixing inconsistent objects in vitastor-cli
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
6648f6bb6e
Implement ambiguity detection during scrub
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
0c78dd7178
Add no_scrub flag
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
3c924397e7
Store next scrub timestamp instead of last scrub timestamp
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
c3bd26193d
Implement PG scrub runner
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
43b77d7619
Implement scrubbing "data path" - OSD_OP_SCRUB
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
8dc427b43c
Retry failed reads (including chained and RMW) from other replicas
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
bf2112653b
Refcount object_states
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
0538a484b3
Add corrupted object state
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
d06ed2b0e7
Implement online config update
2023-03-26 19:21:50 +03:00
Vitaliy Filippov
2fb0c85618
Allow to start OSDs without local store (only for tests)
2023-03-15 01:13:59 +03:00
Vitaliy Filippov
e950c024d3
Do not sync peer OSDs before listing
...
Sync before listing was added to wait for all PG writes possibly left in queue
from the previous master to finish before listing it
But in fact it may block the cluster when EC is used and some unstable writes
are left in the queue - they block journal flushing, rollback/stabilize is
required to unblock them, but rollback/stabilize may only happen after PG is
peered. But peering needs listings, listings are requested only after sync, and
sync itself waits for currently blocked writes waiting in the queue
2023-01-03 00:05:45 +03:00
Vitaliy Filippov
998e24adf8
Add a new recovery_pg_switch setting to mix all PGs during recovery
2022-12-30 02:03:33 +03:00
Vitaliy Filippov
5a10d135f3
Allow to configure block_size, bitmap_granularity and immediate_commit per-pool
2022-08-11 01:56:33 +03:00
Vitaliy Filippov
1efbbb0c36
Make deleted inodes vanish from statistics after 60 seconds
2022-06-04 13:27:06 +03:00
Vitaliy Filippov
61ebed144a
Fix OSDs possibly dying with "map::at" errors when other OSDs are stopped
2022-02-09 10:35:29 +03:00
Vitaliy Filippov
7bdd92ca4f
Fix build under clang and some warnings
...
Build problems fixed:
- void* pointer arithmetic which is a GNU extension (works as byte*)
- "variable size object may not be initialized" which is OK under GCC
- nullptr_t related error in json11 (it lacks 'operator <' in clang)
Warnings fixed:
- empty nested struct initializer { 0 } replaced by {}
- removed several unused lambda captures
2022-01-16 00:02:54 +03:00
Vitaliy Filippov
1bbe62f29c
Fix uninitialized listen_backlog which was leading to REALLY SLOW send speeds!!!
2021-12-25 11:38:13 +03:00
Vitaliy Filippov
4d43774cbb
Use 5s etcd_report_interval by default
2021-11-09 01:27:12 +03:00
Vitaliy Filippov
cfe8de9b84
Autosync based on number of unstable ops to prevent journal stalls
2021-10-30 14:26:48 +03:00
Vitaliy Filippov
5010b0dd75
Use json11 instead of blockstore_config_t
2021-04-30 00:52:46 +03:00
Vitaliy Filippov
bd7b177707
Report sensitive configuration values instead of the configuration source
2021-04-17 23:11:16 +03:00
Vitaliy Filippov
57e2c503f7
Rename osd_t::c_cli to msgr
2021-04-17 16:32:09 +03:00
Vitaliy Filippov
38a3df4a0e
Implement chained (optimized) read in the primary OSD code
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6950b8e3a0
Watch inode metadata revisions
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
d6524670e1
Introduce data distribution locality
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ab39ce2bbb
Use clean_entry_bitmap_size instead of entry_attr_size back because of changed bitmap handling
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ffe1cd4c79
Report inode I/O statistics, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
860ac24762
Add "external" bitmap support to the secondary OSD protocol
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
54f2353f24
Use bitmap granularity for alignment checks
2021-04-03 14:36:04 +03:00
Vitaliy Filippov
883bf84a16
Fix build
2021-04-03 01:47:15 +03:00
Vitaliy Filippov
cf9a641d66
Skip disconnected OSDs during sync
2021-03-24 14:20:56 +03:00
Vitaliy Filippov
05db1308aa
Fix two potential read/write ordering problems (even though not yet seen in tests)
...
- Write operations could be 'stabilized' and previous versions could be
purged from OSDs before the removal of version_override and following
reads could potentially hit different version in EC pools
- Object was marked clean after completing the delete during recovery, so
reads could in theory hit a deleted version and return nothing
2021-03-24 14:20:56 +03:00
Vitaliy Filippov
435045751d
Delete objects only after a SYNC during rebalance in the non-immediate_commit mode
...
Previously OSDs could commit deletes before writes during recovery or rebalance
in the "lazy fsync" (immediate_commit=off) mode which could result in lost objects
2021-03-16 12:48:26 +03:00
Vitaliy Filippov
af5155fcd9
Implement "no_recovery" and "no_rebalance" flags
2021-03-11 00:36:31 +03:00
Vitaliy Filippov
8bdd6d8d78
Reset PG state when stopping them
2021-03-08 17:04:10 +03:00
Vitaliy Filippov
bf9a175efc
Move C/C++ sources to src subdirectory
2021-02-25 23:59:03 +03:00