Vitaliy Filippov
2412d9e239
Fix TTL comparison for lease/keepalive
2024-04-30 01:53:05 +03:00
Vitaliy Filippov
6783d4a13c
Implement fool protection for FS pools
2024-03-16 13:24:36 +03:00
Vitaliy Filippov
3aee37eadd
Allow to disable per-inode stats for VitastorFS pools
2024-03-16 13:24:36 +03:00
Vitaliy Filippov
f20564b44b
Fix 32-bit build warnings (99.9% in printf)
2024-02-22 12:22:16 +03:00
Vitaliy Filippov
8389c0f33b
Fix PGs sometimes hanging in "starting" state on mass OSD restarts
2024-02-15 23:38:52 +03:00
Vitaliy Filippov
f03a9db4d9
Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)
2024-02-03 20:31:08 +03:00
Vitaliy Filippov
d84dee7098
Track recovery op latencies + refactor into a structure
2023-12-31 01:23:17 +03:00
Vitaliy Filippov
f72f14e6a7
Clear old PG states, history, and OSD states on etcd state reload
...
Also add protection against etcd watcher messages being split into multiple websocket
messages. I'm not sure if etcd actually does that, but it's better to have extra
protection anyway.
Also check that all etcd watchers are started in the keepalive routine; otherwise
it sometimes tries to revive etcd watchers starting with revision=1, which
fails because this revision is nearly always compacted.
All these changes should fix an old, rarely reproduced bug where OSDs sometimes
didn't react to PG config changes, which led to offline pools on node reboot.
It happened on the full reload of state from etcd.
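The protection against split watcher messages can be illustrated with a minimal sketch. It is not the code from this commit: the class name WatchMessageBuffer and its feed() method are hypothetical, and the sketch assumes that every complete watcher message is a single JSON object.

```python
import json


class WatchMessageBuffer:
    """Hypothetical helper: joins text fragments until they form one JSON object."""

    def __init__(self):
        self._pending = ""

    def feed(self, fragment):
        """Add one websocket text fragment.

        Returns the decoded message once the buffered text parses as JSON,
        or None while the message is still incomplete.
        """
        self._pending += fragment
        try:
            message = json.loads(self._pending)
        except json.JSONDecodeError:
            # Incomplete so far (assumption: messages are JSON objects,
            # so a truncated prefix never parses on its own).
            return None
        self._pending = ""
        return message
```

A caller would pass every received fragment to feed() and handle only the non-None results.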
2023-12-24 02:02:13 +03:00
Vitaliy Filippov
be7e76f849
Split etcd_stats_interval out of etcd_report_interval
2023-10-27 01:26:26 +03:00
Vitaliy Filippov
ff479a102d
Make MON filter OSDs by block layout to prevent "refusing to start PGs of this pool" errors on misconfiguration
2023-09-16 17:52:17 +03:00
Vitaliy Filippov
cc0fdc6253
Remove erroneous block_size mismatch warnings on pools without matching PGs
2023-09-08 23:19:04 +03:00
Vitaliy Filippov
b7d398be5b
Fix sscanf validation usage (field count instead of null_byte == 0)
2023-09-07 02:34:35 +03:00
Vitaliy Filippov
161a23c966
Support reloading state when etcd says "revisions were compacted"
...
Before this change, OSDs almost always died when one of the etcds was restarted,
even though the rest of them were still in quorum and the lease was still active.
2023-07-07 01:33:48 +03:00
Vitaliy Filippov
3c924397e7
Store next scrub timestamp instead of last scrub timestamp
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
c3bd26193d
Implement PG scrub runner
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
0538a484b3
Add corrupted object state
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
6442010f93
Skip offline PGs during state reporting when the state is already deleted or taken over by another OSD
...
This fixes OSDs being unable to report PG states in rare conditions
2023-05-12 23:17:45 +03:00
Vitaliy Filippov
480509f5b9
Fix pg_data_size > 1 for replicas (harmless bug)
2023-04-23 01:50:42 +03:00
Vitaliy Filippov
d06ed2b0e7
Implement online config update
2023-03-26 19:21:50 +03:00
Vitaliy Filippov
2fb0c85618
Allow to start OSDs without local store (only for tests)
2023-03-15 01:13:59 +03:00
Vitaliy Filippov
2c8241b7db
Remove PG "peered" state
2023-02-21 01:30:42 +03:00
Vitaliy Filippov
373f9d0387
Try to re-peer PGs on history change
2023-01-06 12:46:44 +03:00
Vitaliy Filippov
02e7be7dc9
Prevent reenterability side effects during PG history operation resume
2023-01-03 02:20:50 +03:00
Vitaliy Filippov
a4dfa519af
Report PG history synchronously during write
...
This has 2 effects:
1) OSD sets aren't added into PG history until actual write attempts anymore
which removes unneeded extra osd_sets in PG history
2) New OSD sets are reported synchronously and can't be lost on PG restarts
happening at the same time as reconfiguration
2023-01-01 23:41:05 +03:00
Vitaliy Filippov
67019f5b02
Make OSD sort & sanitize PG history items
2023-01-01 23:17:42 +03:00
Vitaliy Filippov
0593e5c21c
Fix OSD peer config safety check
2022-12-31 02:24:42 +03:00
Vitaliy Filippov
1407db9c08
Fix vitastor-disk prepare bugs
2022-08-19 02:22:54 +03:00
Vitaliy Filippov
5a10d135f3
Allow to configure block_size, bitmap_granularity and immediate_commit per-pool
2022-08-11 01:56:33 +03:00
Vitaliy Filippov
ae99ee6266
Rename base64.{cpp,h} to str_util
2022-07-31 01:12:37 +03:00
Vitaliy Filippov
3e1b03bb5c
Show all etcd addresses in the "reporting to..." message
2022-06-04 13:27:06 +03:00
Vitaliy Filippov
1efbbb0c36
Make deleted inodes vanish from statistics after 60 seconds
2022-06-04 13:27:06 +03:00
Vitaliy Filippov
a0cae4c180
Rename "jerasure" to "ec" in pool configuration, function names, fix documentation and Debian build scripts
...
Old pool configurations with "jerasure" also remain supported as an alias for "ec"
2022-06-03 15:40:00 +03:00
Vitaliy Filippov
e718116f54
Fix incorrect reading of extra metadata block
2022-04-21 02:52:21 +03:00
Vitaliy Filippov
842ba8b831
Use (uint64_t)1 instead of 1l / 1ul
2022-04-16 01:48:14 +03:00
Vitaliy Filippov
7cbfdff41a
Replace some throws with force_stop
2022-02-20 00:21:19 +03:00
Vitaliy Filippov
ba63af49b4
Add etcd retries everywhere (they were missing in some places)
2022-01-23 17:21:48 +03:00
Vitaliy Filippov
a5cf06acd0
Remove etcd timeout and keepalive interval hardcode
2022-01-23 00:00:00 +03:00
Vitaliy Filippov
515a2e6e33
Only die when detecting a real race condition, not just a CAS failure
2022-01-05 17:05:25 +03:00
Vitaliy Filippov
5473d5b4a2
Rework HTTP client to use keepalive, move getifaddr_list to addr_util
2022-01-03 14:52:01 +03:00
Vitaliy Filippov
ce5b6253ab
Make OSDs stick to the last successful etcd address
...
Previously, OSDs selected a new random etcd from the cluster on every request,
so they failed randomly when some of the etcds were down
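The "sticky" address selection described above can be sketched roughly as follows. This is only an illustration under assumptions, not the OSD's actual implementation: EtcdAddressPicker, its methods, and the send callback are hypothetical names, and a failed request is modelled as a ConnectionError.

```python
import random


class EtcdAddressPicker:
    """Hypothetical sketch: keep using the last address that worked."""

    def __init__(self, addresses, rng=None):
        self.addresses = list(addresses)
        self.rng = rng if rng is not None else random.Random()
        self.current = None   # last successful (or currently tried) address
        self.failed = set()   # addresses that failed since the last success

    def pick(self):
        """Return the sticky address, choosing a new random one only if needed."""
        if self.current is None:
            candidates = [a for a in self.addresses if a not in self.failed]
            if not candidates:
                # Everything failed once; start a new round with all addresses.
                self.failed.clear()
                candidates = self.addresses
            self.current = self.rng.choice(candidates)
        return self.current

    def request(self, send):
        """Call send(address), moving to another address after each failure."""
        for _ in range(len(self.addresses)):
            address = self.pick()
            try:
                result = send(address)
            except ConnectionError:
                self.failed.add(address)
                self.current = None
                continue
            self.failed.clear()
            return result
        raise ConnectionError("no etcd address responded")
```

With this approach a request only moves to a different etcd after a failure, so a partially available cluster stops causing random errors once one working address has been found.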
2021-11-27 23:48:56 +03:00
harley
6886171289
report pg state failed
...
after report pg state failed parse response error
2021-11-25 09:34:34 +08:00
Vitaliy Filippov
aa436027c8
Report pg/history from OSD on every degraded activation
...
Required to prevent data loss due to activation of an OSD with older data
when a PG OSD set change doesn't occur. That is, it fixes the simplest case:
- Run 2 OSDs with 1 PG
- Start writing into the PG
- Stop OSD 2
- Stop OSD 1
- Start OSD 2
After this change the PG will refuse to start after the last step.
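The scenario above can be modelled with a very simplified sketch. It is an assumption-laden illustration, not Vitastor's peering logic: the function can_activate is hypothetical, the pool is assumed to be replicated, and the history is assumed to hold only the OSD sets reported on degraded activation.

```python
def can_activate(history_osd_sets, alive_osds):
    """Hypothetical check: allow activation only if every OSD set recorded
    in the PG history has at least one member that is currently alive."""
    alive = set(alive_osds)
    for osd_set in history_osd_sets:
        if not alive.intersection(osd_set):
            # Nobody from this set is available, so newer writes
            # made by that set may be missing on the alive OSDs.
            return False
    return True


# After "Stop OSD 2" the PG runs degraded on OSD 1 and [1] is reported to history.
# After "Stop OSD 1" and "Start OSD 2" only OSD 2 is alive.
print(can_activate([[1]], [2]))   # history was reported: activation refused
print(can_activate([], [2]))      # history was never reported: stale OSD 2 would start
```

In this model, reporting the history on every degraded activation is what lets the check refuse to start the PG on OSD 2 alone.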
2021-11-13 22:39:17 +03:00
Vitaliy Filippov
5010b0dd75
Use json11 instead of blockstore_config_t
2021-04-30 00:52:46 +03:00
Vitaliy Filippov
57e2c503f7
Rename osd_t::c_cli to msgr
2021-04-17 16:32:09 +03:00
Vitaliy Filippov
82c1a7ec67
Fix statistics reporting, split inode number into pool & inode
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6950b8e3a0
Watch inode metadata revisions
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ffe1cd4c79
Report inode I/O statistics, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
4ae1b84c67
Report inode space usage statistics to etcd, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
97efb9e299
Do not crash on PG re-peering events when operations are in progress
2021-04-07 11:06:31 +03:00
Vitaliy Filippov
3e162d95a0
Remove http_client.h include from etcd_state_client.h
2021-04-03 14:36:04 +03:00