Vitaliy Filippov
f72f14e6a7
Clear old PG states, history, and OSD states on etcd state reload
Also add protection against etcd watcher messages being split into multiple websocket
messages. I'm not sure whether etcd actually does that, but it's better to have the extra
protection anyway.
Also check that all etcd watchers are started in the keepalive routine; otherwise
it sometimes tries to revive etcd watchers starting with revision=1, which almost
always fails because that revision has nearly always been compacted.
All these changes should fix an old, rarely reproduced bug where OSDs SOMETIMES
didn't react to PG config changes, which led to offline pools on node reboot.
It happened on a full reload of state from etcd.
2023-12-24 02:02:13 +03:00
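The f72f14e6a7 message does not say how the split-message protection works. Below is a minimal sketch of one general technique, assuming RFC 6455 framing. Data frames are buffered until a frame with the FIN bit arrives, and control frames (which are never fragmented) are skipped. `ws_frame_t` and `ws_message_assembler_t` are hypothetical names, not Vitastor's websocket code.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, already-decoded websocket frame
struct ws_frame_t
{
    bool fin;            // FIN bit: set on the last frame of a message
    uint8_t opcode;      // 0x0 = continuation, 0x1 = text, 0x2 = binary, >= 0x8 = control
    std::string payload; // unmasked payload bytes
};

// Hypothetical assembler: collects data frames until FIN, then yields one message
class ws_message_assembler_t
{
    std::string buffer;
    bool in_message = false;
public:
    // Returns the complete message, or std::nullopt if more frames are needed
    std::optional<std::string> feed(const ws_frame_t & frame)
    {
        if (frame.opcode >= 0x8)
            return std::nullopt; // control frames are handled elsewhere in this sketch
        if (frame.opcode != 0x0)
        {
            // A text/binary frame starts a new message
            buffer = frame.payload;
            in_message = true;
        }
        else if (in_message)
            buffer += frame.payload; // continuation of the current message
        else
            return std::nullopt; // stray continuation frame: ignored in this sketch
        if (!frame.fin)
            return std::nullopt;
        in_message = false;
        std::string msg;
        msg.swap(buffer);
        return msg;
    }
};
```

Only a complete message would then be handed to the JSON parser, so a watch event split across frames is never parsed halfway.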
Vitaliy Filippov
691ebd991a
Move the last 2 log printfs in etcd_state_client from stdout to stderr
2023-12-08 00:01:52 +03:00
Vitaliy Filippov
b7d398be5b
Fix sscanf validation usage (field count instead of null_byte == 0)
2023-09-07 02:34:35 +03:00
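Below is a sketch of the validation pattern named in b7d398be5b. It assumes a hypothetical key suffix of the form `<pool>/<pg>` and a hypothetical `parse_pool_pg()` helper. A trailing `%c` is only assigned when extra characters follow the last number, so a valid key yields exactly 2 converted fields. Testing `null_byte == 0` instead would also accept input such as `1`, where the second number is never parsed and `%c` is never reached.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical parser for "<pool>/<pg>" with nothing after the second number
bool parse_pool_pg(const char *key, uint64_t & pool_id, uint64_t & pg_num)
{
    char null_byte = 0; // only written if trailing garbage follows the PG number
    int n = sscanf(key, "%" SCNu64 "/%" SCNu64 "%c", &pool_id, &pg_num, &null_byte);
    // 2 = both numbers parsed and nothing follows; 3 = trailing characters;
    // 0, 1 or EOF = some field is missing
    return n == 2;
}
```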
Vitaliy Filippov
161a23c966
Support reloading state when etcd says "revisions were compacted"
Before this change, OSDs almost always died when one of the etcds was restarted,
even though the rest of them were still in quorum and the lease was still active.
2023-07-07 01:33:48 +03:00
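Below is a minimal sketch of the decision behind 161a23c966. `canceled` and `compact_revision` are real fields of the etcd v3 `WatchResponse`; etcd sets `compact_revision` when a watch asks for a revision that has already been compacted. The `watch_response_t` struct, `on_watch_response()`, and the `restart_watch` branch for other cancellations are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical, already-parsed subset of an etcd v3 WatchResponse
struct watch_response_t
{
    bool canceled = false;
    uint64_t compact_revision = 0; // non-zero: requested start revision was compacted
};

enum class watch_action_t { apply_events, reload_state, restart_watch };

// Hypothetical handler: a watch whose start revision was compacted can't be
// resumed, so the full state is reloaded instead of the OSD giving up
watch_action_t on_watch_response(const watch_response_t & resp)
{
    if (resp.compact_revision != 0)
        return watch_action_t::reload_state;
    if (resp.canceled)
        return watch_action_t::restart_watch; // other cancel reasons (assumption)
    return watch_action_t::apply_events;
}

// Watches restart right after the revision at which the state was reloaded
uint64_t watch_start_revision(uint64_t loaded_revision)
{
    return loaded_revision + 1;
}
```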
Vitaliy Filippov
45c0694853
Clear etcd_local addresses on reload and also skip duplicates
2023-07-06 00:39:39 +03:00
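Below is a sketch of 45c0694853. It assumes the local addresses live in a plain `std::vector<std::string>`, and the `reload_local_addresses()` helper is hypothetical. The list is rebuilt from scratch on every reload, so stale entries cannot accumulate, and duplicates are skipped while the original order is kept.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: clears the old list, then keeps the first occurrence of each address
void reload_local_addresses(std::vector<std::string> & etcd_local, const std::vector<std::string> & configured)
{
    etcd_local.clear();
    std::unordered_set<std::string> seen;
    for (auto & addr: configured)
        if (seen.insert(addr).second) // insert() reports whether the address was new
            etcd_local.push_back(addr);
}
```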
Vitaliy Filippov
3c924397e7
Store next scrub timestamp instead of last scrub timestamp
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
c3bd26193d
Implement PG scrub runner
2023-05-20 23:19:39 +03:00
Vitaliy Filippov
d06ed2b0e7
Implement online config update
2023-03-26 19:21:50 +03:00
Vitaliy Filippov
67019f5b02
Make OSD sort & sanitize PG history items
2023-01-01 23:17:42 +03:00
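67019f5b02 does not state what "sanitize" covers. Below is a minimal sketch, assuming it means sorting history items and dropping exact duplicates; `osd_set_t` and `sanitize_pg_history()` are hypothetical. The OSD numbers inside each item are left in place, since in EC pools the position of an OSD within a set may be significant.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef std::vector<uint64_t> osd_set_t; // hypothetical: OSD numbers of one history item

// Hypothetical sanitizer: sorts items lexicographically and removes exact duplicates
void sanitize_pg_history(std::vector<osd_set_t> & history)
{
    std::sort(history.begin(), history.end());
    history.erase(std::unique(history.begin(), history.end()), history.end());
}
```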
Vitaliy Filippov
5a10d135f3
Allow configuring block_size, bitmap_granularity and immediate_commit per pool
2022-08-11 01:56:33 +03:00
Vitaliy Filippov
ae99ee6266
Rename base64.{cpp,h} to str_util
2022-07-31 01:12:37 +03:00
Vitaliy Filippov
a0cae4c180
Rename "jerasure" to "ec" in pool configuration, function names, fix documentation and Debian build scripts
Old pool configurations with "jerasure" also remain supported as an alias for "ec"
2022-06-03 15:40:00 +03:00
Vitaliy Filippov
d48a824846
Fix some warnings
2022-05-10 12:42:58 +03:00
Vitaliy Filippov
7c2379d458
Simplified NFS proxy based on own NFS/XDR implementation
2022-05-07 01:01:20 +03:00
Vitaliy Filippov
a2189100dd
Make CLI functions usable in library form
Return results and errors in a variable instead of just printing them,
separate vitastor-cli main() from cli_tool_t, move positional argument
parsing to CLI main from command implementations.
2022-05-06 02:18:32 +03:00
Vitaliy Filippov
842ba8b831
Use (uint64_t)1 instead of 1l / 1ul
2022-04-16 01:48:14 +03:00
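The log does not spell out the reason for 842ba8b831. The usual one is that `long` is only 32 bits wide on some ABIs (32-bit Linux, or 64-bit Windows with its LLP64 model), so `1l << 40` is undefined behaviour there. Below is a small illustration with hypothetical helpers `bit()` and `align_down()`.

```cpp
#include <cstdint>

// Hypothetical helper: the cast makes the operand 64 bits wide on every platform
inline uint64_t bit(unsigned n)
{
    return (uint64_t)1 << n; // valid for n < 64
}

// Hypothetical example use: round an offset down to a 2^block_order boundary
inline uint64_t align_down(uint64_t offset, unsigned block_order)
{
    return offset & ~(bit(block_order) - 1);
}
```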
Vitaliy Filippov
d71cc174e3
Implement CLI status command
2022-04-09 00:25:51 +03:00
Vitaliy Filippov
ba63af49b4
Add etcd retries everywhere (they were missing in some places)
2022-01-23 17:21:48 +03:00
Vitaliy Filippov
e01c4db702
Add paranoid if()s to prevent accidental double free of etcd_watch_ws
2022-01-23 00:16:09 +03:00
Vitaliy Filippov
a5cf06acd0
Remove the hardcoded etcd timeout and keepalive interval
2022-01-23 00:00:00 +03:00
Vitaliy Filippov
098e369a3b
Fix rand initialization, add etcd connection/disconnection logging
2022-01-20 00:45:49 +03:00
Vitaliy Filippov
5473d5b4a2
Rework HTTP client to use keepalive, move getifaddr_list to addr_util
2022-01-03 14:52:01 +03:00
Vitaliy Filippov
5859f913fc
Fix client failover in case of etcd shutdown or crash
2021-12-01 00:33:02 +03:00
Vitaliy Filippov
7a0b5212fe
Exit if unable to restart watches
FIXME: It's probably not OK for the client to exit in this case
2021-11-28 01:43:31 +03:00
Vitaliy Filippov
ce5b6253ab
Make OSDs stick to the last successful etcd address
Previously, OSDs selected a new random etcd from the cluster on every
request, so they failed randomly when some of the etcds were down
2021-11-27 23:48:56 +03:00
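Below is a minimal sketch of the sticky selection described in ce5b6253ab. `etcd_address_selector_t` is hypothetical. The rule of not re-picking the address that just failed (when others exist) is an assumption. The real client is asynchronous; this synchronous class only models the selection policy.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical selector: reuse the last working address, re-pick only after a failure
class etcd_address_selector_t
{
    std::vector<std::string> addresses;
    std::string current, failed;
    std::mt19937 rng;
public:
    etcd_address_selector_t(std::vector<std::string> addrs, unsigned seed)
        : addresses(std::move(addrs)), rng(seed) {}

    // Address for the next request ("" if none are configured)
    std::string pick()
    {
        if (current.empty() && !addresses.empty())
        {
            std::vector<std::string> candidates;
            for (auto & a: addresses)
                if (a != failed)
                    candidates.push_back(a);
            if (candidates.empty())
                candidates = addresses; // single address: retry it
            std::uniform_int_distribution<std::size_t> dist(0, candidates.size()-1);
            current = candidates[dist(rng)];
        }
        return current;
    }

    // Reports that a request to the current address failed
    void on_failure()
    {
        failed = current;
        current.clear();
    }
};
```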
Vitaliy Filippov
fea451b4db
Prefer local etcd in OSD
2021-11-27 00:36:53 +03:00
Vitaliy Filippov
300d507026
Fix capture of out in alloc_osd
2021-11-25 10:20:01 +03:00
Vitaliy Filippov
8e445ddc9a
Begin to implement CLI: implement listing, add help, add create stub
2021-11-06 14:32:19 +03:00
Vitaliy Filippov
74cb3911db
Rebase children of the "inverse" child when it is removed, change /index/image/%s keys during metadata ops
2021-09-26 13:41:48 +03:00
Vitaliy Filippov
eaac1fc5d1
Log to stderr in etcd_state_client, too
2021-05-16 01:09:25 +03:00
Vitaliy Filippov
c467acc388
Fix /v3 appendage to etcd URLs without /v3
2021-05-15 19:22:24 +03:00
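Below is a sketch of the URL normalisation that c467acc388 fixes. It assumes the `/v3` prefix should be appended only when the configured URL has no path at all. `normalize_etcd_url()` is hypothetical, and treating a bare trailing `/` as "no path" is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: "host:port" or "http://host:port" -> same URL with "/v3" appended;
// URLs that already contain a path are returned unchanged
std::string normalize_etcd_url(std::string url)
{
    std::size_t host_start = url.find("://");
    host_start = host_start == std::string::npos ? 0 : host_start + 3;
    std::size_t slash = url.find('/', host_start);
    if (slash == std::string::npos)
        url += "/v3";
    else if (slash == url.size()-1)
        url += "v3"; // only a trailing slash
    return url;
}
```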
Vitaliy Filippov
f6f35f4127
Pass options correctly to not override /etc/vitastor/vitastor.conf
2021-04-30 01:17:44 +03:00
Vitaliy Filippov
5010b0dd75
Use json11 instead of blockstore_config_t
2021-04-30 00:52:46 +03:00
Vitaliy Filippov
6950b8e3a0
Watch inode metadata revisions
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
2612d3198a
Introduce image names and metadata storage in etcd
Each inode has an image name, a parent inode number and pool, a size and a readonly flag.
Snapshots are created by switching the image name to a different inode number
while using the older inode as the parent.
2021-04-10 17:44:12 +03:00
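The metadata listed in 2612d3198a can be modelled as below. This is a hypothetical in-memory sketch: `inode_meta_t`, `image_index_t` and the caller-supplied `snap_name` are not Vitastor's etcd key layout. Taking a snapshot points the image name at a new inode whose parent is the old one, and the old inode becomes read-only.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical per-inode metadata record
struct inode_meta_t
{
    std::string name;
    uint64_t parent_id = 0;   // 0 = no parent (assumption)
    uint64_t parent_pool = 0;
    uint64_t size = 0;
    bool readonly = false;
};

// Hypothetical index: inode number -> metadata, image name -> inode number
struct image_index_t
{
    std::map<uint64_t, inode_meta_t> inodes;
    std::map<std::string, uint64_t> by_name;

    void snapshot(const std::string & image, const std::string & snap_name, uint64_t new_inode, uint64_t parent_pool)
    {
        if (inodes.count(new_inode))
            throw std::invalid_argument("inode already exists");
        uint64_t old_inode = by_name.at(image);
        inode_meta_t & old = inodes.at(old_inode);
        old.name = snap_name; // the old inode is kept as a read-only snapshot
        old.readonly = true;
        inodes[new_inode] = inode_meta_t{image, old_inode, parent_pool, old.size, false};
        by_name[snap_name] = old_inode;
        by_name[image] = new_inode; // the image name now refers to the new inode
    }
};
```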
Vitaliy Filippov
d0c2e31312
Add a test for snapshots, fix bugs. Now the test passes
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
691f066055
Actual snapshot support (untested)
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
a48e2bbf18
Fix write replay ordering when immediate_commit != all
The previous implementation didn't respect write ordering and could lead
to data corruption when restarting writes after an OSD outage.
Also rework cluster_client queueing logic and add tests for it to verify the correct behaviour
2021-04-03 14:51:52 +03:00
Vitaliy Filippov
688821665a
Remove stoull_full() from etcd_state_client.cpp
2021-04-03 14:36:04 +03:00
Vitaliy Filippov
9ac7e75178
Allow specifying etcd URLs for OSDs with http://; do not die with a strange error if the -etcd option is missing for fio
2021-03-16 12:48:26 +03:00
Vitaliy Filippov
bc742ccf8c
Fix a small memory leak in etcd_state_client
2021-03-08 17:04:10 +03:00
Vitaliy Filippov
bf9a175efc
Move C/C++ sources to src subdirectory
2021-02-25 23:59:03 +03:00