1
0
Fork 0
Commit Graph

1628 Commits (master)

Author SHA1 Message Date
Vitaliy Filippov f757a35a8d Retry PG changes without re-running lpsolve when pool configuration and OSD tree don't change
OSDs often change their /pg/history keys during rebalance, so monitor receives additional
transaction failures from etcd if it re-runs lpsolve which sometimes may even lead to monitor
being unable to apply PG changes at all until rebalance completes
2023-12-31 01:23:17 +03:00
Vitaliy Filippov 1edf86ed26 Aggregate recovery delay using simple mean over last 10 observations (EWMA is shit) 2023-12-31 01:23:17 +03:00
Vitaliy Filippov 5ca7cde612 Experiment/WIP: Try to track "secondary" recovery ops separately 2023-12-31 01:23:17 +03:00
Vitaliy Filippov 751935ddd8 WIP Auto-tune recovery speed 2023-12-31 01:23:17 +03:00
Vitaliy Filippov d84dee7098 Track recovery op latencies + refactor into a structure 2023-12-31 01:23:17 +03:00
Vitaliy Filippov dcc76eee15 Add a parity chunk count change test script 2023-12-26 23:48:41 +03:00
Vitaliy Filippov 2f38adeb3d Restart dead VDUSE daemons at regular intervals 2023-12-24 12:58:50 +03:00
Vitaliy Filippov f72f14e6a7 Clear old PG states, history, and OSD states on etcd state reload
Also add protection from etcd watcher messages being split into multiple websocket
messages - I'm not sure if etcd actually does that, but it's better to have extra
protection anyway.

Also check that all etcd watchers are started in the keepalive routine, otherwise
it sometimes tries to revive etcd watchers starting with revision=1 which obviously
always fails because this revision is nearly always compacted.

All these changes should fix an old rarely reproduced bug where SOMETIMES OSDs
didn't react to PG config changes which was leading to offline pools on node reboot.
It happened on the full reload of state from etcd.
2023-12-24 02:02:13 +03:00
Vitaliy Filippov 1299373988 Use the same etcd_ws_keepalive_interval in OSD and mon 2023-12-23 20:07:29 +03:00
Vitaliy Filippov 178bb0e701 Prevent re-entry into timerfd set_nearest 2023-12-22 02:32:40 +03:00
Vitaliy Filippov 4ece4dfdd0 Fix mon not using values from config when /config/global is not present 2023-12-22 02:25:09 +03:00
Vitaliy Filippov 95631773b6 Remove pve-storage-portal-dns-list format for vitastor_etcd_address 2023-12-20 02:22:06 +03:00
Vitaliy Filippov 7239cfb91a Parse log_level in cluster_client 2023-12-20 02:21:23 +03:00
Vitaliy Filippov 7cea642f4a Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields 2023-12-19 01:11:37 +03:00
Vitaliy Filippov dc615403d9 Do not warn on EPIPE in client unless log_level is raised explicitly 2023-12-17 13:42:26 +03:00
Vitaliy Filippov 1a704e06ab Allow multiple interfaces with the same IP address, for "simple routed" full mesh network 2023-12-17 13:25:56 +03:00
Vitaliy Filippov 575475de71 Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible) 2023-12-17 11:55:13 +03:00
Vitaliy Filippov aca2bef15f Add vitastor-disk update-sb command 2023-12-14 01:11:42 +03:00
Vitaliy Filippov 4dd6e89263 Change qemu to qemu-system-x86 in docs 2023-12-14 01:01:00 +03:00
Vitaliy Filippov 9bac99ffb6 Fix incorrect error in CSI when searching for the device in /sys 2023-12-14 01:00:32 +03:00
Vitaliy Filippov 62ed130960 Support building qemu 8.1 from bookworm-backports 2023-12-10 00:34:13 +03:00
Vitaliy Filippov 9c7755b6e8 Use qemu-storage-daemon from QEMU 8.1.2 for CSI 2023-12-08 00:10:12 +03:00
Vitaliy Filippov 691ebd991a Move 2 last log printfs to stderr from stdout in etcd_state_client 2023-12-08 00:01:52 +03:00
Vitaliy Filippov 6d5df908a3 Fix possible out of bounds when checking invalid journal entries 2023-12-08 00:01:07 +03:00
Vitaliy Filippov fa87769ed8 Correct config options in vduse docs 2023-12-06 02:09:04 +03:00
Vitaliy Filippov 2ce8292803 Also log when killing process 2023-12-06 01:06:53 +03:00
Vitaliy Filippov 7f8f7ded52 Check for empty output of vitastor-nbd map (just in case) 2023-12-06 01:01:14 +03:00
Vitaliy Filippov 68553eabbb Log executed CLI commands 2023-12-06 00:48:12 +03:00
Vitaliy Filippov 3147c5c8d5 Remove internal error wrapping 2023-12-06 00:39:42 +03:00
Vitaliy Filippov 576e2ae608 Fix etcd_address check in CSI 2023-12-06 00:28:21 +03:00
Vitaliy Filippov a1c7cc3d8d Release 1.3.1
Hotfix to 1.3.0 - new "journal space reservation" had a bug which
caused OSDs to crash with EC and without immediate_commit.
2023-12-04 18:35:09 +03:00
Vitaliy Filippov a5e3dfbc5a Oops, 1.3.0 needs a hotfix 2023-12-04 13:45:54 +03:00
Vitaliy Filippov 7972502eaf Release 1.3.0
New features:
- RDMA without ODP - much faster and all cards are now supported, not just Mellanox
- VDUSE in CSI - faster, more stable and can even recover after CSI pod restart!
- Reserve journal space for stabilize requests dynamically to prevent stalls under load with EC
- Raise default NBD timeout from 30 to 300 seconds and allow to take it from /etc/vitastor/vitastor.conf
- Remove explicit etcdUrl/etcdPrefix K8S storage class parameter support to prevent
  etcd migration issues for volumes created with these parameters
- Support QEMU 8.1 and pve-qemu 8.1

Bug fixes:
- Fix RDMA connection (and thus memory) leak
- Fix rare crashes under load due to incorrect io_uring queue size tracking
- Fix monitor statistics aggregation in case of empty /osd/stats keys
- Fix crash on unknown long argument to vitastor-disk
- Allow trailing comma in JSONs again
- Fix crash on attempts to dump a long listing of objects "to stabilize" or "to rollback" in a slow op
2023-12-04 02:36:43 +03:00
Vitaliy Filippov e57b7203b8 Use cmake3 on RHEL 7 2023-12-04 02:36:29 +03:00
Vitaliy Filippov c8a179dcda Note that Proxmox 8.1 is supported 2023-12-04 02:20:33 +03:00
Vitaliy Filippov 845454742d Fix warning with QEMU 8.1 2023-12-04 01:59:07 +03:00
Vitaliy Filippov d65512bd80 Add patches for QEMU 8.1 2023-12-04 01:56:17 +03:00
Vitaliy Filippov 53de2bbd0f Support VDUSE in CSI
VDUSE has multiple advantages:
- Better performance
- Lack of timeout problems
- And even the ability to recover after restart of the vitastor-csi pod!
2023-12-04 00:41:24 +03:00
Vitaliy Filippov 628aa59574 Raise default NBD timeout from 30 to 300 seconds and allow to take it from /etc/vitastor/vitastor.conf 2023-12-02 14:11:14 +03:00
Vitaliy Filippov 037cf64a47 Remove explicit etcdUrl/etcdPrefix from volume parameters 2023-12-02 13:26:00 +03:00
Vitaliy Filippov 19e2d9d6fa Fix crash on unknown long argument to vitastor-disk 2023-12-01 00:55:51 +03:00
Vitaliy Filippov bfc7e61909 Add more notes + performance comparison about VDUSE 2023-11-25 02:25:56 +03:00
Vitaliy Filippov 7da4868b37 Fix monitor statistics aggregation in case of empty /osd/stats keys 2023-11-24 01:05:21 +03:00
Vitaliy Filippov b5c020ce0b Use io_uring SQ size for ringloop capacity - otherwise get_sqe could return NULL when space_left() was > 0 under load
Raise default io_uring size to 1024 for the same effective capacity as previously
2023-11-20 03:04:06 +03:00
Vitaliy Filippov 6b33ae973d %d -> %lu 2023-11-20 03:02:26 +03:00
Vitaliy Filippov cf36445359 Reserve journal space for stabilize requests dynamically to prevent stalls 2023-11-20 03:01:57 +03:00
Vitaliy Filippov 3fd873d263 Add -fno-omit-frame-pointer by default 2023-11-20 02:59:54 +03:00
Vitaliy Filippov a00e8ae9ed Fix mismatch journal pos format in vitastor-disk 2023-11-19 15:19:54 +03:00
Vitaliy Filippov 75674545dc Limit the number of printed object versions in slow op dump (otherwise it may overflow the fixed buffer) 2023-11-13 01:10:28 +03:00
Vitaliy Filippov 225eb2fe3d Support RDMA without ODP by stupidly copying memory. Disable ODP by default
ODP is slower than regular RDMA even with memory copy overhead

Example numbers:
- 3950000 random read iops without ODP vs 240000 iops with ODP
- 1447000 random write iops without ODP vs 101000 iops with ODP

Reference: https://tkygtr6.github.io/pub/ISPASS21_slides.pdf
2023-11-12 15:03:47 +03:00