Also add protection from etcd watcher messages being split into multiple websocket
messages - I'm not sure if etcd actually does that, but it's better to have extra
protection anyway.
Also check that all etcd watchers are started in the keepalive routine, otherwise
it sometimes tries to revive etcd watchers starting with revision=1 which obviously
always fails because this revision is nearly always compacted.
All these changes should fix an old rarely reproduced bug where SOMETIMES OSDs
didn't react to PG config changes which was leading to offline pools on node reboot.
It happened on the full reload of state from etcd.
New features:
- RDMA without ODP - much faster and all cards are now supported, not just Mellanox
- VDUSE in CSI - faster, more stable and can even recover after CSI pod restart!
- Reserve journal space for stabilize requests dynamically to prevent stalls under load with EC
- Raise default NBD timeout from 30 to 300 seconds and allow to take it from /etc/vitastor/vitastor.conf
- Remove explicit etcdUrl/etcdPrefix K8S storage class parameter support to prevent
etcd migration issues for volumes created with these parameters
- Support QEMU 8.1 and pve-qemu 8.1
Bug fixes:
- Fix RDMA connection (and thus memory) leak
- Fix rare crashes under load due to incorrect io_uring queue size tracking
- Fix monitor statistics aggregation in case of empty /osd/stats keys
- Fix crash on unknown long argument to vitastor-disk
- Allow trailing comma in JSONs again
- Fix crash on attempts to dump a long listing of objects "to stabilize" or "to rollback" in a slow op
VDUSE has multiple advantages:
- Better performance
- Lack of timeout problems
- And even the ability to recover after restart of the vitastor-csi pod!
ODP is slower than regular RDMA even with memory copy overhead
Example numbers:
- 3950000 random read iops without ODP vs 240000 iops with ODP
- 1447000 random write iops without ODP vs 101000 iops with ODP
Reference: https://tkygtr6.github.io/pub/ISPASS21_slides.pdf
New features:
- Implement CSI volume expansion
- Implement CSI volume snapshots
- CSI driver now requires Kubernetes >= 1.20
Bug fixes:
- Important bug fix for EC: fix EC n+k, k>=2 read recovery in ISA-L version returning
incorrect data when reading at least the second chunk out of multiple missing chunks
without reading the first one. All users of EC n+k, k>=2 should upgrade as soon as
possible, and upgrade should be conducted with downtime: first stop all clients
(VMs/containers), then all OSDs, then upgrade and restart everything.
- Fix unstable statistics aggregation in monitor (affecting vitastor-cli status and df)
- Make udev not wait for OSDs to start during boot
- Do not report negative numbers of offline PGs in vitastor-cli status when changing PG count
- Report both old and new PG counts in vitastor-cli df when changing it
- Fix OSDs sometimes not starting with "The code only supports journal versions 1 and 2,
but it is 2 on disk" error after upgrading from pre-1.0 versions and letting OSDs run
for some time
- Fix monitors sometimes returning old PG count back after OSD configuration changes
- Make monitor PG changes more stable and timeout errors less probable