ODP is slower than regular RDMA even with memory copy overhead
Example numbers:
- 3950000 random read iops without ODP vs 240000 iops with ODP
- 1447000 random write iops without ODP vs 101000 iops with ODP
Reference: https://tkygtr6.github.io/pub/ISPASS21_slides.pdf
New features:
- Implement CSI volume expansion
- Implement CSI volume snapshots
- CSI driver now requires Kubernetes >= 1.20
Bug fixes:
- Important bug fix for EC: fix EC n+k, k>=2 read recovery in ISA-L version returning
incorrect data when reading at least the second chunk out of multiple missing chunks
without reading the first one. All users of EC n+k, k>=2 should upgrade as soon as
possible, and upgrade should be conducted with downtime: first stop all clients
(VMs/containers), then all OSDs, then upgrade and restart everything.
- Fix unstable statistics aggregation in monitor (affecting vitastor-cli status and df)
- Make udev not wait for OSDs to start during boot
- Do not report negative numbers of offline PGs in vitastor-cli status when changing PG count
- Report both old and new PG counts in vitastor-cli df when changing it
- Fix OSDs sometimes not starting with "The code only supports journal versions 1 and 2,
but it is 2 on disk" error after upgrading from pre-1.0 versions and letting OSDs run
for some time
- Fix monitors sometimes returning old PG count back after OSD configuration changes
- Make monitor PG changes more stable and timeout errors less probable
New features:
- Implement [client writeback cache](docs/config/client.en.md#client_enable_writeback)
- Add the third I/O mode: [O_DIRECT|O_SYNC](docs/config/osd.en.md#data_io) (good for Optane)
- Reduce load on etcd by splitting OSD lease and statistics reporting intervals:
[etcd_stats_interval](docs/config/osd.en.md#etcd_stats_interval) (default 30 sec)
- Make MON automatically filter OSDs by layout (block_size/immediate_commit/bitmap_granularity)
to prevent "refusing to start PGs of this pool" errors on misconfiguration
- Support running fio benchmarks on systems without io_uring
- Make QEMU driver compatible with QEMU 8.1
- Document usage of [vhost-user-blk](docs/usage/qemu.en.md#vhost-user-blk)
Bug fixes:
- Fix resizing disks in QEMU driver (for example, in Proxmox)
- Fix "unexpected result" in Proxmox driver by making CLI flush output on exit
- Remove unneeded block_size mismatch warnings on pools without matching PGs
- Fix possible segfault in vitastor-cli ls -l (usually with deleted pools)
- Fix QEMU driver compatibility with systems without io_uring
- Fix monitor eating 100% CPU when etcd is down (caused by infinite retries)
- Fix potential incorrect write processing with snapshots (not caught in tests
but could probably lead to client hangs)
- Fix buffer insertion in cluster_client (not caught in tests but could
probably lead to incorrect writes in rare cases)
- Fix rare OSD crash during sync operation processing
- Fix a reenterability issue in cluster_client not reproducible in QEMU/fio,
but reproducible with the currently developed K/V database implementation
- Fix deletion of the first modified object - OSDs could crash if you modified
the same object a lot of times, then deleted it, and then modified it again
- Fix the fio_sec_osd test tool
- Disabled by default, enable with client_enable_writeback=true
- Even then only enabled in FIO when -direct is disabled and in QEMU when
block device cache is enabled in settings
- Can also be enabled in other clients like vitastor-cli using parameter
client_writeback_allowed=true, but not recommended
New features:
- Data and metadata checksums!
- Metadata checksums are always used with new disk format
- Data checksums can be turned on with --data_csum_type crc32c for new OSDs
- Checksum block size can be configured
- inmemory_metadata now also affects keeping checksums in memory
- Linux page cache I/O caching support which can be enabled separately for
data, metadata (including checksums) and journal (O_SYNC instead of O_DIRECT)
- Details [here](https://git.yourcmc.ru/vitalif/vitastor/src/branch/master/docs/config/layout-osd.en.md#data_csum_type)
- Backwards compatibility is preserved, you can use new OSDs with old disks
Release also includes bug fixes from [0.9.6](https://git.yourcmc.ru/vitalif/vitastor/releases/tag/v0.9.6).
0.9.6 is moved to "-oldstable" repositories and will be available for some additional time.
Instead of it, just do not verify checksums of currently mutated objects.
When clean data modification during flush runs in parallel to a read request,
that request may read a mix of old and new data. It may even read a mix of
multiple flushed versions if it lasts too long... And attempts to verify it
using temporary copies of metadata make the algorithm too complex and creepy.
- Fix vitastor-disk partition zeroing (sometimes it was writing garbage instead of zeroes)
- Fix incorrect EC space statistics in `vitastor-cli status`
- Several bug fixes for NFS:
- Add . and .. in NFS directory listings
- Return FILE_SYNC from NFS writes if immediate_commit is enabled
- Return the same "verifier" in NFS COMMIT as in NFS WRITE
- Make parallel NFS extending writes work correctly, without conflicts
- Handle parallel NFS extending writes without imposing extra load on etcd
- Support UTF-8 in vitastor-cli table output
- Also allow "0" and "no" as false for inmemory_metadata and inmemory_journal
- Use HDD defaults for HDD-only in automatic `vitastor-disk prepare` mode
This fixes buffered (not O_DIRECT) NFS writes in Linux - previously they were
hanging in an infinite loop because COMMIT didn't return the same verifier as
previous WRITEs, and NFS kernel client was infinitely retrying the same writes.
Also this probably allows for correct NFS failover, at least for the same
buffered writes, because NFS clients repeat all write requests until a COMMIT
confirms them.
Previously, multiple parallel writes extending file size through NFS were
racing with each other and triggering deletions of part of the written data
I.e. if you mounted vitastor-nfs and just copied a file into it in MC then
you could end up with only a part of the file actually written