Commit Graph

87 Commits (105a405b0a0dd6c55a684f2f6c9e89eb99585013)

Author SHA1 Message Date
Vitaliy Filippov 105a405b0a Implement vitastor-cli fix 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0e5d0e02a9 Add "vitastor-cli describe" command 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0439981a66 Implement "describe object(s)" operation
Required to implement fixing inconsistent objects in vitastor-cli
2023-05-20 23:19:39 +03:00
Vitaliy Filippov c3bd26193d Implement PG scrub runner 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 43b77d7619 Implement scrubbing "data path" - OSD_OP_SCRUB 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 5a9e1ede52 Release 0.8.9
Test / buildenv (push) Successful in 9s Details
Test / build (push) Successful in 2m31s Details
Test / test_cas (push) Successful in 12s Details
Test / make_test (push) Successful in 33s Details
Test / test_change_pg_size (push) Successful in 19s Details
Test / test_change_pg_count (push) Successful in 55s Details
Test / test_create_nomaxid (push) Successful in 21s Details
Test / test_change_pg_count_ec (push) Successful in 58s Details
Test / test_failure_domain (push) Successful in 13s Details
Test / test_etcd_fail (push) Successful in 1m4s Details
Test / test_interrupted_rebalance (push) Successful in 1m13s Details
Test / test_interrupted_rebalance_imm (push) Successful in 1m7s Details
Test / test_add_osd (push) Successful in 2m59s Details
Test / test_move_reappear (push) Successful in 24s Details
Test / test_interrupted_rebalance_ec (push) Successful in 1m22s Details
Test / test_interrupted_rebalance_ec_imm (push) Successful in 1m1s Details
Test / test_rebalance_verify (push) Successful in 2m12s Details
Test / test_minsize_1 (push) Successful in 15s Details
Test / test_rebalance_verify_imm (push) Successful in 2m4s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 2m9s Details
Test / test_rm (push) Successful in 17s Details
Test / test_snapshot (push) Successful in 23s Details
Test / test_rebalance_verify_ec (push) Successful in 2m31s Details
Test / test_splitbrain (push) Successful in 23s Details
Test / test_snapshot_ec (push) Successful in 30s Details
Test / test_write_no_same (push) Successful in 16s Details
Test / test_write (push) Successful in 53s Details
Test / test_write_xor (push) Successful in 1m19s Details
Test / test_heal_pg_size_2 (push) Successful in 4m30s Details
Test / test_heal_ec (push) Successful in 4m32s Details
- The tests are now stable and run in a CI system based on Gitea CI
- The release includes final bug fixes for EC:
  - Implement missing EC recovery of allocation bitmap when built with ISA-L
  - Fix broken snapshot export with EC (allocation bitmap reads were giving incorrect results previously)
- Also fixed bugs manifesting under heavy load:
  - Fix monitor possibly applying incorrect PG history on retries
  - Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number
  - Allow writes to wait for free space again, but now correctly (previously dropped in 0.8.2)
  - Fix a rare segfault in client (handle client stop during incoming stream handling in 1 more place)
  - Make monitor correctly handle etcd connection errors - it could die instead of connecting to another etcd
  - Fix OSD rarely being unable to report PG states after a PG was taken over by another OSD
- Fixed return code for incomplete EC objects (now EIO) and made cluster client retry this error
- Made other small changes for tests: timeouts, nice/ionice for etcd, waiting conditions, NBD device checks and so on
2023-05-14 01:25:09 +03:00
Vitaliy Filippov ab615849d6 Release 0.8.8
- Fix vitastor-cli rm/rm-data broken in 0.8.6 (missing messenger initialization)
- Prepare OSD read handler for upcoming version with scrub - allow "secondary reads" to return errors
- Fix OSDs re-peering PGs infinitely with a big number of PGs (reproduced in test_add_osd)
- Fix another variant of flusher sync-waiting stall (reproduced in test_write)
- Fix other tests in tests/ (will add them to Gitea CI soon)
- Add patches for QEMU 6.2-8.0
- Fix QEMU driver compatibility with QEMU 8.0
- Build packages for RHEL 9 clones (based on AlmaLinux 9)
2023-04-28 11:22:00 +03:00
Vitaliy Filippov 7e958afeda Release 0.8.7
This release includes a bunch of important bugfixes for erasure-coded setups
with disabled immediate_commit. After these fixes, "test_heal" OSD killing test
now passes fine with EC:

- Fix cluster write stalls with "Error while doing flush on OSD xx: -16 (Device or resource busy)"
  in OSD logs possible in EC setups with disabled immediate_commit by selectively
  syncing nonsynced objects on STABILIZE/ROLLBACK (https://github.com/vitalif/vitastor/issues/51)
- Fix other EC + disabled immediate_commit problems:
  - Fix "opcode=5 retval=-2" errors happening on SYNC retries
  - Fix non-working "pagination" during PG dirty object flushing
  - Fix write operations not continued correctly after dirty object flushing
- Fix incorrect parity read-modify-write calculation when writing into a lost chunk
- Fix OSDs losing left_on_dead PG state of non-clean PGs and thus not removing junk data in the cluster
- Fix a small memory leak caused by bad indexing of EC recovery matrices
- Fix a rare use-after-free in cluster_client caused by a reenterability issue
- Fix vitastor-cli create command syntax in the CSI driver
- Allow to start OSDs without local store for tests
- Fix memory allocation error in disk_tool_meta for non-standard metadata block sizes
- Fix delete operations received before loading pool metadata crashing OSDs with "null pointer exception"
- Improve "theoretical performance" Russian documentation

New features:

- Implement online configuration update for some parameters. Documentation is coming soon :)
2023-04-11 02:11:57 +03:00
Vitaliy Filippov d81a6c04fc Update cmake min version so it does not complain about deprecation 2023-03-15 01:08:23 +03:00
Vitaliy Filippov 8810eae8fb Release 0.8.6
Important fixes:

- Fix possibly incorrect EC parity chunk updates with EC n+k, k > 1 and when
  the first parity chunk is missing

Minor fixes and improvements:

- Fix incorrect EC free space statistics in vitastor-cli df output
- Speedup vitastor-cli startup in clusters with RDMA
- Remove unused PG "peered" state (previously used to update PG epoch)
- Use sfdisk with just --json in vitastor-disk (--dump --json isn't needed)
- Allow trailing comma in sfdisk output (fixes sfdisk 2.36 compatibility)
- Slightly improve RDMA send/receive code
- Reduce RDMA memory consumption by default (rdma_max_recv/send = 16/8)
- Use vitastor-cli instead of direct etcd interaction in the CSI driver
2023-02-28 11:18:48 +03:00
Vitaliy Filippov 36a7dd3671 Move tests to "make test" 2023-02-21 01:30:42 +03:00
Vitaliy Filippov d125fb1f30 Release 0.8.5
- Fix a possible "double free" bug in the client library happening on OSD restart
- Fix a possible write hang on PG history update when only epoch is changed
- Fix incorrect systemd target "local.target" in mon/make-etcd
- Allow "content" option in PVE storage plugin to allow to enable containers
- Build client library without tcmalloc which fixes "attempt to free invalid pointer"
  errors when, for example, trying to run QEMU with both Vitastor and Ceph RBD disks
2023-01-25 01:43:49 +03:00
Vitaliy Filippov 9f4e34a8cc Build client library without tcmalloc
Fixes "[src/tcmalloc.cc:332] Attempt to free invalid pointer ..." when trying
to run QEMU with both Vitastor and Ceph RBD disks and other possible allocator
collisions.
2023-01-15 00:01:11 +03:00
Vitaliy Filippov 81fc8bb94c Release 0.8.4
New features:
- Implement QCOW2 image/snapshot export via qemu-img (bdrv_co_block_status in the driver)
- Remove OSDs from PG history during `vitastor-cli rm-osd` to prevent `left_on_dead` PG states after deletion
- Add a new recovery_pg_switch setting to mix all PGs during recovery, to almost
  fully reduce the probability of ENOSPC during rebalance
- Introduce partial ENOSPC ("OSD is full") handling - now ENOSPC doesn't turn
  into cascades of crashes
- Add migration support to Proxmox VE Vitastor driver
- Track last_clean_pgs on a per-pool basis thus reducing data movement in a cluster
  with pools remaining unclean/degraded for a long time

Bug fixes:
- Fix a bug where monitor could generate degraded PGs if one of the hosts had no OSDs
- Fix a bug where monitor could skip PG redistribution with a lot of OSDs in cluster
- Report PG history synchronously on the first write, which improves PG consistency
  and availability at the same time, because history now gets reported correctly
  and doesn't get reported without the need for it
- Fix possible write and recovery stalls which could happen in a cluster with both EC and replicated pools
- Make OSD and monitors sanitize & deduplicate PG history items in etcd
- Fix non-working OSD peer config safety check
- Fix a rare journal flush stall where flushing wasn't activated with full journal, but with empty flush queue
- Fix builds without ISA-L (jerasure-only) crashing with EC N+K, K>=2 due to the lack of 16-byte buffer alignment
- Fix a possible crash for EC N+K, K>=2 when calculating a parity chunk with previous parity chunk missing
- Fix a bug where vitastor-disk purge with suppressed warnings didn't work
2023-01-13 23:59:54 +03:00
Vitaliy Filippov 8d55a1e780 Build osd_rmw_test both with and without ISA-L 2022-12-29 19:13:57 +03:00
Vitaliy Filippov fa90f287da Release 0.8.3
- Implement a new "vitastor-disk purge" command to remove OSDs with safety checks
- Implement a new "vitastor-cli rm-osd" command to only remove OSD metadata from etcd
- Fix a bug where the monitor could ignore OSD removal and other /osd/stats key changes
- Fix a bug where garbage could be returned when reading objects being written at the same time
- Fix a rare write stall where journal space could be not reclaimed where there
  were no new operations in the flush queue
- Fix a rare peering stall caused by a previous long listing operations queues limiting attempt
- Fix total object count statistic in OSD on object creation
- Add missing offset&len into vitastor-disk dump-journal for big_writes, fix JSON format
- Make vitastor-cli print help on missing command
- Make vitastor-cli translate all '-' to '_' in CLI options
2022-12-27 02:40:55 +03:00
Vitaliy Filippov c2244331e6 Add vitastor-cli rm-osd command 2022-12-26 02:48:48 +03:00
Vitaliy Filippov 5ef8bed75f Release 0.8.2
- Fix QEMU driver compatibility with QEMU 7.0 and < 2.9
- Add patches for pve-qemu-kvm 7.1 (PVE 7.3) and pve-qemu-kvm 6.2 (PVE 7.2)
- Fix Proxmox driver location in the pve-storage-vitastor package
- Disable HDD autodetection in non-hybrid mode
- Explicitly warn about a buggy kernels on -EAGAIN in io_uring
- Final fix for the lack of zeroing out of old metadata entries
  (do not crash with "big_write journal_entry was allocated over another object"
  in some cases after an unclean OSD shutdown)
- Wait for data writes before fsyncing data if data fsync is enabled
- Never try to wait for free space inside blockstore thus stalling OSDs
- Fix a rare crash in osd_peering due to callback ordering
- Fix a rare duplication of ping & op message IDs
- Fix a rare use-after-free during pings
- Add --force to vitastor-disk read-sb
- Make vitastor-disk dump metadata object IDs in hex, add forgotten commas
- Fix vitastor-disk SCSI disk cache check
2022-12-17 17:54:13 +03:00
Vitaliy Filippov 8fdf30b21f Release 0.8.1
- Remove an additional data copy operation when flushing journal (should
  slightly increase write performance)
- Fix a bug where new writes in the inmemory_journal=false mode could overwrite
  the data currently read by a parallel read operation
- Fix degraded parity writes for EC N+K when K>1 where the bug could also lead
  to an "assertion failed" error
- Fix missing journal space check for "big" writes which could lead to
  "prefill_single_journal_entry(): assertion failed..." error in OSD
- Fix possible "assertion failed: next->prev_wait >= 0" in client in rare cases
- Fix missing "len" field in vitastor-disk write-journal big_writes
- Fix possible crash of a full OSD (ENOSPC)
- Fix CSI build scripts to include newest packages every time
- Fix CSI endpoint in the liveness probe manifest
2022-11-20 11:44:09 +03:00
Vitaliy Filippov 5953942042 Add crc32c test utility 2022-11-20 00:50:13 +03:00
Vitaliy Filippov 11ec9ad874 Release 0.8.0
- Implement automatic OSD activation via udev and simple on-disk superblock storage
- Add a new `vitastor-disk` tool and merge all disk-related functionality there.
  Now it can prepare new OSD disks, upgrade plain old systemd units to the new scheme,
  resize OSD data area, manage OSD services by disk paths, manage superblocks,
  automatically check and disable disk cache, dump and write back journal and metadata.
- Add a documentation section about `vitastor-disk` (read it if you want details!)
- Install systemd services during package installation instead of the older method
  of manually creating them via separate shell scripts
- Add a new `make-etcd` script that reuses /etc/vitastor/vitastor.conf to configure etcd
- Allow to configure block_size, bitmap_granularity and immediate_commit per-pool
- Fix "fatal error: tried to overwrite non-zero metadata entry" which was possible
  in some cases after unclean OSD shutdown (caused by old metadata entries not being zeroed)
2022-09-05 13:51:20 +03:00
Vitaliy Filippov 40d8d65188 Rewrite upgrade-simple to C++ 2022-08-18 01:31:31 +03:00
Vitaliy Filippov b1e39b5dea Split disk_tool.cpp into separate files 2022-08-14 02:37:01 +03:00
Vitaliy Filippov ae99ee6266 Rename base64.{cpp.h} to str_util 2022-07-31 01:12:37 +03:00
Vitaliy Filippov dcc6d546be Move simple-offsets into vitastor-disk, too 2022-07-15 02:19:35 +03:00
Vitaliy Filippov 0c404c5074 Use blockstore_disk in disk_tool 2022-07-15 01:38:30 +03:00
Vitaliy Filippov dfd80626bd Extract disk opening functions to separate module 2022-07-15 01:38:30 +03:00
Vitaliy Filippov 078ed5b116 WIP Data area resize tool 2022-07-15 01:38:30 +03:00
Vitaliy Filippov b0e86ca643 Merge dump-journal and dump-meta into the new "vitastor-disk" tool 2022-07-15 01:38:30 +03:00
Vitaliy Filippov bce357e2a5 Do not read all metadata into memory when dumping 2022-06-13 01:26:30 +03:00
Vitaliy Filippov dac12d8a4c Implement metadata dump tool 2022-06-10 18:50:09 +03:00
Vitaliy Filippov 101592bbff Release 0.7.1
- Add ISA-L erasure code implementation, now used automatically instead of jerasure when available
- Fix listings sending too many parallel requests to OSDs
- Fix rm-data crashing with --wait-list
- Remove empty inodes from statistics and `ls` output, after <inode_vanish_time> seconds after deletion
- Make monitor delete pool statistics when the pool is deleted and thus remove them from `df` output
- Log multiple etcd addresses in OSD logs correctly
- Fix true/false parsing in json configs like no_recovery/no_rebalance
- Show no_recovery, no_rebalance, readonly flags in status
2022-06-05 00:07:24 +03:00
Vitaliy Filippov 87613ed590 Add ISA-L into RPM specs 2022-06-04 13:27:06 +03:00
Vitaliy Filippov 21b306e25f Add ISA-L support 2022-06-02 01:47:33 +03:00
Vitaliy Filippov d8313e939a Release 0.7.0
- Add documentation! :-) in Russian and English
- Implement an NFS proxy for file-based access emulation to Vitastor
  images for non-QEMU based hypervisors like VMWare, as a better way
  than iSCSI
- Implement "primary affinity tags"
- Add a patch for libvirt 6.0
- Fix free_down_raw in cli status
- Fix a rare bug where OSDs could drop unrelated connections on errors
2022-05-29 23:39:53 +03:00
Vitaliy Filippov acf403e886 Add install target for NFS proxy 2022-05-10 10:43:17 +03:00
Vitaliy Filippov 7c2379d458 Simplified NFS proxy based on own NFS/XDR implementation 2022-05-07 01:01:20 +03:00
Vitaliy Filippov a2189100dd Make CLI functions usable in library form
Return results and errors in a variable instead of just printing them,
separate vitastor-cli main() from cli_tool_t, move positional argument
parsing to CLI main from command implementations.
2022-05-06 02:18:32 +03:00
Vitaliy Filippov bb84379db6 Release 0.6.17
- Fix incorrect reading of extra metadata block leading to extra unknown objects in stats
- Fix CSI driver volumeMode: Block support
- Add block PVC and pod examples
- Fix build under 32 bit architectures
- Fix slow connection ramp-up caused by up_wait_retry_interval
2022-05-06 02:18:01 +03:00
Vitaliy Filippov 340a4b4f27 Release 0.6.16
- Implement `vitastor-cli status` (print cluster status) command
- Add a new `make-osd-hybrid.js` script to quickly prepare a lot of hybrid (HDD+SSD) OSDs
- Implement snapshot deletion for Cinder driver (only works in a healthy cluster)
- Fix a huge :) bug causing reads to return all zeroes during rebalance. Add a test to prevent it in the future
- Disconnect NBD proxy correctly without leaving a zombie [vitastor-nbd] process in D state
- Fix a rare write hang appearing with small write throttling enabled
2022-04-09 01:16:52 +03:00
Vitaliy Filippov d71cc174e3 Implement CLI status command 2022-04-09 00:25:51 +03:00
Vitaliy Filippov 85298ddae2 Release 0.6.15
- Make peering much faster in medium to large clusters
- Fix a reenterability issue which could rarely lead to peering process hangs
2022-03-06 19:16:34 +03:00
Vitaliy Filippov e23296a327 Rename cli_rm -> cli_rm_data, cli_snap_rm -> cli_rm 2022-02-24 14:34:14 +03:00
Vitaliy Filippov 117d6f0612 Release 0.6.14
- Fix IPv6 address parsing
- Fix "cannot read bytes of undefined" in the monitor on a fresh DB
- Fix possible hangs of read requests on OSD restarts without immediate_commit=all mode
- Fix OSDs skipping misplaced recovery in some cases
- Fix OSDs possibly dying with "map::at" errors when other OSDs are stopped
- Fix division by zero in ls if all pool OSDs are down
2022-02-17 14:43:44 +03:00
Vitaliy Filippov 36f352f06f Release 0.6.13
- Fix client hangs possible on OSD restarts (bug affected versions from 0.5.11)
- Fix "Assertion `sqe != NULL' failed" io_uring-related crashes possible
  on some kernels (0.6.11 increased probability of this bug)
- Fix timeout=0 in NBD proxy
- Fix build under centos 7
2022-02-03 01:50:30 +03:00
Vitaliy Filippov c1929cabe0 Release 0.6.12
etcd connection stability, clang & elbrus support

- Fix build under CLang and Elbrus LCC compilers, making Vitastor compatible
  with Elbrus CPUs :)
- Completely fix the bug where OSDs didn't connect to peers and incorrectly marked
  PGs as incomplete
- Limit I/O depth for deletes the same way as for small writes. Makes OSD crashes
  with "Assertion failed: sqe != NULL" during image deletion go away
- Fix a very old, but rare, journaling bug (credits to https://github.com/mirrorll)
- Fix flushing of unclean journaled objects leading to OSDs sometimes hanging
  after failover in EC setups (bug was introduced in 0.6.7)
- Fix several problems that could prevent smooth operation of a Vitastor cluster
  under the condition of partial etcd failure:
  - OSDs could randomly fail due to too strict error handling
  - New clients and OSDs could be unable to start because of the lack of retries
  - CLI could fail some commands because of the lack of retries
  - Monitor could stop receiving state updates because of the lack of websocket pings
- Fix monitor being unable to rebalance PGs after a downscale of pool pg_size (3->2)
- Exit with failure when trying to nbd map or benchmark a non-existing image
- Use HTTP keep-alive for etcd connections
- Allow to configure etcd request timeouts and retries
- Allow to configure NBD timeout, max devices and partitions, and set default to
  up to 64 devices with up to 3 partitions each
2022-01-24 01:15:25 +03:00
Vitaliy Filippov 0785bdf8b3 Release 0.6.11
- Slightly reduce journaling write amplification (requires no_same_sector_overwrites=false)
- Fix listen_backlog (it was 0) because it could more than halve OSD socket send speed
- Support IPv6 OSD addresses
- Do not try to initialize client in simple-offsets
- Fix OSDs sometimes marking PGs incomplete instead of trying to connect with peers
- Allow to configure OSD placement in node_placement
- Allow to run with 4k sector size block devices. Natural, but it was forbidden
2021-12-26 21:11:24 +03:00
Vitaliy Filippov 20a4406acc Support IPv6 OSD addresses 2021-12-19 10:42:17 +03:00
Vitaliy Filippov 2020608a39 Release 0.6.10
- Implement a storage plugin for Proxmox. Now you can use Vitastor with Proxmox!
- Implement `vitastor-cli df` (pool space usage statistics) command
- Add glob pattern support for `vitastor-cli ls`
- Fix several bugs in other CLI commands (resize, create --parent, modify --readonly)
- Use 512 byte logical block size in QEMU driver by default (and thus don't require to set it in QEMU options)
2021-12-10 21:40:12 +03:00
Vitaliy Filippov 0ee5e0a7fe Implement vitastor-cli df command 2021-12-10 02:37:02 +03:00