1
0
Fork 0
Commit Graph

168 Commits (master)

Author SHA1 Message Date
Vitaliy Filippov c777a0041a Release 1.4.4
A couple of fixes for EC pools

- Fix a segfault possible on partial EC overwrite in 1234 -> 5030 rebalance scenario
- Fix two problems leading to EC pools stalling on rebalance & parallel sudden stops
  of OSDs, for example during a sudden poweroff of a host:
  - Recovery auto-tuning (1.4.0 feature) could apply too large delays and stall
    the EC journal - fixed by limiting delays with a new recovery_tune_sleep_cutoff_us
    parameter (10 seconds by default) and applying recovery pauses before write
    operations, not after them, to not occupy space in the journal for long time
  - Dynamic journal space reservation (1.3.0 feature) wasn't accounting new writes
    when checking the limit so OSDs could still fill the journal fully and stall -
    fixed by including new writes into the limit
- Print etcd dbSize instead of dbSizeInUse in status
2024-02-11 16:23:08 +03:00
Vitaliy Filippov 27e9f244ec Release 1.4.3
Hotfix for hotfix O:-)

- "Write stall fix" was incomplete and EC write stalls could
  continue even on 1.4.2. Now they're finally fixed O:-)
- Make monitor ignore statistics of stopped OSDs. Previously if you stopped all
  OSDs the last total I/O numbers would remain the same indefinitely
2024-02-09 00:29:31 +03:00
Vitaliy Filippov 8e25a28a08 Ignore down OSDs in monitor statistics aggregation 2024-02-09 00:22:36 +03:00
Vitaliy Filippov 016115c0d4 Release 1.4.2
- Log to systemd by default
- Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0)
- Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes
- Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit
- Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled
- Fix OSDs ignoring syncs & autosyncs for delete operations
- Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)
- Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds
- Speed up operation retries - change default up_wait_retry_interval to 50 ms
- Add patch for libvirt 9.10
2024-02-04 02:23:49 +03:00
Vitaliy Filippov e026de95d5 Log to systemd by default 2024-02-04 01:21:31 +03:00
Vitaliy Filippov d2b43cb118 Change default etcd_mon_ttl 2024-01-29 23:45:19 +03:00
Vitaliy Filippov 1c322b33ed Change default up_wait_retry_interval to 50 ms 2024-01-26 01:51:08 +03:00
Vitaliy Filippov ba55f91409 Release 1.4.1
- Fix a monitor crash on primary OSD switching introduced in 1.4.0
- Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree
- Fix a realloc memory leak in theory possible with too large listings (OSD_OP_LIST)
2024-01-18 02:31:42 +03:00
Vitaliy Filippov 2aa5aa7ab6 Add a test for simple master switching without PG reconfiguration
Also use osd_out_time:1 only in select tests and restart mon in tests only on connection errors
2024-01-17 00:19:01 +03:00
Vitaliy Filippov 3ca3b8a8d8 Fix recheck_pgs bug introduced in 1.4.0 2024-01-16 23:49:21 +03:00
Vitaliy Filippov 5280d1d561 Release 1.4.0
New features:
- Intelligent recovery/rebalance speed auto-tuning to reduce its impact on clients (see README -> Features)
- Auto-restoration of dead VDUSE daemons in CSI plugin
- Add vitastor-disk update-sb command
- Update QEMU for Debian Bookworm to 8.1 and use it for CSI plugin

Bug fixes:
- Fix pools SOMETIMES staying inactive after stopping a node due to OSDs not reacting
  to PG state changes caused by incorrect full reload of state from etcd on reconnection
- Make monitors retry pool configuration changes quickier which fixes them being unable
  to apply changes when an ongoing rebalance is quickly making a lot of PGs clean
- Fix CSI plugin not accepting array of strings as etcd address in /etc/vitastor/vitastor.conf
- Allow multiple interfaces with the same IP address, for "simple routed" full mesh network
- Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible)
- Fix a rare client crash during OSD reconnections
- Only treat data partitions as existing OSDs in vitastor-disk prepare
- Remove etcd parameter from default command examples
- Fix reported free space sometimes changing non-immediately after deletion of data from OSDs
- Fix a possible OSD crash on print_slow when bs_op is NULL
- Use the same etcd_ws_keepalive_interval in mon as in OSD
- Fix mon not using values from config when /config/global is not present
- Remove pve-storage-portal-dns-list format for vitastor_etcd_address
- Parse log_level in cluster_client
- Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields
- Do not warn on EPIPE in client unless log_level is raised explicitly
- Fix incorrect error in CSI when searching for the device in /sys
- Remove 2 last prints to stdout in etcd_state_client
- Fix a possible OSD crash when checking corrupted journal entries
2024-01-12 01:28:33 +03:00
Vitaliy Filippov 99ee8596ea Rename min/max_util to util_low/high 2023-12-31 01:23:17 +03:00
Vitaliy Filippov f757a35a8d Retry PG changes without re-running lpsolve when pool configuration and OSD tree don't change
OSDs often change their /pg/history keys during rebalance, so monitor receives additional
transaction failures from etcd if it re-runs lpsolve which sometimes may even lead to monitor
being unable to apply PG changes at all until rebalance completes
2023-12-31 01:23:17 +03:00
Vitaliy Filippov 1edf86ed26 Aggregate recovery delay using simple mean over last 10 observations (EWMA is shit) 2023-12-31 01:23:17 +03:00
Vitaliy Filippov 751935ddd8 WIP Auto-tune recovery speed 2023-12-31 01:23:17 +03:00
Vitaliy Filippov 1299373988 Use the same etcd_ws_keepalive_interval in OSD and mon 2023-12-23 20:07:29 +03:00
Vitaliy Filippov 4ece4dfdd0 Fix mon not using values from config when /config/global is not present 2023-12-22 02:25:09 +03:00
Vitaliy Filippov a1c7cc3d8d Release 1.3.1
Hotfix to 1.3.0 - new "journal space reservation" had a bug which
caused OSDs to crash with EC and without immediate_commit.
2023-12-04 18:35:09 +03:00
Vitaliy Filippov 7972502eaf Release 1.3.0
New features:
- RDMA without ODP - much faster and all cards are now supported, not just Mellanox
- VDUSE in CSI - faster, more stable and can even recover after CSI pod restart!
- Reserve journal space for stabilize requests dynamically to prevent stalls under load with EC
- Raise default NBD timeout from 30 to 300 seconds and allow to take it from /etc/vitastor/vitastor.conf
- Remove explicit etcdUrl/etcdPrefix K8S storage class parameter support to prevent
  etcd migration issues for volumes created with these parameters
- Support QEMU 8.1 and pve-qemu 8.1

Bug fixes:
- Fix RDMA connection (and thus memory) leak
- Fix rare crashes under load due to incorrect io_uring queue size tracking
- Fix monitor statistics aggregation in case of empty /osd/stats keys
- Fix crash on unknown long argument to vitastor-disk
- Allow trailing comma in JSONs again
- Fix crash on attempts to dump a long listing of objects "to stabilize" or "to rollback" in a slow op
2023-12-04 02:36:43 +03:00
Vitaliy Filippov 7da4868b37 Fix monitor statistics aggregation in case of empty /osd/stats keys 2023-11-24 01:05:21 +03:00
Vitaliy Filippov 5524dbdab7 Release 1.2.0
New features:

- Implement CSI volume expansion
- Implement CSI volume snapshots
- CSI driver now requires Kubernetes >= 1.20

Bug fixes:

- Important bug fix for EC: fix EC n+k, k>=2 read recovery in ISA-L version returning
  incorrect data when reading at least the second chunk out of multiple missing chunks
  without reading the first one. All users of EC n+k, k>=2 should upgrade as soon as
  possible, and upgrade should be conducted with downtime: first stop all clients
  (VMs/containers), then all OSDs, then upgrade and restart everything.
- Fix unstable statistics aggregation in monitor (affecting vitastor-cli status and df)
- Make udev not wait for OSDs to start during boot
- Do not report negative numbers of offline PGs in vitastor-cli status when changing PG count
- Report both old and new PG counts in vitastor-cli df when changing it
- Fix OSDs sometimes not starting with "The code only supports journal versions 1 and 2,
  but it is 2 on disk" error after upgrading from pre-1.0 versions and letting OSDs run
  for some time
- Fix monitors sometimes returning old PG count back after OSD configuration changes
- Make monitor PG changes more stable and timeout errors less probable
2023-11-05 01:48:57 +03:00
Vitaliy Filippov 0e888e6c60 Prevent spamming etcd with last_clean_pgs update requests 2023-11-05 00:12:00 +03:00
Vitaliy Filippov 408c21d8f0 Scale last_clean_pgs PG count even if current PGs already contain the new number of PGs 2023-11-04 23:45:59 +03:00
Vitaliy Filippov 43cb9ae212 Prevent multiple parallel recheck_pgs in case of timeouts 2023-11-04 20:59:56 +03:00
Vitaliy Filippov 1fe678e57b Add --no-block to udev rule 2023-10-30 12:18:29 +03:00
Vitaliy Filippov 2e592a2f22 Fix undefined variable "timeout" 2023-10-29 01:30:55 +03:00
Vitaliy Filippov b92f644e3a Fix statistics aggregation, calculate inode stats by first deriving per-OSD stats, too 2023-10-29 01:30:55 +03:00
Vitaliy Filippov 8222e3c77d Release 1.1.0
New features:

- Implement [client writeback cache](docs/config/client.en.md#client_enable_writeback)
- Add the third I/O mode: [O_DIRECT|O_SYNC](docs/config/osd.en.md#data_io) (good for Optane)
- Reduce load on etcd by splitting OSD lease and statistics reporting intervals:
  [etcd_stats_interval](docs/config/osd.en.md#etcd_stats_interval) (default 30 sec)
- Make MON automatically filter OSDs by layout (block_size/immediate_commit/bitmap_granularity)
  to prevent "refusing to start PGs of this pool" errors on misconfiguration
- Support running fio benchmarks on systems without io_uring
- Make QEMU driver compatible with QEMU 8.1
- Document usage of [vhost-user-blk](docs/usage/qemu.en.md#vhost-user-blk)

Bug fixes:

- Fix resizing disks in QEMU driver (for example, in Proxmox)
- Fix "unexpected result" in Proxmox driver by making CLI flush output on exit
- Remove unneeded block_size mismatch warnings on pools without matching PGs
- Fix possible segfault in vitastor-cli ls -l (usually with deleted pools)
- Fix QEMU driver compatibility with systems without io_uring
- Fix monitor eating 100% CPU when etcd is down (caused by infinite retries)
- Fix potential incorrect write processing with snapshots (not caught in tests
  but could probably lead to client hangs)
- Fix buffer insertion in cluster_client (not caught in tests but could
  probably lead to incorrect writes in rare cases)
- Fix rare OSD crash during sync operation processing
- Fix a reenterability issue in cluster_client not reproducible in QEMU/fio,
  but reproducible with the currently developed K/V database implementation
- Fix deletion of the first modified object - OSDs could crash if you modified
  the same object a lot of times, then deleted it, and then modified it again
- Fix the fio_sec_osd test tool
2023-10-28 00:33:06 +03:00
Vitaliy Filippov be7e76f849 Split etcd_stats_interval out of etcd_report_interval 2023-10-27 01:26:26 +03:00
Vitaliy Filippov 38db53f5ee Implement client writeback cache
- Disabled by default, enable with client_enable_writeback=true
- Even then only enabled in FIO when -direct is disabled and in QEMU when
  block device cache is enabled in settings
- Can also be enabled in other clients like vitastor-cli using parameter
  client_writeback_allowed=true, but not recommended
2023-09-16 17:52:17 +03:00
Vitaliy Filippov ff479a102d Make MON filter OSDs by block layout to prevent "refusing to start PGs of this pool" errors on misconfiguration 2023-09-16 17:52:17 +03:00
Vitaliy Filippov ab8627c9fa Fix monitor retrying failed etcd connection in an infinite loop without pauses 2023-08-09 00:57:08 +03:00
Vitaliy Filippov 25a15d24cf Fix incorrect EC space statistics in `vitastor-cli status` 2023-07-27 02:26:17 +00:00
Vitaliy Filippov 2f999d8607 Reduce etcd memory usage
With default --snapshot-count 100000 and GOGC=100 it easily reaches 6.6 GB
even when we only store 1-2 MB of data in it
2023-07-06 00:46:26 +03:00
Vitaliy Filippov d007a374f2 Delete extra /pool/stats/ keys for non-existing pools 2023-07-06 00:40:13 +03:00
Vitaliy Filippov f12b8e45a9 Remove /usr/local/bin path from make-etcd 2023-06-29 23:49:31 +03:00
Vitaliy Filippov a4186e20aa First derive, then sum per-OSD statistics instead of first summing and then deriving
This makes statistics reported by vitastor-cli status much smoother
2023-06-18 01:32:24 +03:00
Vitaliy Filippov aea567cfbd Slightly improve scrub docs 2023-05-21 12:52:30 +03:00
Vitaliy Filippov ce02f47de6 Allow to disable scrub_find_best 2023-05-21 12:33:38 +03:00
Vitaliy Filippov 8d40ad99a6 Add scrub documentation 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 3475772b07 Add configuration online update documentation 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 6648f6bb6e Implement ambiguity detection during scrub 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 3c924397e7 Store next scrub timestamp instead of last scrub timestamp 2023-05-20 23:19:39 +03:00
Vitaliy Filippov c3bd26193d Implement PG scrub runner 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0538a484b3 Add corrupted object state 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 022176aa98 Fix NaN during PG optimisation if there are nonexisting OSDs in node_placement 2023-05-17 01:20:30 +03:00
Vitaliy Filippov 120e3fa7bc Fix pool deletion 2023-05-17 00:45:59 +03:00
Vitaliy Filippov 6f4dc16c59 Handle etcd connection errors correctly in mon (unhandled error events) 2023-05-11 11:02:44 +03:00
Vitaliy Filippov 321cb435a6 Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number 2023-05-08 20:39:20 +03:00
Vitaliy Filippov 5b9031fecc Fix monitor possibly applying incorrect PG history under heavy load
Monitor could deceive itself by immediately saving PG configuration changes
which weren't applied to etcd yet in memory, and apply incorrect PG history
changes next time if the first update fails.

This usually only happened under heavy load and was caught in CI. :-)
2023-05-07 23:23:00 +03:00