Commit Graph

688 Commits (recovery-autotune)

Author SHA1 Message Date
Vitaliy Filippov 27d0d5b06a Reads do not have to wait for buffer flushes anymore 2023-09-16 17:52:17 +03:00
Vitaliy Filippov 33950c1ec8 Fix fio_sec_osd attr_len 2023-09-16 17:49:10 +03:00
Vitaliy Filippov cc0fdc6253 Remove erroneous block_size mismatch warnings on pools without matching PGs 2023-09-08 23:19:04 +03:00
Vitaliy Filippov 79ecd59b10 Flush STDOUT and STDERR before exiting from cli to fix Proxmox "Unexpected result" 2023-09-07 17:30:26 +03:00
Vitaliy Filippov b7d398be5b Fix sscanf validation usage (field count instead of null_byte == 0) 2023-09-07 02:34:35 +03:00
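One common reading of "field count instead of null_byte == 0" is that the parse result is validated by the number of fields `sscanf` reports, with a trailing `%c` used only to detect leftover input. The sketch below illustrates that idea under this assumption; `parse_pair` and the `a:b` input format are hypothetical and are not taken from the Vitastor sources.

```cpp
#include <cstdio>

// Hypothetical helper: parses "<a>:<b>" (format chosen for illustration only).
// Returns true only if both numbers were read and nothing follows them.
bool parse_pair(const char *str, unsigned long long &a, unsigned long long &b)
{
    char extra = 0;
    // sscanf returns the number of successfully assigned fields.
    // Exactly 2 means both numbers matched and %c found no trailing character;
    // 3 means there was trailing garbage, fewer than 2 means an incomplete match.
    int fields = sscanf(str, "%llu:%llu%c", &a, &b, &extra);
    return fields == 2;
}
```

Checking the return value covers the incomplete-match case too, which inspecting the sentinel character alone cannot distinguish from a clean parse.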
Vitaliy Filippov 85e9f67d9d Add supported_truncate_flags 2023-09-06 17:37:52 +03:00
Vitaliy Filippov 79c6d6f323 Make QEMU driver compatible with QEMU 8.1 2023-08-24 02:23:55 +03:00
Vitaliy Filippov ae760dbc1d Fix co_truncate size division by BDRV_SECTOR_SIZE 2023-08-24 01:55:35 +03:00
Vitaliy Filippov 65487da4b1 Do not include msgr_rdma.h into messenger.h 2023-08-24 01:55:35 +03:00
Vitaliy Filippov 7862282938 Extract validation to check_rw(), remove duplicate code with OP_SYNC 2023-08-13 23:49:52 +03:00
Vitaliy Filippov 30ce2bd951 Fix buffer insert in cluster_client 2023-08-12 11:08:50 +03:00
Vitaliy Filippov b1a0afd10a Aggregate buffer flushes 2023-08-11 11:26:13 +03:00
Vitaliy Filippov 85b6134910 Return dirty buffers on read in client
Required at least to return buffers that need to be replayed but have not
actually been replayed yet
2023-08-09 00:57:08 +03:00
Vitaliy Filippov b1b07a393d Fix incorrect marking op parts as done with snapshots (could probably lead to client hangs) 2023-08-09 00:57:08 +03:00
Vitaliy Filippov 7333022adf Add a third I/O mode: O_DIRECT|O_SYNC, change parameters to data_io/meta_io/journal_io 2023-08-09 00:57:08 +03:00
Vitaliy Filippov 6acf562e01 Release 1.0.0
New features:

- Data and metadata checksums!
  - Metadata checksums are always used with new disk format
  - Data checksums can be turned on with --data_csum_type crc32c for new OSDs
  - Checksum block size can be configured
  - inmemory_metadata now also affects keeping checksums in memory
- Linux page cache I/O caching support which can be enabled separately for
  data, metadata (including checksums) and journal (O_SYNC instead of O_DIRECT)
- Details [here](https://git.yourcmc.ru/vitalif/vitastor/src/branch/master/docs/config/layout-osd.en.md#data_csum_type)
- Backwards compatibility is preserved, you can use new OSDs with old disks

Release also includes bug fixes from [0.9.6](https://git.yourcmc.ru/vitalif/vitastor/releases/tag/v0.9.6).

0.9.6 is moved to "-oldstable" repositories and will be available for some additional time.
2023-07-29 18:57:19 +03:00
Vitaliy Filippov 564df2eb5d Support using buffered I/O with O_SYNC instead of direct I/O 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 1a4ceb420d Track used blocks, not object versions 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 21b5124a4b Document data_csum_type and csum_block_size parameters 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 4181add1f4 Remove creepy "metadata copying" during overwrite
Instead, just do not verify checksums of currently mutated objects.
When clean data modification during flush runs in parallel to a read request,
that request may read a mix of old and new data. It may even read a mix of
multiple flushed versions if it lasts too long... And attempts to verify it
using temporary copies of metadata make the algorithm too complex and creepy.
2023-07-29 12:17:18 +03:00
Vitaliy Filippov a8464c19af Support keeping checksums on disk (not in memory)
Definitely beneficial for SSD+HDD setups
2023-07-29 12:17:18 +03:00
Vitaliy Filippov 3c8e4c6b72 Use clean_dyn_size for space check 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 8ef4cf89dc Log more details about checksum mismatch in big_writes 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 7bfb1639ea Use find_holes() in flusher for unification 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 628e481c32 Fill journal header to know checksum type & size when dumping journal with --all 2023-07-29 12:17:18 +03:00
Vitaliy Filippov af6f2046fc Fix journal read checksum verification with inmemory_journal=false 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 9357e5293e Call fill_partial_checksum_blocks() correctly in regard to COPY_BUF_CSUM_FILL 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 12851dc07d Wait for journal reads before checking them in clear_incomplete_csum_block_bits 2023-07-29 12:17:18 +03:00
Vitaliy Filippov d6ee1ca17c Use zero checksum size for zero-length writes 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 71674d00cf Fix journal data checksum mangling on corrupted block overwrite 2023-07-29 12:17:18 +03:00
Vitaliy Filippov ddb078d5a7 Check journal entry size when checking block checksums 2023-07-29 12:17:18 +03:00
Vitaliy Filippov d22d56f90a Fix journal data checksum verification on start 2023-07-29 12:17:18 +03:00
Vitaliy Filippov eb1331a079 Add more details to "journal entry data is corrupt" messages 2023-07-29 12:17:18 +03:00
Vitaliy Filippov c5274f655b ...and partially remove the perversion with bitmap inlining 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 45e07d6294 Sadly we have to refcount dyn_data... 2023-07-29 12:17:18 +03:00
Vitaliy Filippov a8ee391e05 Fix clean block checksum read 2023-07-29 12:17:18 +03:00
Vitaliy Filippov de48fa3fd2 Allow to forcibly set meta_format 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 874a766b62 Rename meta_version to meta_format 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 384bd8e28f Support old metadata format in vitastor-disk dump-meta 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 430994f48a Fix journal big_write simple reads after checksum changes 2023-07-29 12:17:18 +03:00
Vitaliy Filippov b909d81f41 Fix bitmap-granular checksums 2023-07-29 12:17:18 +03:00
Vitaliy Filippov e42975ffd1 Fix wait_journal_count not being zeroed 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 93778324e5 Rewrite and fix find_holes into a more obvious version 2023-07-29 12:17:18 +03:00
Vitaliy Filippov eeb6727170 Fix missing checksum read offset 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 92c6e16eba Fix checksum verification in big_write journal reads 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 213a9ccb4d Verify checksums during journal reads 2023-07-29 12:17:18 +03:00
Vitaliy Filippov a166147110 Add backwards compatibility with non-checksum metadata and journal formats 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 7d532880c3 Implement large csum_block_size support (more than 4k) + refactor blockstore_flush 2023-07-29 12:17:18 +03:00
Vitaliy Filippov 0b0405d115 Implement bitmap-granular (4k) metadata & data checksums 2023-07-29 12:17:18 +03:00
Vitaliy Filippov e651c93a90 Release 0.9.6
- Fix vitastor-disk partition zeroing (sometimes it was writing garbage instead of zeroes)
- Fix incorrect EC space statistics in `vitastor-cli status`
- Several bug fixes for NFS:
  - Add . and .. in NFS directory listings
  - Return FILE_SYNC from NFS writes if immediate_commit is enabled
  - Return the same "verifier" in NFS COMMIT as in NFS WRITE
  - Make parallel NFS extending writes work correctly, without conflicts
  - Handle parallel NFS extending writes without imposing extra load on etcd
- Support UTF-8 in vitastor-cli table output
- Also allow "0" and "no" as false for inmemory_metadata and inmemory_journal
- Use HDD defaults for HDD-only in automatic `vitastor-disk prepare` mode
2023-07-29 10:54:00 +03:00
Vitaliy Filippov 988e90be69 Fix vitastor-disk partition zeroing (it was writing random garbage instead of zeroes :D) 2023-07-28 12:29:07 +03:00
Vitaliy Filippov 700e0e9bff Handle parallel NFS extending writes without imposing extra load on etcd 2023-07-27 02:26:17 +00:00
Vitaliy Filippov ab0ca7c00f Return FILE_SYNC from NFS writes if immediate_commit is enabled 2023-07-26 02:09:47 +03:00
Vitaliy Filippov f153bc950b Return the same "verifier" in NFS COMMIT as in NFS WRITE
This fixes buffered (not O_DIRECT) NFS writes in Linux - previously they were
hanging in an infinite loop because COMMIT didn't return the same verifier as
previous WRITEs, and NFS kernel client was infinitely retrying the same writes.

Also this probably allows for correct NFS failover, at least for the same
buffered writes, because NFS clients repeat all write requests until a COMMIT
confirms them.
2023-07-26 02:09:47 +03:00
Vitaliy Filippov 425ff8818d Add . and .. in NFS directory listings
MC, for example, hangs with infinite listing retries without them
2023-07-26 02:09:47 +03:00
Vitaliy Filippov 9e287a7778 Handle extending writes correctly in NFS proxy
Previously, multiple parallel writes extending file size through NFS were
racing with each other and triggering deletions of part of the written data

I.e. if you mounted vitastor-nfs and just copied a file into it in MC then
you could end up with only a part of the file actually written
2023-07-26 02:09:43 +03:00
Vitaliy Filippov f52f58b9e9 Support UTF-8 in vitastor-cli table output 2023-07-25 01:48:57 +00:00
Vitaliy Filippov 1fe6b0c0e2 Also allow "0" and "no" as false for inmemory_metadata and inmemory_journal 2023-07-25 01:48:57 +00:00
Vitaliy Filippov e4237e9ed8 Enable HDD defaults for HDD-only in automatic `vitastor-disk prepare` mode 2023-07-23 02:33:22 +03:00
Vitaliy Filippov 10a5fd6abb Release 0.9.5
A hotfix to 0.9.4 containing only one bugfix: 100% CPU usage in the new QEMU
driver caused by the lack of eventfd reset on io_uring event handling :)
2023-07-21 00:04:41 +03:00
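Why a missing eventfd reset can pin a CPU core: on Linux an eventfd stays readable for as long as its counter is non-zero, so a level-triggered event loop that never reads it wakes up immediately on every iteration. The sketch below only demonstrates that kernel behaviour; `is_readable` and `drain_eventfd` are hypothetical helpers, not the code of the QEMU driver or of `ringloop`.

```cpp
#include <sys/eventfd.h>
#include <sys/types.h>
#include <poll.h>
#include <unistd.h>
#include <cstdint>

// Returns true if fd is readable right now (poll with zero timeout).
bool is_readable(int fd)
{
    pollfd p = { fd, POLLIN, 0 };
    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN);
}

// Resets the eventfd counter to zero by reading it.
// Returns the pending counter value, or 0 if nothing was pending.
uint64_t drain_eventfd(int efd)
{
    uint64_t value = 0;
    if (read(efd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        return 0;
    return value;
}
```

After a write to a descriptor created with `eventfd(0, EFD_NONBLOCK)`, `is_readable()` keeps returning true until `drain_eventfd()` is called, which is the step an event handler has to perform once per wakeup.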
Vitaliy Filippov 1c316ef350 Reset eventfd on every ringloop::loop() 2023-07-21 00:04:41 +03:00
Vitaliy Filippov 0b2d12eef1 Remove has_work, it was unnecessary 2023-07-21 00:04:37 +03:00
Vitaliy Filippov 1c10430ae1 Release 0.9.4
- Improve QEMU driver performance by integrating io_uring in it (up to 1.5x total iops improvement)
- Fix QEMU driver deadlocks which started to occur in qemu-img after iothread fixes
- Fix `vitastor-cli status` reporting more etcds than actually exist (fix etcd address duplication in config on reload)
- Fix `vitastor-cli ls` crashing on inodes in non-existing pools
- Delete old garbage /pool/stats/ keys for non-existing (deleted) pools
- Reduce memory usage of etcds initialized by make-etcd script
- Fix OSDs almost always crashing on etcd restart due to "revisions were compacted" (support reloading state from etcd)
- Fix a crash and a stall possible mostly in HDD setups with small journal and big (512k, 900k) random writes
- Add notes about HDDs to documentation. You are officially allowed to use HDD-only Vitastor with HGST/Toshiba/EXOS :)
2023-07-19 02:50:30 +03:00
Vitaliy Filippov d0e257ee81 Fix non-existing pool handling in `vitastor-cli ls` 2023-07-18 23:52:02 +03:00
Vitaliy Filippov 9815d70ffc It is impossible to use io_uring with older vitastor-client because it does not have vitastor_c_uring_has_work() 2023-07-18 23:37:53 +03:00
Vitaliy Filippov 4a4627dcab Do not use bool in C library 2023-07-18 23:37:53 +03:00
Vitaliy Filippov ba7427020e Fix deadlocks possible in qemu-img after fixing iothread
Deadlock was caused by switching QEMU coroutines directly inside
vitastor_co_read_bitmap_cb() callback. The correct way is to schedule a BH
/BH is a QEMU term for setImmediate() :)/, same as in read and write callbacks.
2023-07-18 23:32:16 +03:00
Vitaliy Filippov ac7b834af3 Disable journal_no_same_sector_overwrites by default for HDD-only 2023-07-10 00:34:35 +03:00
Vitaliy Filippov 57ad4c3636 Add a note about HDD, enable throttling only for hybrid OSDs 2023-07-09 12:45:11 +03:00
Vitaliy Filippov b7e4d0c9bf Fix journal dirty_start position tracking and some debug prints
Fixes two bugs found during HDD testing :-)
1) OSD crashed with "BUG: Attempt to overwrite used offset of the journal" during
   `fio -bs=900k -iodepth=128` test with 16 MB journal
2) OSD stalled during `fio -bs=512k -iodepth=128` test with 64 MB journal
2023-07-09 01:17:55 +03:00
Vitaliy Filippov 161a23c966 Support reloading state when etcd says "revisions were compacted"
Before this change, OSDs almost always died when one of the etcds was restarted,
even though the rest of them were still in quorum and the lease was still active
2023-07-07 01:33:48 +03:00
Vitaliy Filippov 45c0694853 Clear etcd_local addresses on reload and also skip duplicates 2023-07-06 00:39:39 +03:00
Vitaliy Filippov 30ac899074 Make QEMU driver compatible with older vitastor_client and with systems without io_uring 2023-07-04 15:51:43 +03:00
Vitaliy Filippov 2348d39cf4 Avoid repeated qemu_uring_handlers, add 2.0-2.7 compatibility 2023-07-04 00:28:23 +03:00
Vitaliy Filippov 3de7929fe5 Integrate v2 - direct epoll 2023-07-04 00:28:23 +03:00
Vitaliy Filippov 07b2196bc2 Integrate QEMU driver with io_uring 2023-07-04 00:28:23 +03:00
Vitaliy Filippov a612cdca47 Release 0.9.3
- Add patch for libvirt 9.0
- Add support for Proxmox VE 8.0
- Fix compatibility of the QEMU driver with iothread (QEMU rebuilds are coming)
- Fix vitastor-cli rm-data/rm/merge hanging when some OSDs are down.
  Allow deletions in unclean cluster at the cost of some data possibly
  "reappearing" when those OSDs come back up. In that case you can just repeat
  the deletion request using rm-data.
- A bunch of bug fixes for snapshots:
  - Fix snapshot reads often not working at all with snapshot chain size > 2
  - Fix optimized snapshot data merge (children to parent)
  - Fix updating of image name index key during optimized merge
  - Fix auto-selection preventing the use of optimized merge with only 1 snapshot
  - Fix incorrect CAS retries during snapshot merge
  - Fix snapshot merge progress reporting
- Fix primary_read bitmap buffers use-after-free which could lead to
  incorrect allocation map reads
- Remove /usr/local/bin path from make-etcd
- Some documentation fixes
2023-07-01 00:25:58 +03:00
Vitaliy Filippov c8d61568b5 Fix primary_read bitmap buffers being freed too early (use-after-free) 2023-06-30 12:47:45 +03:00
Vitaliy Filippov 84ed3c6395 Fix CAS retries during snapshot merge 2023-06-30 02:30:23 +03:00
Vitaliy Filippov a7b57386c0 Do not print last subcommand result twice during "inverse" snapshot merge 2023-06-30 02:07:10 +03:00
Vitaliy Filippov 9d4ea5f764 Fix inverse parent selection which prevented the use of optimized merge in case of only 1 snapshot 2023-06-30 01:39:11 +03:00
Vitaliy Filippov 000e4944ec Remove "inverse parent" image name index key from etcd during snapshot merge 2023-06-30 01:23:30 +03:00
Vitaliy Filippov 8426616d89 Warn about unfinished deletions in rm-data 2023-06-30 01:18:25 +03:00
Vitaliy Filippov 1a841344ec Print progress of all operations during snapshot merge 2023-06-30 01:13:47 +03:00
Vitaliy Filippov 8603b5cb1d Do not hang on inactive OSDs during delete, report and skip them instead 2023-06-30 00:15:16 +03:00
Vitaliy Filippov 878ccbb6ea Fix snapshot chain "down-merge" ("up-merge" worked well...) 2023-06-29 00:47:21 +03:00
Vitaliy Filippov 63c2b9832c Fix chained (snapshot) reads often not working at all with chain size > 2 2023-06-28 18:54:03 +03:00
Vitaliy Filippov a11ca56fb1 Fix compatibility of the QEMU driver with iothread 2023-06-21 02:11:28 +03:00
Vitaliy Filippov b84927b340 Fix \n in nbd_proxy 2023-06-19 01:48:58 +03:00
Vitaliy Filippov 926be372fd Release 0.9.2
- Measure and report scrub I/O statistics in vitastor-cli status
- Make aggregated statistics in vitastor-cli status much smoother
  (first derive, then sum instead of first summing and then deriving)
- Fix an old rare bug leading to journal corruption
  (try to use scrub if you think you're affected...)
- Do not start EC PGs without at least <data chunks> OSDs in each old set
  (prevents spurious read errors with EC during reconnections/restarts)
- Fix failed assert(!scrub_list_op) on OSD restart with pending scrubs
- Fix future planned scrubs not starting because of incorrect time comparison
- Build packages for Debian 12 (Bookworm)
2023-06-18 19:44:33 +03:00
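The "first derive, then sum" note in the 0.9.2 release can be illustrated with a small sketch. It assumes, as one plausible cause of the jitter, that sources report their counters at different moments; the struct and function names are hypothetical and do not come from vitastor-cli.

```cpp
#include <vector>

// Two consecutive reports of one source (e.g. one OSD); all names are hypothetical.
struct counter_sample
{
    double prev_value, cur_value; // monotonically growing counter, e.g. bytes written
    double prev_time, cur_time;   // timestamps of the two reports, in seconds
};

// "First derive, then sum": each source's rate is computed over its own interval,
// then the rates are added.
double rate_derive_then_sum(const std::vector<counter_sample> &samples)
{
    double total = 0;
    for (const auto &s : samples)
    {
        double dt = s.cur_time - s.prev_time;
        if (dt > 0)
            total += (s.cur_value - s.prev_value) / dt;
    }
    return total;
}

// "First sum, then derive": raw counters are added and divided by one shared interval.
double rate_sum_then_derive(const std::vector<counter_sample> &samples, double shared_dt)
{
    double prev = 0, cur = 0;
    for (const auto &s : samples)
    {
        prev += s.prev_value;
        cur += s.cur_value;
    }
    return shared_dt > 0 ? (cur - prev) / shared_dt : 0;
}
```

With two sources that both run at 100 units/s but report over 1 s and 2 s respectively, the first function returns 200, while the second returns 300 when given a shared interval of 1 s, because the longer interval's delta is attributed to the shorter one.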
Vitaliy Filippov c74a424930 Report scrub I/O in vitastor-cli status 2023-06-17 21:11:21 +03:00
Vitaliy Filippov 32f2c4dd27 Measure scrub statistics 2023-06-17 20:56:26 +03:00
Vitaliy Filippov 3ad16b9a1a Fix auto_scrubs not starting because of < vs <= =)) 2023-06-17 17:32:21 +03:00
Vitaliy Filippov 1c2df841c2 Fix failed assert(!scrub_list_op) on OSD restart with pending scrubs 2023-06-17 17:02:54 +03:00
Vitaliy Filippov aa5dacc7a9 Do not start EC PGs without at least pg_data_size connections to old OSDs from each set 2023-06-17 02:16:30 +03:00
Vitaliy Filippov 4fdc49bdc7 Add another assert-type check (it does not fire, just as a safety measure for the future) 2023-06-17 00:07:22 +03:00
Vitaliy Filippov 86b4682975 Put get_trim_pos into the "critical section". Fixes rare journal corruption issue
The consequence of this issue was that in some very rare cases (only reproduced
under load in CI when running 4+ tests in parallel) small write data written to
journal could overwrite journal entries.

Also add an assert-type safety check to be able to catch this issue in the
future again in case of a regression.
2023-06-17 00:06:42 +03:00
Vitaliy Filippov bdd48e4cf1 Release 0.9.1
- Fix "Client XX command out of sync" messages sometimes happening on OSD reconnections
- Fix a bug where EC reads running in parallel with writes to the same object failed with -ERANGE error
- Slightly reduce the amount of metadata writes during journal flushing
- Correctly unmap NBD volumes when Proxmox forces map_volume use (with SWTPM and maybe some other cases)
2023-06-10 11:42:49 +03:00
Vitaliy Filippov f9fbea25a4 Remove double write when old and new locations are in the same metadata block
Also add another metadata entry fool-safety check which, ideally, will never fire %)
2023-06-03 00:47:10 +03:00
Vitaliy Filippov 2c9a10d081 Fix an idiotic bug leading to failed reads with -ERANGE with EC :D 2023-06-03 00:44:52 +03:00