1
0
Fork 0
Commit Graph

688 Commits (recovery-autotune)

Author SHA1 Message Date
Vitaliy Filippov 988e90be69 Fix vitastor-disk partition zeroing (it was writing random garbage instead of zeroes :D) 2023-07-28 12:29:07 +03:00
Vitaliy Filippov 700e0e9bff Handle parallel NFS extending writes without imposing extra load on etcd 2023-07-27 02:26:17 +00:00
Vitaliy Filippov ab0ca7c00f Return FILE_SYNC from NFS writes if immediate_commit is enabled 2023-07-26 02:09:47 +03:00
Vitaliy Filippov f153bc950b Return the same "verifier" in NFS COMMIT as in NFS WRITE
This fixes buffered (not O_DIRECT) NFS writes in Linux - previously they were
hanging in an infinite loop because COMMIT didn't return the same verifier as
previous WRITEs, and NFS kernel client was infinitely retrying the same writes.

Also this probably allows for correct NFS failover, at least for the same
buffered writes, because NFS clients repeat all write requests until a COMMIT
confirms them.
2023-07-26 02:09:47 +03:00
Vitaliy Filippov 425ff8818d Add . and .. in NFS directory listings
MC, for example, hangs with infinite listing retries without them
2023-07-26 02:09:47 +03:00
Vitaliy Filippov 9e287a7778 Handle extending writes correctly in NFS proxy
Previously, multiple parallel writes extending file size through NFS were
racing with each other and triggering deletions of part of the written data

I.e. if you mounted vitastor-nfs and just copied a file into it in MC then
you could end up with only a part of the file actually written
2023-07-26 02:09:43 +03:00
Vitaliy Filippov f52f58b9e9 Support UTF-8 in vitastor-cli table output 2023-07-25 01:48:57 +00:00
Vitaliy Filippov 1fe6b0c0e2 Also allow "0" and "no" as false for inmemory_metadata and inmemory_journal 2023-07-25 01:48:57 +00:00
Vitaliy Filippov e4237e9ed8 Enable HDD defaults for HDD-only in automatic `vitastor-disk prepare` mode 2023-07-23 02:33:22 +03:00
Vitaliy Filippov 10a5fd6abb Release 0.9.5
A hotfix to 0.9.4 containing only one bugfix: 100% CPU usage in the new QEMU
driver caused by the lack of eventfd reset on io_uring event handling :)
2023-07-21 00:04:41 +03:00
Vitaliy Filippov 1c316ef350 Reset eventfd on every ringloop::loop() 2023-07-21 00:04:41 +03:00
Vitaliy Filippov 0b2d12eef1 Remove has_work, it was unnecessary 2023-07-21 00:04:37 +03:00
Vitaliy Filippov 1c10430ae1 Release 0.9.4
- Improve QEMU driver performance by integrating io_uring in it (up to 1.5x total iops improvement)
- Fix QEMU driver deadlocks which started to reproduce in qemu-img after iothread fixes
- Fix `vitastor-cli status` reporting more etcds than actually exists (fix etcd address duplication in config on reload)
- Fix `vitastor-cli ls` crashing on inodes in non-existing pools
- Delete old garbage /pool/stats/ keys for non-existing (deleted) pools
- Reduce memory usage of etcds initialized by make-etcd script
- Fix OSDs almost always crashing on etcd restart due to "revisions were compacted" (support reloading state from etcd)
- Fix a crash and a stall possible mostly in HDD setups with small journal and big (512k, 900k) random writes
- Add notes about HDDs to documentation. You are officially allowed to use HDD-only Vitastor with HGST/Toshiba/EXOS :)
2023-07-19 02:50:30 +03:00
Vitaliy Filippov d0e257ee81 Fix non-existing pool handling in `vitastor-cli ls` 2023-07-18 23:52:02 +03:00
Vitaliy Filippov 9815d70ffc It is impossible to use io_uring with older vitastor-client because it does not have vitastor_c_uring_has_work() 2023-07-18 23:37:53 +03:00
Vitaliy Filippov 4a4627dcab Do not use bool in C library 2023-07-18 23:37:53 +03:00
Vitaliy Filippov ba7427020e Fix deadlocks possible in qemu-img after fixing iothread
Deadlock was caused by switching QEMU coroutines directly inside
vitastor_co_read_bitmap_cb() callback. The correct way is to schedule a BH
/BH is a QEMU term for setImmediate() :)/, same as in read and write callbacks.
2023-07-18 23:32:16 +03:00
Vitaliy Filippov ac7b834af3 Disable journal_no_same_sector_overwrites by default for HDD-only 2023-07-10 00:34:35 +03:00
Vitaliy Filippov 57ad4c3636 Add a note about HDD, enable throttling only for hybrid OSDs 2023-07-09 12:45:11 +03:00
Vitaliy Filippov b7e4d0c9bf Fix journal dirty_start position tracking and some debug prints
Fixes two bugs found during HDD testing :-)
1) OSD crashed with "BUG: Attempt to overwrite used offset of the journal" during
   `fio -bs=900k -iodepth=128` test with 16 MB journal
2) OSD stalled during `fio -bs=512k -iodepth=128` test with 64 MB journal
2023-07-09 01:17:55 +03:00
Vitaliy Filippov 161a23c966 Support reloading state when etcd says "revisions were compacted"
Before this change, OSDs almost always died when one of the etcds was restarted,
even though the rest of them was still in quorum and the lease was still active
2023-07-07 01:33:48 +03:00
Vitaliy Filippov 45c0694853 Clear etcd_local addresses on reload and also skip duplicates 2023-07-06 00:39:39 +03:00
Vitaliy Filippov 30ac899074 Make QEMU driver compatible with older vitastor_client and with systems without io_uring 2023-07-04 15:51:43 +03:00
Vitaliy Filippov 2348d39cf4 Avoid repeated qemu_uring_handlers, add 2.0-2.7 compatibility 2023-07-04 00:28:23 +03:00
Vitaliy Filippov 3de7929fe5 Integrate v2 - direct epoll 2023-07-04 00:28:23 +03:00
Vitaliy Filippov 07b2196bc2 Integrate QEMU driver with io_uring 2023-07-04 00:28:23 +03:00
Vitaliy Filippov a612cdca47 Release 0.9.3
- Add patch for libvirt 9.0
- Add support for Proxmox VE 8.0
- Fix compatibility of the QEMU driver with iothread (QEMU rebuilds are coming)
- Fix vitastor-cli rm-data/rm/merge hanging when some OSDs are down.
  Allow deletions in unclean cluster at the cost of some data possibly
  "reappearing" when those OSDs start back. In that case you can just repeat
  the deletion request using rm-data.
- A bunch of bug fixes for snapshots:
  - Fix snapshot reads often not working at all with snapshot chain size > 2
  - Fix optimized snapshot data merge (children to parent)
  - Fix updating of image name index key during optimized merge
  - Fix auto-selection preventing the use of optimized merge with only 1 snapshot
  - Fix incorrect CAS retries during snapshot merge
  - Fix snapshot merge progress reporting
- Fix primary_read bitmap buffers use-after-free which could lead to
  incorrect allocation map reads
- Remove /usr/local/bin path from make-etcd
- Some documentation fixes
2023-07-01 00:25:58 +03:00
Vitaliy Filippov c8d61568b5 Fix primary_read bitmap buffers being freed too early (use-after-free) 2023-06-30 12:47:45 +03:00
Vitaliy Filippov 84ed3c6395 Fix CAS retries during snapshot merge 2023-06-30 02:30:23 +03:00
Vitaliy Filippov a7b57386c0 Do not print last subcommand result twice during "inverse" snapshot merge 2023-06-30 02:07:10 +03:00
Vitaliy Filippov 9d4ea5f764 Fix inverse parent selection which prevented the use of optimized merge in case of only 1 snapshot 2023-06-30 01:39:11 +03:00
Vitaliy Filippov 000e4944ec Remove "inverse parent" image name index key from etcd during snapshot merge 2023-06-30 01:23:30 +03:00
Vitaliy Filippov 8426616d89 Warn about unfinished deletions in rm-data 2023-06-30 01:18:25 +03:00
Vitaliy Filippov 1a841344ec Print progress of all operations during snapshot merge 2023-06-30 01:13:47 +03:00
Vitaliy Filippov 8603b5cb1d Do not hang on inactive OSDs during delete, report and skip them instead 2023-06-30 00:15:16 +03:00
Vitaliy Filippov 878ccbb6ea Fix snapshot chain "down-merge" ("up-merge" worked well...) 2023-06-29 00:47:21 +03:00
Vitaliy Filippov 63c2b9832c Fix chained (snapshot) reads often not working at all with chain size > 2 2023-06-28 18:54:03 +03:00
Vitaliy Filippov a11ca56fb1 Fix compatibility of the QEMU driver with iothread 2023-06-21 02:11:28 +03:00
Vitaliy Filippov b84927b340 Fix \n in nbd_proxy 2023-06-19 01:48:58 +03:00
Vitaliy Filippov 926be372fd Release 0.9.2
- Measure and report scrub I/O statistics in vitastor-cli status
- Make aggregated statistics in vitastor-cli status much smoother
  (first derive, then sum instead of first summing and then deriving)
- Fix an old rare bug leading to journal corruption
  (try to use scrub if you think you're affected...)
- Do not start EC PGs without at least <data chunks> OSDs in each old set
  (prevents spurious read errors with EC during reconnections/restarts)
- Fix failed assert(!scrub_list_op) on OSD restart with pending scrubs
- Fix future planned scrubs not starting because of incorrect time comparison
- Build packages for Debian 12 (Bookworm)
2023-06-18 19:44:33 +03:00
Vitaliy Filippov c74a424930 Report scrub I/O in vitastor-cli status 2023-06-17 21:11:21 +03:00
Vitaliy Filippov 32f2c4dd27 Measure scrub statistics 2023-06-17 20:56:26 +03:00
Vitaliy Filippov 3ad16b9a1a Fix auto_scrubs not starting because of < vs <= =)) 2023-06-17 17:32:21 +03:00
Vitaliy Filippov 1c2df841c2 Fix failed assert(!scrub_list_op) on OSD restart with pending scrubs 2023-06-17 17:02:54 +03:00
Vitaliy Filippov aa5dacc7a9 Do not start EC PGs without at least pg_data_size connections to old OSDs from each set 2023-06-17 02:16:30 +03:00
Vitaliy Filippov 4fdc49bdc7 Add another assert-type check (it does not fire, just as a safety measure for the future) 2023-06-17 00:07:22 +03:00
Vitaliy Filippov 86b4682975 Put get_trim_pos into the "critical section". Fixes rare journal corruption issue
The consequence of this issue was that in some very rare cases (only reproduced
under load in CI when running 4+ tests in parallel) small write data written to
journal could overwrite journal entries.

Also add an assert-type safety check to be able to catch this issue in the
future again in case of a regression.
2023-06-17 00:06:42 +03:00
Vitaliy Filippov bdd48e4cf1 Release 0.9.1
- Fix "Client XX command out of sync" messages sometimes happening on OSD reconnections
- Fix a bug where EC reads parallel with writes to the same object failed with -ERANGE error
- Slightly reduce the amount of metadata writes during journal flushing
- Correctly unmap NBD volumes when Proxmox forces map_volume use (with SWTPM and maybe some other cases)
2023-06-10 11:42:49 +03:00
Vitaliy Filippov f9fbea25a4 Remove double write when old and new locations are in the same metadata block
Also add another metadata entry fool-safety check which, ideally, will never fire %)
2023-06-03 00:47:10 +03:00
Vitaliy Filippov 2c9a10d081 Fix an idiotic bug leading to failed reads with -ERANGE with EC :D 2023-06-03 00:44:52 +03:00
Vitaliy Filippov 150968070f Slightly improve some debug prints 2023-05-29 01:04:16 +03:00
Vitaliy Filippov cdfc74665b Close client FDs only when destroying the client, after handling all async reads/writes
Fixes "Client XX command out of sync" sometimes happening on reconnections
2023-05-25 00:52:43 +03:00
Vitaliy Filippov 3b4cf29e65 Release 0.9.0
New features:
- Scrubbing! Check documentation: [auto_scrub](src/branch/master/docs/config/osd.en.md#auto_scrub)
- Document online-updatable configuration parameters

Bug fixes:
- Fix NaN during PG optimisation if there are nonexisting OSDs in node_placement
- Fix monitor crash on pool deletion
- Clear journal_device and meta_device before initialising the next OSD in automatic mode
- Sync unsynced deletes before overwriting them with a lower version
  (reproducted mostly/only after scrubbing)
2023-05-21 15:07:14 +03:00
Vitaliy Filippov ce02f47de6 Allow to disable scrub_find_best 2023-05-21 12:33:38 +03:00
Vitaliy Filippov f1961157f0 Fix brute-force error locator for EC n+k with k > 2 2023-05-21 00:57:14 +03:00
Vitaliy Filippov 88c1ba0790 Fix compile errors with gcc 10 2023-05-20 23:20:09 +03:00
Vitaliy Filippov fa90b5a4e7 Schedule automatic scrubs correctly (not just after previous scrub) 2023-05-20 23:20:09 +03:00
Vitaliy Filippov 8d40ad99a6 Add scrub documentation 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 6ca20aa194 Allow scrub to fix corrupted object states 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 4bfd994341 Sync unsynced deletes before overwriting them with a lower version 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 59e959dcbb Do not die when "different versions are returned from subops" 2023-05-20 23:19:39 +03:00
Vitaliy Filippov a9581f0739 Handle dirty deletes during read correctly O_o 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 105a405b0a Implement vitastor-cli fix 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0e5d0e02a9 Add "vitastor-cli describe" command 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0439981a66 Implement "describe object(s)" operation
Required to implement fixing inconsistent objects in vitastor-cli
2023-05-20 23:19:39 +03:00
Vitaliy Filippov 6648f6bb6e Implement ambiguity detection during scrub 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 281be547eb Implement brute-force error locator for EC 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0c78dd7178 Add no_scrub flag 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 3c924397e7 Store next scrub timestamp instead of last scrub timestamp 2023-05-20 23:19:39 +03:00
Vitaliy Filippov c3bd26193d Implement PG scrub runner 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 43b77d7619 Implement scrubbing "data path" - OSD_OP_SCRUB 2023-05-20 23:19:39 +03:00
Vitaliy Filippov a6d846863b Add min/max stripe and limit to OP_LIST 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 8dc427b43c Retry failed reads (including chained and RMW) from other replicas 2023-05-20 23:19:39 +03:00
Vitaliy Filippov bf2112653b Refcount object_states 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0538a484b3 Add corrupted object state 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 97720fa6b4 Remove unused capture 2023-05-20 22:58:51 +03:00
Vitaliy Filippov e60e352df6 Improve vitastor-nbd documentation 2023-05-20 22:58:51 +03:00
Vitaliy Filippov 629999f789 Clear journal_device and meta_device before initialising the next OSD in automatic mode 2023-05-15 23:58:55 +03:00
Vitaliy Filippov 5a9e1ede52 Release 0.8.9
- The tests are now stable and run in a CI system based on Gitea CI
- The release includes final bug fixes for EC:
  - Implement missing EC recovery of allocation bitmap when built with ISA-L
  - Fix broken snapshot export with EC (allocation bitmap reads were giving incorrect results previously)
- Also fixed bugs manifesting under heavy load:
  - Fix monitor possibly applying incorrect PG history on retries
  - Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number
  - Allow writes to wait for free space again, but now correctly (previously dropped in 0.8.2)
  - Fix a rare segfault in client (handle client stop during incoming stream handling in 1 more place)
  - Make monitor correctly handle etcd connection errors - it could die instead of connecting to another etcd
  - Fix OSD rarely being unable to report PG states after a PG was taken over by another OSD
- Fixed return code for incomplete EC objects (now EIO) and made cluster client retry this error
- Made other small changes for tests: timeouts, nice/ionice for etcd, waiting conditions, NBD device checks and so on
2023-05-14 01:25:09 +03:00
Vitaliy Filippov de3e609166 Add a FIXME about QEMU driver thread safety 2023-05-14 00:06:09 +03:00
Vitaliy Filippov 11481170f5 Add a FIXME about ENOSPC 2023-05-13 23:59:44 +03:00
Vitaliy Filippov 6442010f93 Skip offline PGs during state reporting when the state is already deleted or taken over by another OSD
This fixes OSDs being unable to report PG states in rare conditions
2023-05-12 23:17:45 +03:00
Vitaliy Filippov ce4a8067b5 Handle client stop during incoming stream handling in 1 more place 2023-05-11 01:53:41 +03:00
Vitaliy Filippov 8cac795445 Return EIO instead of EINVAL for incomplete EC objects 2023-05-11 01:15:23 +03:00
Vitaliy Filippov a409598b16 Wait for free space again, but count on big_write flushes instead of just flusher activity 2023-05-10 01:51:02 +03:00
Vitaliy Filippov f4c6765522 Ignore ENOENT in epoll_ctl 2023-05-08 20:39:20 +03:00
Vitaliy Filippov 5da1d8e1b5 Fix EC just-bitmap reads (len=0) (fixes SCHEME=ec test_snapshot.sh) 2023-05-07 14:00:08 +03:00
Vitaliy Filippov 44f86f1999 Add a basic EC 2+2 recovery test (not really required, but let it be there) 2023-05-07 11:26:27 +03:00
Vitaliy Filippov 2d9a80c6f6 Implement missing bitmap recovery with ISA-L \(°□°)/ 2023-05-07 11:25:51 +03:00
Vitaliy Filippov ab615849d6 Release 0.8.8
- Fix vitastor-cli rm/rm-data broken in 0.8.6 (missing messenger initialization)
- Prepare OSD read handler for upcoming version with scrub - allow "secondary reads" to return errors
- Fix OSDs re-peering PGs infinitely with a big number of PGs (reproduced in test_add_osd)
- Fix another variant of flusher sync-waiting stall (reproduced in test_write)
- Fix other tests in tests/ (will add them to Gitea CI soon)
- Add patches for QEMU 6.2-8.0
- Fix QEMU driver compatibility with QEMU 8.0
- Build packages for RHEL 9 clones (based on AlmaLinux 9)
2023-04-28 11:22:00 +03:00
Vitaliy Filippov b94587ef0e Fix some build warnings 2023-04-28 00:44:27 +03:00
Vitaliy Filippov c768a9015f Fix QEMU driver compatibility with QEMU 8.0 2023-04-25 11:20:21 +03:00
Vitaliy Filippov b74ccb613c Fix another variant of flusher sync-waiting stall 2023-04-24 00:44:41 +03:00
Vitaliy Filippov a04dab0840 Initialize messenger in cluster_client listings 2023-04-24 00:44:41 +03:00
Vitaliy Filippov 160863f707 Print op pointer values in slow log 2023-04-23 17:54:00 +03:00
Vitaliy Filippov 2877cd0adb Allow OP_SEC_READ to return errors (do not hang the connection) 2023-04-23 17:54:00 +03:00
Vitaliy Filippov 480509f5b9 Fix pg_data_size > 1 for replicas (harmless bug) 2023-04-23 01:50:42 +03:00
Vitaliy Filippov 46462da45e Preload own PG history updates to fix PG state loop possibly applying the old metadata version 2023-04-23 01:50:30 +03:00
Vitaliy Filippov 7e958afeda Release 0.8.7
This release includes a bunch of important bugfixes for erasure-coded setups
with disabled immediate_commit. After these fixes, "test_heal" OSD killing test
now passes fine with EC:

- Fix cluster write stalls with "Error while doing flush on OSD xx: -16 (Device or resource busy)"
  in OSD logs possible in EC setups with disabled immediate_commit by selectively
  syncing nonsynced objects on STABILIZE/ROLLBACK (https://github.com/vitalif/vitastor/issues/51)
- Fix other EC + disabled immediate_commit problems:
  - Fix "opcode=5 retval=-2" errors happening on SYNC retries
  - Fix non-working "pagination" during PG dirty object flushing
  - Fix write operations not continued correctly after dirty object flushing
- Fix incorrect parity read-modify-write calculation when writing into a lost chunk
- Fix OSDs losing left_on_dead PG state of non-clean PGs and thus not removing junk data in the cluster
- Fix a small memory leak caused by bad indexing of EC recovery matrices
- Fix a rare use-after-free in cluster_client caused by a reenterability issue
- Fix vitastor-cli create command syntax in the CSI driver
- Allow to start OSDs without local store for tests
- Fix memory allocation error in disk_tool_meta for non-standard metadata block sizes
- Fix delete operations received before loading pool metadata crashing OSDs with "null pointer exception"
- Improve "theoretical performance" Russian documentation

New features:

- Implement online configuration update for some parameters. Documentation is coming soon :)
2023-04-11 02:11:57 +03:00
Vitaliy Filippov 2f5e769a29 Fix a small memory leak caused by bad indexing of EC recovery matrices 2023-04-11 00:30:36 +03:00
Vitaliy Filippov 3237014608 Fix incorrect parity read-modify-write calculation when writing into a lost chunk 2023-04-09 02:06:10 +03:00
Vitaliy Filippov baaf8f6f44 Fix write operations not continued correctly after flush 2023-04-09 02:06:10 +03:00
Vitaliy Filippov 1d83fdcd17 Add debug logs to osd_flush 2023-04-09 02:06:10 +03:00
Vitaliy Filippov 0ddd787c38 Fix non-working "pagination" during PG dirty object flushing 2023-04-08 02:44:02 +03:00
Vitaliy Filippov 6eff3a60a5 Do not lose left_on_dead PG state of non-clean PGs 2023-04-08 02:44:02 +03:00
Vitaliy Filippov 888a6975ab Fix a rare use-after-free in cluster_client caused by a reenterability issue 2023-04-08 02:44:02 +03:00
Vitaliy Filippov cd1e890bd4 Fix "opcode=5 retval=-2" errors sometimes possible with EC 2023-04-08 02:44:02 +03:00
Vitaliy Filippov 0fbf4c6a08 Selectively sync nonsynced objects on STABILIZE/ROLLBACK (fix for github issue #51) 2023-04-08 02:44:02 +03:00
Vitaliy Filippov d06ed2b0e7 Implement online config update 2023-03-26 19:21:50 +03:00
Vitaliy Filippov 2fb0c85618 Allow to start OSDs without local store (only for tests) 2023-03-15 01:13:59 +03:00
Vitaliy Filippov d81a6c04fc Update cmake min version so it does not complain about deprecation 2023-03-15 01:08:23 +03:00
Vitaliy Filippov 7b35801647 Fix possible bad realloc in disk_tool_meta for non-standard metadata block sizes 2023-03-15 01:08:23 +03:00
Vitaliy Filippov f3228d5c07 Fix typo (did not affect execution though) 2023-03-15 01:08:23 +03:00
Vitaliy Filippov 18366f5055 Fix read/write return type in rw_blocking 2023-03-15 01:08:14 +03:00
Vitaliy Filippov 851507c147 Add missing close() in test stubs 2023-03-15 00:23:56 +03:00
Vitaliy Filippov 9aaad28488 Fix "null pointer exception" for unhandled OSD_OP_DELETEs (when pool is not loaded yet) 2023-03-02 11:16:39 +03:00
Vitaliy Filippov 8810eae8fb Release 0.8.6
Important fixes:

- Fix possibly incorrect EC parity chunk updates with EC n+k, k > 1 and when
  the first parity chunk is missing

Minor fixes and improvements:

- Fix incorrect EC free space statistics in vitastor-cli df output
- Speedup vitastor-cli startup in clusters with RDMA
- Remove unused PG "peered" state (previously used to update PG epoch)
- Use sfdisk with just --json in vitastor-disk (--dump --json isn't needed)
- Allow trailing comma in sfdisk output (fixes sfdisk 2.36 compatibility)
- Slightly improve RDMA send/receive code
- Reduce RDMA memory consumption by default (rdma_max_recv/send = 16/8)
- Use vitastor-cli instead of direct etcd interaction in the CSI driver
2023-02-28 11:18:48 +03:00
Vitaliy Filippov 14d6acbcba Set default rdma_max_recv/send to 16/8, fix documentation 2023-02-28 11:00:56 +03:00
Vitaliy Filippov 1e307069bc Fix missing parity chunk calculation for EC n+k, k > 1 and first parity chunk missing 2023-02-28 02:40:19 +03:00
Vitaliy Filippov c3e80abad7 Allow to send more than 1 operation at a time 2023-02-26 02:01:04 +03:00
Vitaliy Filippov 138ffe4032 Reuse incoming RDMA buffers 2023-02-26 00:55:01 +03:00
Vitaliy Filippov 4ab630b44d Use just sfdisk --json, --dump is not needed 2023-02-23 00:55:47 +03:00
Vitaliy Filippov 2c8241b7db Remove PG "peered" state 2023-02-21 01:30:42 +03:00
Vitaliy Filippov 36a7dd3671 Move tests to "make test" 2023-02-21 01:30:42 +03:00
Vitaliy Filippov 936122bbcf Initialize msgr lazily in client to speedup vitastor-cli with RDMA enabled 2023-02-19 18:59:07 +03:00
Vitaliy Filippov 1a1ba0d1e7 Add set_immediate to ringloop and use it for bs/osd ops to prevent reenterability issues 2023-02-09 17:37:26 +03:00
Vitaliy Filippov 3d09c9cec7 Remove unused wait_sqe() from ringloop 2023-02-09 17:37:26 +03:00
Vitaliy Filippov 3d08a1ad6c Fix cluster_client test after last reenterability fixes 2023-02-05 01:47:32 +03:00
Vitaliy Filippov aba93b951b Fix incorrect EC free space statistics in vitastor-cli df output 2023-01-26 02:04:29 +03:00
Vitaliy Filippov d125fb1f30 Release 0.8.5
- Fix a possible "double free" bug in the client library happening on OSD restart
- Fix a possible write hang on PG history update when only epoch is changed
- Fix incorrect systemd target "local.target" in mon/make-etcd
- Allow "content" option in PVE storage plugin to allow to enable containers
- Build client library without tcmalloc which fixes "attempt to free invalid pointer"
  errors when, for example, trying to run QEMU with both Vitastor and Ceph RBD disks
2023-01-25 01:43:49 +03:00
Vitaliy Filippov 8b552a01f9 Do not retry successful operation parts in client (could lead to "double free" bugs) 2023-01-25 01:30:36 +03:00
Vitaliy Filippov 0385b2f9e8 Fix write hangs on PG epoch update - always set pg.history_changed to true 2023-01-25 01:30:15 +03:00
Vitaliy Filippov 9f4e34a8cc Build client library without tcmalloc
Fixes "[src/tcmalloc.cc:332] Attempt to free invalid pointer ..." when trying
to run QEMU with both Vitastor and Ceph RBD disks and other possible allocator
collisions.
2023-01-15 00:01:11 +03:00
Vitaliy Filippov 81fc8bb94c Release 0.8.4
New features:
- Implement QCOW2 image/snapshot export via qemu-img (bdrv_co_block_status in the driver)
- Remove OSDs from PG history during `vitastor-cli rm-osd` to prevent `left_on_dead` PG states after deletion
- Add a new recovery_pg_switch setting to mix all PGs during recovery, to almost
  fully reduce the probability of ENOSPC during rebalance
- Introduce partial ENOSPC ("OSD is full") handling - now ENOSPC doesn't turn
  into cascades of crashes
- Add migration support to Proxmox VE Vitastor driver
- Track last_clean_pgs on a per-pool basis thus reducing data movement in a cluster
  with pools remaining unclean/degraded for a long time

Bug fixes:
- Fix a bug where monitor could generate degraded PGs if one of the hosts had no OSDs
- Fix a bug where monitor could skip PG redistribution with a lot of OSDs in cluster
- Report PG history synchronously on the first write, which improves PG consistency
  and availability at the same time, because history now gets reported correctly
  and doesn't get reported without the need for it
- Fix possible write and recovery stalls which could happen in a cluster with both EC and replicated pools
- Make OSD and monitors sanitize & deduplicate PG history items in etcd
- Fix non-working OSD peer config safety check
- Fix a rare journal flush stall where flushing wasn't activated with full journal, but with empty flush queue
- Fix builds without ISA-L (jerasure-only) crashing with EC N+K, K>=2 due to the lack of 16-byte buffer alignment
- Fix a possible crash for EC N+K, K>=2 when calculating a parity chunk with previous parity chunk missing
- Fix a bug where vitastor-disk purge with suppressed warnings didn't work
2023-01-13 23:59:54 +03:00
Vitaliy Filippov bc465c16de Fix arithmetic on void* for clang 2023-01-13 23:58:42 +03:00
Vitaliy Filippov 8763e9211c Fix qemu driver compilation warning/error 2023-01-13 23:44:39 +03:00
Vitaliy Filippov fe87b4076b Fix backwards compatibility in cluster_client 2023-01-12 02:37:31 +03:00
Vitaliy Filippov 137309cf29 Implement bdrv_co_block_status for snapshot export support 2023-01-07 17:06:58 +03:00
Vitaliy Filippov 373f9d0387 Try to re-peer PGs on history change 2023-01-06 12:46:44 +03:00
Vitaliy Filippov c4516ea971 Also remove deleted OSD from PG configuration and last_clean_pgs 2023-01-06 12:46:44 +03:00
Vitaliy Filippov 91065c80fc Try to prevent left_on_dead when deleting OSDs by removing them from PG history 2023-01-06 12:46:43 +03:00
Vitaliy Filippov 02e7be7dc9 Prevent reenterability side effects during PG history operation resume 2023-01-03 02:20:50 +03:00
Vitaliy Filippov 73940adf07 Prioritize EC (non-instantly-stable) operations under journal pressure
This reduces the probability of hitting OSD stalls with EC due to "deadlocks"
where two parallel write operations wait for each other to complete
2023-01-03 00:05:45 +03:00
Vitaliy Filippov e950c024d3 Do not sync peer OSDs before listing
Sync before listing was added to wait for all PG writes possibly left in queue
from the previous master to finish before listing it

But in fact it may block the cluster when EC is used and some unstable writes
are left in the queue - they block journal flushing, rollback/stabilize is
required to unblock them, but rollback/stabilize may only happen after PG is
peered. But peering needs listings, listings are requested only after sync, and
sync itself waits for currently blocked writes waiting in the queue
2023-01-03 00:05:45 +03:00
Vitaliy Filippov 71d6d9f868 Fix possible crash on ENOSPC during operation cancel in blockstore 2023-01-03 00:05:45 +03:00
Vitaliy Filippov a4dfa519af Report PG history synchronously during write
This has 2 effects:
1) OSD sets aren't added into PG history until actual write attempts anymore
   which removes unneeded extra osd_sets in PG history
2) New OSD sets are reported synchronously and can't be lost on PG restarts
   happening at the same time with reconfiguration
2023-01-01 23:41:05 +03:00
Vitaliy Filippov 67019f5b02 Make OSD sort & sanitize PG history items 2023-01-01 23:17:42 +03:00
Vitaliy Filippov 0593e5c21c Fix OSD peer config safety check 2022-12-31 02:24:42 +03:00
Vitaliy Filippov 998e24adf8 Add a new recovery_pg_switch setting to mix all PGs during recovery 2022-12-30 02:03:33 +03:00
Vitaliy Filippov d7bd36dc32 Fix another rare journal flush stall 2022-12-30 02:03:33 +03:00