Commit Graph

511 Commits (72f0cff79df1a5b6653f2a03c6a8dcd73d8a7d64)

Author SHA1 Message Date
Vitaliy Filippov 629999f789 Clear journal_device and meta_device before initialising the next OSD in automatic mode 2023-05-15 23:58:55 +03:00
Vitaliy Filippov 5a9e1ede52 Release 0.8.9
- The tests are now stable and run in a CI system based on Gitea CI
- The release includes final bug fixes for EC:
  - Implement missing EC recovery of allocation bitmap when built with ISA-L
  - Fix broken snapshot export with EC (allocation bitmap reads were giving incorrect results previously)
- Also fixed bugs manifesting under heavy load:
  - Fix monitor possibly applying incorrect PG history on retries
  - Fix monitor incorrectly changing PG count when last_clean_pgs contains fewer PGs than the new number
  - Allow writes to wait for free space again, but now correctly (previously dropped in 0.8.2)
  - Fix a rare segfault in client (handle client stop during incoming stream handling in 1 more place)
  - Make the monitor correctly handle etcd connection errors (previously it could die instead of connecting to another etcd)
  - Fix OSD rarely being unable to report PG states after a PG was taken over by another OSD
- Fixed return code for incomplete EC objects (now EIO) and made the cluster client retry on this error
- Made other small changes for tests: timeouts, nice/ionice for etcd, waiting conditions, NBD device checks and so on
2023-05-14 01:25:09 +03:00
Vitaliy Filippov de3e609166 Add a FIXME about QEMU driver thread safety 2023-05-14 00:06:09 +03:00
Vitaliy Filippov 11481170f5 Add a FIXME about ENOSPC 2023-05-13 23:59:44 +03:00
Vitaliy Filippov 6442010f93 Skip offline PGs during state reporting when the state is already deleted or taken over by another OSD
This fixes OSDs being unable to report PG states in rare conditions
2023-05-12 23:17:45 +03:00
Vitaliy Filippov ce4a8067b5 Handle client stop during incoming stream handling in 1 more place 2023-05-11 01:53:41 +03:00
Vitaliy Filippov 8cac795445 Return EIO instead of EINVAL for incomplete EC objects 2023-05-11 01:15:23 +03:00
Vitaliy Filippov a409598b16 Wait for free space again, but count on big_write flushes instead of just flusher activity 2023-05-10 01:51:02 +03:00
Vitaliy Filippov f4c6765522 Ignore ENOENT in epoll_ctl 2023-05-08 20:39:20 +03:00
Vitaliy Filippov 5da1d8e1b5 Fix EC just-bitmap reads (len=0) (fixes SCHEME=ec test_snapshot.sh) 2023-05-07 14:00:08 +03:00
Vitaliy Filippov 44f86f1999 Add a basic EC 2+2 recovery test (not really required, but let it be there) 2023-05-07 11:26:27 +03:00
Vitaliy Filippov 2d9a80c6f6 Implement missing bitmap recovery with ISA-L \(°□°)/ 2023-05-07 11:25:51 +03:00
Vitaliy Filippov ab615849d6 Release 0.8.8
- Fix vitastor-cli rm/rm-data broken in 0.8.6 (missing messenger initialization)
- Prepare OSD read handler for upcoming version with scrub - allow "secondary reads" to return errors
- Fix OSDs re-peering PGs infinitely with a large number of PGs (reproduced in test_add_osd)
- Fix another variant of flusher sync-waiting stall (reproduced in test_write)
- Fix other tests in tests/ (will add them to Gitea CI soon)
- Add patches for QEMU 6.2-8.0
- Fix QEMU driver compatibility with QEMU 8.0
- Build packages for RHEL 9 clones (based on AlmaLinux 9)
2023-04-28 11:22:00 +03:00
Vitaliy Filippov b94587ef0e Fix some build warnings 2023-04-28 00:44:27 +03:00
Vitaliy Filippov c768a9015f Fix QEMU driver compatibility with QEMU 8.0 2023-04-25 11:20:21 +03:00
Vitaliy Filippov b74ccb613c Fix another variant of flusher sync-waiting stall 2023-04-24 00:44:41 +03:00
Vitaliy Filippov a04dab0840 Initialize messenger in cluster_client listings 2023-04-24 00:44:41 +03:00
Vitaliy Filippov 160863f707 Print op pointer values in slow log 2023-04-23 17:54:00 +03:00
Vitaliy Filippov 2877cd0adb Allow OP_SEC_READ to return errors (do not hang the connection) 2023-04-23 17:54:00 +03:00
Vitaliy Filippov 480509f5b9 Fix pg_data_size > 1 for replicas (harmless bug) 2023-04-23 01:50:42 +03:00
Vitaliy Filippov 46462da45e Preload own PG history updates to fix PG state loop possibly applying the old metadata version 2023-04-23 01:50:30 +03:00
Vitaliy Filippov 7e958afeda Release 0.8.7
This release includes a bunch of important bugfixes for erasure-coded setups
with disabled immediate_commit. After these fixes, "test_heal" OSD killing test
now passes fine with EC:

- Fix cluster write stalls with "Error while doing flush on OSD xx: -16 (Device or resource busy)"
  in OSD logs possible in EC setups with disabled immediate_commit by selectively
  syncing nonsynced objects on STABILIZE/ROLLBACK (https://github.com/vitalif/vitastor/issues/51)
- Fix other EC + disabled immediate_commit problems:
  - Fix "opcode=5 retval=-2" errors happening on SYNC retries
  - Fix non-working "pagination" during PG dirty object flushing
  - Fix write operations not continued correctly after dirty object flushing
- Fix incorrect parity read-modify-write calculation when writing into a lost chunk
- Fix OSDs losing left_on_dead PG state of non-clean PGs and thus not removing junk data in the cluster
- Fix a small memory leak caused by bad indexing of EC recovery matrices
- Fix a rare use-after-free in cluster_client caused by a reenterability issue
- Fix vitastor-cli create command syntax in the CSI driver
- Allow starting OSDs without a local store (for tests)
- Fix memory allocation error in disk_tool_meta for non-standard metadata block sizes
- Fix delete operations received before loading pool metadata crashing OSDs with "null pointer exception"
- Improve "theoretical performance" Russian documentation

New features:

- Implement online configuration update for some parameters. Documentation is coming soon :)
2023-04-11 02:11:57 +03:00
Vitaliy Filippov 2f5e769a29 Fix a small memory leak caused by bad indexing of EC recovery matrices 2023-04-11 00:30:36 +03:00
Vitaliy Filippov 3237014608 Fix incorrect parity read-modify-write calculation when writing into a lost chunk 2023-04-09 02:06:10 +03:00
Vitaliy Filippov baaf8f6f44 Fix write operations not continued correctly after flush 2023-04-09 02:06:10 +03:00
Vitaliy Filippov 1d83fdcd17 Add debug logs to osd_flush 2023-04-09 02:06:10 +03:00
Vitaliy Filippov 0ddd787c38 Fix non-working "pagination" during PG dirty object flushing 2023-04-08 02:44:02 +03:00
Vitaliy Filippov 6eff3a60a5 Do not lose left_on_dead PG state of non-clean PGs 2023-04-08 02:44:02 +03:00
Vitaliy Filippov 888a6975ab Fix a rare use-after-free in cluster_client caused by a reenterability issue 2023-04-08 02:44:02 +03:00
Vitaliy Filippov cd1e890bd4 Fix "opcode=5 retval=-2" errors sometimes possible with EC 2023-04-08 02:44:02 +03:00
Vitaliy Filippov 0fbf4c6a08 Selectively sync nonsynced objects on STABILIZE/ROLLBACK (fix for github issue #51) 2023-04-08 02:44:02 +03:00
Vitaliy Filippov d06ed2b0e7 Implement online config update 2023-03-26 19:21:50 +03:00
Vitaliy Filippov 2fb0c85618 Allow to start OSDs without local store (only for tests) 2023-03-15 01:13:59 +03:00
Vitaliy Filippov d81a6c04fc Update cmake min version so it does not complain about deprecation 2023-03-15 01:08:23 +03:00
Vitaliy Filippov 7b35801647 Fix possible bad realloc in disk_tool_meta for non-standard metadata block sizes 2023-03-15 01:08:23 +03:00
Vitaliy Filippov f3228d5c07 Fix typo (did not affect execution though) 2023-03-15 01:08:23 +03:00
Vitaliy Filippov 18366f5055 Fix read/write return type in rw_blocking 2023-03-15 01:08:14 +03:00
Vitaliy Filippov 851507c147 Add missing close() in test stubs 2023-03-15 00:23:56 +03:00
Vitaliy Filippov 9aaad28488 Fix "null pointer exception" for unhandled OSD_OP_DELETEs (when pool is not loaded yet) 2023-03-02 11:16:39 +03:00
Vitaliy Filippov 8810eae8fb Release 0.8.6
Important fixes:

- Fix possibly incorrect EC parity chunk updates with EC n+k, k > 1 and when
  the first parity chunk is missing

Minor fixes and improvements:

- Fix incorrect EC free space statistics in vitastor-cli df output
- Speedup vitastor-cli startup in clusters with RDMA
- Remove unused PG "peered" state (previously used to update PG epoch)
- Use sfdisk with just --json in vitastor-disk (--dump --json isn't needed)
- Allow trailing comma in sfdisk output (fixes sfdisk 2.36 compatibility)
- Slightly improve RDMA send/receive code
- Reduce RDMA memory consumption by default (rdma_max_recv/send = 16/8)
- Use vitastor-cli instead of direct etcd interaction in the CSI driver
2023-02-28 11:18:48 +03:00
Vitaliy Filippov 14d6acbcba Set default rdma_max_recv/send to 16/8, fix documentation 2023-02-28 11:00:56 +03:00
Vitaliy Filippov 1e307069bc Fix missing parity chunk calculation for EC n+k, k > 1 and first parity chunk missing 2023-02-28 02:40:19 +03:00
Vitaliy Filippov c3e80abad7 Allow to send more than 1 operation at a time 2023-02-26 02:01:04 +03:00
Vitaliy Filippov 138ffe4032 Reuse incoming RDMA buffers 2023-02-26 00:55:01 +03:00
Vitaliy Filippov 4ab630b44d Use just sfdisk --json, --dump is not needed 2023-02-23 00:55:47 +03:00
Vitaliy Filippov 2c8241b7db Remove PG "peered" state 2023-02-21 01:30:42 +03:00
Vitaliy Filippov 36a7dd3671 Move tests to "make test" 2023-02-21 01:30:42 +03:00
Vitaliy Filippov 936122bbcf Initialize msgr lazily in client to speedup vitastor-cli with RDMA enabled 2023-02-19 18:59:07 +03:00
Vitaliy Filippov 1a1ba0d1e7 Add set_immediate to ringloop and use it for bs/osd ops to prevent reenterability issues 2023-02-09 17:37:26 +03:00
Vitaliy Filippov 3d09c9cec7 Remove unused wait_sqe() from ringloop 2023-02-09 17:37:26 +03:00