1
0
Fork 0
Commit Graph

559 Commits (a7b57386c05f2a3187cd24b9d5ef32d8389ddf05)

Author SHA1 Message Date
Vitaliy Filippov a7b57386c0 Do not print last subcommand result twice during "inverse" snapshot merge 2023-06-30 02:07:10 +03:00
Vitaliy Filippov 9d4ea5f764 Fix inverse parent selection which prevented the use of optimized merge in case of only 1 snapshot 2023-06-30 01:39:11 +03:00
Vitaliy Filippov 000e4944ec Remove "inverse parent" image name index key from etcd during snapshot merge 2023-06-30 01:23:30 +03:00
Vitaliy Filippov 8426616d89 Warn about unfinished deletions in rm-data 2023-06-30 01:18:25 +03:00
Vitaliy Filippov 1a841344ec Print progress of all operations during snapshot merge 2023-06-30 01:13:47 +03:00
Vitaliy Filippov 8603b5cb1d Do not hang on inactive OSDs during delete, report and skip them instead 2023-06-30 00:15:16 +03:00
Vitaliy Filippov 878ccbb6ea Fix snapshot chain "down-merge" ("up-merge" worked well...) 2023-06-29 00:47:21 +03:00
Vitaliy Filippov 63c2b9832c Fix chained (snapshot) reads often not working at all with chain size > 2 2023-06-28 18:54:03 +03:00
Vitaliy Filippov a11ca56fb1 Fix compatibility of the QEMU driver with iothread 2023-06-21 02:11:28 +03:00
Vitaliy Filippov b84927b340 Fix \n in nbd_proxy 2023-06-19 01:48:58 +03:00
Vitaliy Filippov 926be372fd Release 0.9.2
- Measure and report scrub I/O statistics in vitastor-cli status
- Make aggregated statistics in vitastor-cli status much smoother
  (first derive, then sum instead of first summing and then deriving)
- Fix an old rare bug leading to journal corruption
  (try to use scrub if you think you're affected...)
- Do not start EC PGs without at least <data chunks> OSDs in each old set
  (prevents spurious read errors with EC during reconnections/restarts)
- Fix failed assert(!scrub_list_op) on OSD restart with pending scrubs
- Fix future planned scrubs not starting because of incorrect time comparison
- Build packages for Debian 12 (Bookworm)
2023-06-18 19:44:33 +03:00
Vitaliy Filippov c74a424930 Report scrub I/O in vitastor-cli status 2023-06-17 21:11:21 +03:00
Vitaliy Filippov 32f2c4dd27 Measure scrub statistics 2023-06-17 20:56:26 +03:00
Vitaliy Filippov 3ad16b9a1a Fix auto_scrubs not starting because of < vs <= =)) 2023-06-17 17:32:21 +03:00
Vitaliy Filippov 1c2df841c2 Fix failed assert(!scrub_list_op) on OSD restart with pending scrubs 2023-06-17 17:02:54 +03:00
Vitaliy Filippov aa5dacc7a9 Do not start EC PGs without at least pg_data_size connections to old OSDs from each set 2023-06-17 02:16:30 +03:00
Vitaliy Filippov 4fdc49bdc7 Add another assert-type check (it does not fire, just as a safety measure for the future) 2023-06-17 00:07:22 +03:00
Vitaliy Filippov 86b4682975 Put get_trim_pos into the "critical section". Fixes rare journal corruption issue
The consequence of this issue was that in some very rare cases (only reproduced
under load in CI when running 4+ tests in parallel) small write data written to
journal could overwrite journal entries.

Also add an assert-type safety check to be able to catch this issue in the
future again in case of a regression.
2023-06-17 00:06:42 +03:00
Vitaliy Filippov bdd48e4cf1 Release 0.9.1
- Fix "Client XX command out of sync" messages sometimes happening on OSD reconnections
- Fix a bug where EC reads parallel with writes to the same object failed with -ERANGE error
- Slightly reduce the amount of metadata writes during journal flushing
- Correctly unmap NBD volumes when Proxmox forces map_volume use (with SWTPM and maybe some other cases)
2023-06-10 11:42:49 +03:00
Vitaliy Filippov f9fbea25a4 Remove double write when old and new locations are in the same metadata block
Also add another metadata entry fool-safety check which, ideally, will never fire %)
2023-06-03 00:47:10 +03:00
Vitaliy Filippov 2c9a10d081 Fix an idiotic bug leading to failed reads with -ERANGE with EC :D 2023-06-03 00:44:52 +03:00
Vitaliy Filippov 150968070f Slightly improve some debug prints 2023-05-29 01:04:16 +03:00
Vitaliy Filippov cdfc74665b Close client FDs only when destroying the client, after handling all async reads/writes
Fixes "Client XX command out of sync" sometimes happening on reconnections
2023-05-25 00:52:43 +03:00
Vitaliy Filippov 3b4cf29e65 Release 0.9.0
New features:
- Scrubbing! Check documentation: [auto_scrub](src/branch/master/docs/config/osd.en.md#auto_scrub)
- Document online-updatable configuration parameters

Bug fixes:
- Fix NaN during PG optimisation if there are nonexisting OSDs in node_placement
- Fix monitor crash on pool deletion
- Clear journal_device and meta_device before initialising the next OSD in automatic mode
- Sync unsynced deletes before overwriting them with a lower version
  (reproducted mostly/only after scrubbing)
2023-05-21 15:07:14 +03:00
Vitaliy Filippov ce02f47de6 Allow to disable scrub_find_best 2023-05-21 12:33:38 +03:00
Vitaliy Filippov f1961157f0 Fix brute-force error locator for EC n+k with k > 2 2023-05-21 00:57:14 +03:00
Vitaliy Filippov 88c1ba0790 Fix compile errors with gcc 10 2023-05-20 23:20:09 +03:00
Vitaliy Filippov fa90b5a4e7 Schedule automatic scrubs correctly (not just after previous scrub) 2023-05-20 23:20:09 +03:00
Vitaliy Filippov 8d40ad99a6 Add scrub documentation 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 6ca20aa194 Allow scrub to fix corrupted object states 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 4bfd994341 Sync unsynced deletes before overwriting them with a lower version 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 59e959dcbb Do not die when "different versions are returned from subops" 2023-05-20 23:19:39 +03:00
Vitaliy Filippov a9581f0739 Handle dirty deletes during read correctly O_o 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 105a405b0a Implement vitastor-cli fix 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0e5d0e02a9 Add "vitastor-cli describe" command 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0439981a66 Implement "describe object(s)" operation
Required to implement fixing inconsistent objects in vitastor-cli
2023-05-20 23:19:39 +03:00
Vitaliy Filippov 6648f6bb6e Implement ambiguity detection during scrub 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 281be547eb Implement brute-force error locator for EC 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0c78dd7178 Add no_scrub flag 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 3c924397e7 Store next scrub timestamp instead of last scrub timestamp 2023-05-20 23:19:39 +03:00
Vitaliy Filippov c3bd26193d Implement PG scrub runner 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 43b77d7619 Implement scrubbing "data path" - OSD_OP_SCRUB 2023-05-20 23:19:39 +03:00
Vitaliy Filippov a6d846863b Add min/max stripe and limit to OP_LIST 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 8dc427b43c Retry failed reads (including chained and RMW) from other replicas 2023-05-20 23:19:39 +03:00
Vitaliy Filippov bf2112653b Refcount object_states 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 0538a484b3 Add corrupted object state 2023-05-20 23:19:39 +03:00
Vitaliy Filippov 97720fa6b4 Remove unused capture 2023-05-20 22:58:51 +03:00
Vitaliy Filippov e60e352df6 Improve vitastor-nbd documentation 2023-05-20 22:58:51 +03:00
Vitaliy Filippov 629999f789 Clear journal_device and meta_device before initialising the next OSD in automatic mode 2023-05-15 23:58:55 +03:00
Vitaliy Filippov 5a9e1ede52 Release 0.8.9
- The tests are now stable and run in a CI system based on Gitea CI
- The release includes final bug fixes for EC:
  - Implement missing EC recovery of allocation bitmap when built with ISA-L
  - Fix broken snapshot export with EC (allocation bitmap reads were giving incorrect results previously)
- Also fixed bugs manifesting under heavy load:
  - Fix monitor possibly applying incorrect PG history on retries
  - Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number
  - Allow writes to wait for free space again, but now correctly (previously dropped in 0.8.2)
  - Fix a rare segfault in client (handle client stop during incoming stream handling in 1 more place)
  - Make monitor correctly handle etcd connection errors - it could die instead of connecting to another etcd
  - Fix OSD rarely being unable to report PG states after a PG was taken over by another OSD
- Fixed return code for incomplete EC objects (now EIO) and made cluster client retry this error
- Made other small changes for tests: timeouts, nice/ionice for etcd, waiting conditions, NBD device checks and so on
2023-05-14 01:25:09 +03:00