Commit Graph

451 Commits (lrc-matrix)

Author SHA1 Message Date
Vitaliy Filippov 4bee7f7b78 WIP/experimental LRC matrix generator 2023-01-06 17:13:29 +03:00
Vitaliy Filippov 373f9d0387 Try to re-peer PGs on history change 2023-01-06 12:46:44 +03:00
Vitaliy Filippov c4516ea971 Also remove deleted OSD from PG configuration and last_clean_pgs 2023-01-06 12:46:44 +03:00
Vitaliy Filippov 91065c80fc Try to prevent left_on_dead when deleting OSDs by removing them from PG history 2023-01-06 12:46:43 +03:00
Vitaliy Filippov 02e7be7dc9 Prevent reenterability side effects during PG history operation resume 2023-01-03 02:20:50 +03:00
Vitaliy Filippov 73940adf07 Prioritize EC (non-instantly-stable) operations under journal pressure
This reduces the probability of hitting OSD stalls with EC due to "deadlocks"
where two parallel write operations wait for each other to complete
2023-01-03 00:05:45 +03:00
Vitaliy Filippov e950c024d3 Do not sync peer OSDs before listing
Sync before listing was added to wait for all PG writes possibly left in queue
from the previous master to finish before listing it

But in fact it may block the cluster when EC is used and some unstable writes
are left in the queue - they block journal flushing, rollback/stabilize is
required to unblock them, but rollback/stabilize may only happen after PG is
peered. But peering needs listings, listings are requested only after sync, and
sync itself waits for currently blocked writes waiting in the queue
2023-01-03 00:05:45 +03:00
Vitaliy Filippov 71d6d9f868 Fix possible crash on ENOSPC during operation cancel in blockstore 2023-01-03 00:05:45 +03:00
Vitaliy Filippov a4dfa519af Report PG history synchronously during write
This has 2 effects:
1) OSD sets aren't added into PG history until actual write attempts anymore
   which removes unneeded extra osd_sets in PG history
2) New OSD sets are reported synchronously and can't be lost on PG restarts
   happening at the same time with reconfiguration
2023-01-01 23:41:05 +03:00
Vitaliy Filippov 67019f5b02 Make OSD sort & sanitize PG history items 2023-01-01 23:17:42 +03:00
Vitaliy Filippov 0593e5c21c Fix OSD peer config safety check 2022-12-31 02:24:42 +03:00
Vitaliy Filippov 998e24adf8 Add a new recovery_pg_switch setting to mix all PGs during recovery 2022-12-30 02:03:33 +03:00
Vitaliy Filippov d7bd36dc32 Fix another rare journal flush stall 2022-12-30 02:03:33 +03:00
Vitaliy Filippov cf5c562800 Log all object locations when peering PGs 2022-12-30 02:03:33 +03:00
Vitaliy Filippov 629200b0cc Return ENOSPC as the primary OSD 2022-12-30 02:03:33 +03:00
Vitaliy Filippov 3589ccec22 Do not disconnect peer on ENOSPC during write 2022-12-30 01:54:25 +03:00
Vitaliy Filippov 8d55a1e780 Build osd_rmw_test both with and without ISA-L 2022-12-29 19:13:57 +03:00
Vitaliy Filippov 65f6b3a4eb Fix jerasure crashing on bitmap calculation/restoration due to the lack of 16-byte alignment 2022-12-29 19:13:57 +03:00
Vitaliy Filippov fd216eac77 Add a test for missing parity chunk calculation 2022-12-29 19:13:57 +03:00
Vitaliy Filippov 61fca7c426 Fix crash when calculating a parity chunk with previous parity chunk missing (test coming shortly) 2022-12-29 19:13:57 +03:00
Vitaliy Filippov 68f3fb795e Suppress warnings in vitastor-disk purge correctly 2022-12-27 11:09:19 +03:00
Vitaliy Filippov fa90f287da Release 0.8.3
- Implement a new "vitastor-disk purge" command to remove OSDs with safety checks
- Implement a new "vitastor-cli rm-osd" command to only remove OSD metadata from etcd
- Fix a bug where the monitor could ignore OSD removal and other /osd/stats key changes
- Fix a bug where garbage could be returned when reading objects being written at the same time
- Fix a rare write stall where journal space could be not reclaimed where there
  were no new operations in the flush queue
- Fix a rare peering stall caused by a previous long listing operations queues limiting attempt
- Fix total object count statistic in OSD on object creation
- Add missing offset&len into vitastor-disk dump-journal for big_writes, fix JSON format
- Make vitastor-cli print help on missing command
- Make vitastor-cli translate all '-' to '_' in CLI options
2022-12-27 02:40:55 +03:00
Vitaliy Filippov 795020674d Loop journal flusher when the queue is empty but there is a trim request 2022-12-27 02:28:20 +03:00
Vitaliy Filippov 8e12285629 Fix vitastor-disk purge (now it works) 2022-12-27 02:28:20 +03:00
Vitaliy Filippov b9b50ab4cc Implement vitastor-disk purge command 2022-12-26 02:48:48 +03:00
Vitaliy Filippov 0d8625f92d Make vitastor-cli print help on missing command 2022-12-26 02:48:48 +03:00
Vitaliy Filippov 2f3c2c5140 Implement safety check for OSD removal, translate all '-' to '_' in cli options
'-' to '_' translation fixes a bug with create --image_meta
2022-12-26 02:48:48 +03:00
Vitaliy Filippov 4ebdd02b0f Remove LIST op limiter
It doesn't prevent OSD slow ops but may itself lead to stalls :)
2022-12-26 02:48:48 +03:00
Vitaliy Filippov c2244331e6 Add vitastor-cli rm-osd command 2022-12-26 02:48:48 +03:00
Vitaliy Filippov 31bd1ec145 Fix object creation check for statistics 2022-12-21 02:51:11 +03:00
Vitaliy Filippov c08d1f2dfe Add missing offset&len into big_writes journal dump, fix commas again 2022-12-21 02:51:11 +03:00
Vitaliy Filippov 1d80bcc8d0 Fix blockstore returning garbage for unstable reads if there is an in-flight version
"In-flight" versions are added into dirty_db when writes are enqueued. And they
weren't ignored by subsequent reads even though they didn't have data location yet.
This bug was leading to test_heal.sh not passing sometimes with replicated setups.
2022-12-21 02:48:24 +03:00
Vitaliy Filippov 5ef8bed75f Release 0.8.2
- Fix QEMU driver compatibility with QEMU 7.0 and < 2.9
- Add patches for pve-qemu-kvm 7.1 (PVE 7.3) and pve-qemu-kvm 6.2 (PVE 7.2)
- Fix Proxmox driver location in the pve-storage-vitastor package
- Disable HDD autodetection in non-hybrid mode
- Explicitly warn about a buggy kernels on -EAGAIN in io_uring
- Final fix for the lack of zeroing out of old metadata entries
  (do not crash with "big_write journal_entry was allocated over another object"
  in some cases after an unclean OSD shutdown)
- Wait for data writes before fsyncing data if data fsync is enabled
- Never try to wait for free space inside blockstore thus stalling OSDs
- Fix a rare crash in osd_peering due to callback ordering
- Fix a rare duplication of ping & op message IDs
- Fix a rare use-after-free during pings
- Add --force to vitastor-disk read-sb
- Make vitastor-disk dump metadata object IDs in hex, add forgotten commas
- Fix vitastor-disk SCSI disk cache check
2022-12-17 17:54:13 +03:00
Vitaliy Filippov 8669998e5e Fix discard_list_subop() for local ops 2022-12-17 17:54:13 +03:00
Vitaliy Filippov b457327e77 Oops. Fix metadata read after fixes :-) 2022-12-17 17:31:57 +03:00
Vitaliy Filippov f7fa9d5e34 Fix SCSI device cache type check 2022-12-17 17:31:57 +03:00
Vitaliy Filippov 49b88b01f9 Fix clang build 2022-12-17 16:25:26 +03:00
Vitaliy Filippov 71688bcb59 Disable HDD autodetection in non-hybrid mode 2022-12-17 16:12:15 +03:00
Vitaliy Filippov 552e207d2b Explicitly print errors about -EAGAIN in io_uring 2022-12-17 15:49:49 +03:00
Vitaliy Filippov 5464821fa5 Final fix for the lack of zeroing out of old metadata entries
If a crash occurs during flushing a redirect-write it may happen so that
the disk contains both old and new metadata entries. This is OK, but prior
to 0.8.0 after this situation OSDs started without problem, but then they
crashed after some more overwrites with a "tried to overwrite non-zero
metadata entry" error. 0.8.0 introduced a change that was intended to fix
this situation, but rather than fixing it it prevented OSDs from starting,
now because of a "big_write journal_entry was allocated over another object"
error... :-)

This change finally fixes the original issue.

Followup to 54ef2c389f
2022-12-17 14:50:31 +03:00
Vitaliy Filippov 6917a32ca8 Add --force to vitastor-disk read-sb 2022-12-17 02:47:15 +03:00
Vitaliy Filippov f8722a8bd5 Dump meta in hex 2022-12-17 01:50:38 +03:00
Vitaliy Filippov 9c2f69c9fa Add forgotten commas to vitastor-disk dump-journal 2022-12-17 01:22:58 +03:00
Vitaliy Filippov 1a93e3f33a Wait for data writes before fsyncing data if data fsync is enabled 2022-12-16 20:46:55 +03:00
Vitaliy Filippov 3f35744052 Fix compatibility with QEMU aio_set_fd_handler signatures in 7.0 and < 2.9 2022-12-15 19:17:17 +03:00
Vitaliy Filippov cb437913d3 Never try to wait for free space inside blockstore 2022-12-12 00:27:05 +03:00
Vitaliy Filippov 472bce58ab Fix rare crash in osd_peering due to callback ordering 2022-12-12 00:27:05 +03:00
Vitaliy Filippov 7a71e7ef01 Fix possible duplication of ping & op message IDs 2022-12-04 00:16:47 +03:00
Vitaliy Filippov c71e5e7bbd Fix possible use-after-free during pings 2022-12-04 00:16:47 +03:00
Vitaliy Filippov 8fdf30b21f Release 0.8.1
- Remove an additional data copy operation when flushing journal (should
  slightly increase write performance)
- Fix a bug where new writes in the inmemory_journal=false mode could overwrite
  the data currently read by a parallel read operation
- Fix degraded parity writes for EC N+K when K>1 where the bug could also lead
  to an "assertion failed" error
- Fix missing journal space check for "big" writes which could lead to
  "prefill_single_journal_entry(): assertion failed..." error in OSD
- Fix possible "assertion failed: next->prev_wait >= 0" in client in rare cases
- Fix missing "len" field in vitastor-disk write-journal big_writes
- Fix possible crash of a full OSD (ENOSPC)
- Fix CSI build scripts to include newest packages every time
- Fix CSI endpoint in the liveness probe manifest
2022-11-20 11:44:09 +03:00