Vitaliy Filippov
1a4ceb420d
Track used blocks, not object versions
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
4181add1f4
Remove creepy "metadata copying" during overwrite
...
Instead, just do not verify checksums of objects that are currently being mutated.
When clean data modification during flush runs in parallel to a read request,
that request may read a mix of old and new data. It may even read a mix of
multiple flushed versions if it lasts too long... And attempts to verify it
using temporary copies of metadata make the algorithm too complex and creepy.
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
a8464c19af
Support keeping checksums on disk (not in memory)
...
Definitely beneficial for SSD+HDD setups
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
7bfb1639ea
Use find_holes() in flusher for unification
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
9357e5293e
Call fill_partial_checksum_blocks() correctly with regard to COPY_BUF_CSUM_FILL
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
12851dc07d
Wait for journal reads before checking them in clear_incomplete_csum_block_bits
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
71674d00cf
Fix journal data checksum mangling on corrupted block overwrite
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
c5274f655b
...and partially remove the perversion with bitmap inlining
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
45e07d6294
Sadly we have to refcount dyn_data...
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
874a766b62
Rename meta_version to meta_format
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
e42975ffd1
Fix wait_journal_count not being zeroed
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
92c6e16eba
Fix checksum verification in big_write journal reads
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
213a9ccb4d
Verify checksums during journal reads
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
a166147110
Add backwards compatibility with non-checksum metadata and journal formats
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
7d532880c3
Implement large csum_block_size support (more than 4k) + refactor blockstore_flush
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
0b0405d115
Implement bitmap-granular (4k) metadata & data checksums
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
b7e4d0c9bf
Fix journal dirty_start position tracking and some debug prints
...
Fixes two bugs found during HDD testing :-)
1) OSD crashed with "BUG: Attempt to overwrite used offset of the journal" during
`fio -bs=900k -iodepth=128` test with 16 MB journal
2) OSD stalled during `fio -bs=512k -iodepth=128` test with 64 MB journal
2023-07-09 01:17:55 +03:00
Vitaliy Filippov
86b4682975
Put get_trim_pos into the "critical section". Fixes rare journal corruption issue
...
As a consequence of this issue, in some very rare cases (only reproduced
under load in CI when running 4+ tests in parallel), small write data written
to the journal could overwrite journal entries.
Also add an assert-style safety check so that the issue can be caught again
if it ever regresses.
2023-06-17 00:06:42 +03:00
Vitaliy Filippov
f9fbea25a4
Remove double write when old and new locations are in the same metadata block
...
Also add another metadata entry fool-safety check which, ideally, will never fire %)
2023-06-03 00:47:10 +03:00
Vitaliy Filippov
b74ccb613c
Fix another variant of flusher sync-waiting stall
2023-04-24 00:44:41 +03:00
Vitaliy Filippov
d7bd36dc32
Fix another rare journal flush stall
2022-12-30 02:03:33 +03:00
Vitaliy Filippov
795020674d
Loop journal flusher when the queue is empty but there is a trim request
2022-12-27 02:28:20 +03:00
Vitaliy Filippov
49b88b01f9
Fix clang build
2022-12-17 16:25:26 +03:00
Vitaliy Filippov
552e207d2b
Explicitly print errors about -EAGAIN in io_uring
2022-12-17 15:49:49 +03:00
Vitaliy Filippov
1a93e3f33a
Wait for data writes before fsyncing data if data fsync is enabled
2022-12-16 20:46:55 +03:00
Vitaliy Filippov
a276a1f737
Do not copy journal data an additional time when flushing
2022-11-20 00:50:13 +03:00
Vitaliy Filippov
ea632367e9
Do not alter dsk.meta_offset/len to skip superblock
2022-07-15 01:38:30 +03:00
Vitaliy Filippov
dfd80626bd
Extract disk opening functions to separate module
2022-07-15 01:38:30 +03:00
Vitaliy Filippov
839ec9e6e0
Shard clean_db by PGs to speed up listings
2022-02-20 00:21:24 +03:00
Vitaliy Filippov
7bdd92ca4f
Fix build under clang and some warnings
...
Build problems fixed:
- void* pointer arithmetic which is a GNU extension (works as byte*)
- "variable size object may not be initialized" which is OK under GCC
- nullptr_t related error in json11 (it lacks 'operator <' in clang)
Warnings fixed:
- empty nested struct initializer { 0 } replaced by {}
- removed several unused lambda captures
2022-01-16 00:02:54 +03:00
Vitaliy Filippov
c6d104ecd6
Print object version on fatal overwrite
2021-12-14 01:57:04 +03:00
Vitaliy Filippov
8398ad0117
Fix #36 - Fix old version data sometimes overriding new version data
...
Reproduction case:
- v3 = (offset 4kb, length 16kb)
- v2 = (offset 24kb, length 16kb)
- v1 = (offset 16kb, length 16kb)
- At the third step it was inserting 16..24kb instead of 20..24kb
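The reproduction case above can be checked with a small sketch. It is an illustration only, written in Python rather than the project's C++, and the helper name `uncovered_parts` is hypothetical, not taken from the blockstore code. It assumes versions are processed from newest to oldest, and that an older version may only contribute the byte ranges not already covered by newer ones.

```python
KB = 1024

def uncovered_parts(start, end, covered):
    """Return the sub-ranges of [start, end) not overlapped by any range in `covered`."""
    parts = []
    pos = start
    for c_start, c_end in sorted(covered):
        if c_end <= pos:
            continue            # covered range lies entirely before the current position
        if c_start >= end:
            break               # covered range lies entirely after the requested range
        if c_start > pos:
            parts.append((pos, c_start))   # gap before this covered range
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        parts.append((pos, end))           # tail after the last covered range
    return parts

# Versions from the reproduction case, newest first: (name, offset, length)
versions = [("v3", 4 * KB, 16 * KB), ("v2", 24 * KB, 16 * KB), ("v1", 16 * KB, 16 * KB)]

covered = []
result = {}
for name, offset, length in versions:
    new_parts = uncovered_parts(offset, offset + length, covered)
    result[name] = new_parts
    covered.extend(new_parts)

for name, parts in result.items():
    print(name, [(s // KB, e // KB) for s, e in parts])
```

Under these assumptions v1 contributes only 20..24kb, because 16..20kb is already covered by v3 and 24..32kb by v2.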
2021-11-27 01:17:45 +03:00
Vitaliy Filippov
28bd94d2c2
Make diagnostics slightly better
2021-07-18 01:24:38 +03:00
Vitaliy Filippov
148ff04aa8
Do not lose flusher queue entries when an "older object rescan" happens in parallel with flushing of an older version of another object
2021-07-18 01:20:54 +03:00
Vitaliy Filippov
e74af9745e
Print journal flusher diagnostics on slow ops
2021-07-17 16:13:41 +03:00
Vitaliy Filippov
f684d9101a
Refuse to start with old journal version
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ab39ce2bbb
Switch back to clean_entry_bitmap_size instead of entry_attr_size because of changed bitmap handling
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6107a4d07b
Add "external" bitmap support to blockstore
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
95c29b9dc3
Add "external" bitmap support to osd_rmw
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6909807068
Allow starting the OSD just to flush the journal completely
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
52097c4856
Stop flushing when fewer than min_flusher_count operations are available (unless a trim is forced)
2021-04-03 00:53:28 +03:00
Vitaliy Filippov
8f8b90be7a
Add min_flusher_count configuration
2021-04-03 00:53:28 +03:00
Vitaliy Filippov
843b7052d2
Add an assertion when clearing deleted metadata entries, add debug details when freeing blocks
2021-04-03 00:53:28 +03:00
Vitaliy Filippov
564d64e271
Add some details for debug prints
2021-03-25 11:00:10 +03:00
Vitaliy Filippov
06f4978085
Fix fsync check in blockstore_flush (data fsyncs were disabled instead of journal fsyncs)
2021-03-25 02:41:58 +03:00
Vitaliy Filippov
b0ad1e1e6d
Remember writes as "unsynced" only after completing them
...
Previously BS_OP_SYNC could take unfinished writes and add them into the journal before
they were actually completed. This was leading to crashes with the message
"BUG: Unexpected dirty_entry 2000000000001:9f2a0000 v3 unstable state during flush: 338"
2021-03-25 02:41:58 +03:00
Vitaliy Filippov
88671cf745
Fix a bug causing all flushers to wait for an fsync without actually trying to do it
...
This happened because flusher_count became dynamic, but fsync_batch() was still comparing the
number of flushers currently ready to do an fsync with the maximum number of flushers. The number
also wasn't rechecked on every loop iteration, which was incorrect as well.
Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.
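The stall described above can be illustrated with a toy model. This is a sketch, not the actual fsync_batch() logic: the function `iterations_until_fsync` and all the numbers are hypothetical. It assumes a batched fsync starts once every flusher that is expected to join has become ready.

```python
def iterations_until_fsync(ready_flushers, expected_per_iteration):
    """Return the loop iteration at which the batch may fsync, or None if it never can.

    `expected_per_iteration` lists, for each loop iteration, how many flushers
    the batch waits for (hypothetical model of the waiting loop).
    """
    for i, expected in enumerate(expected_per_iteration):
        if ready_flushers >= expected:
            return i
    return None

MAX_FLUSHERS = 4      # hypothetical configured maximum
READY = 2             # hypothetical: both running flushers are waiting for the fsync

# Buggy variant: always compare with the maximum, never recheck the real count.
buggy = iterations_until_fsync(READY, [MAX_FLUSHERS] * 3)

# Fixed variant: recheck the adaptive count of running flushers on every iteration.
fixed = iterations_until_fsync(READY, [4, 3, 2])

print(buggy, fixed)
```

In this model the buggy variant never reaches the fsync because only two flushers are running, while the fixed variant proceeds as soon as the rechecked count drops to the number of ready flushers.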
2021-03-13 17:27:29 +03:00
Vitaliy Filippov
c974cb539c
Make flusher_count adaptive and limit write iodepth
2021-02-25 23:59:33 +03:00
Vitaliy Filippov
bf9a175efc
Move C/C++ sources to src subdirectory
2021-02-25 23:59:03 +03:00