Vitaliy Filippov
cf36445359
Reserve journal space for stabilize requests dynamically to prevent stalls
2023-11-20 03:01:57 +03:00
Vitaliy Filippov
f600cc07b0
Autosync in blockstore every autosync_writes, too
2023-09-16 17:52:17 +03:00
Vitaliy Filippov
5cadb170b9
Fix possible OSD crash during sync due to missing min_flushed_journal_sector reset
2023-09-16 17:52:17 +03:00
Vitaliy Filippov
e72d4ed1d4
Remove unused bs_sync fields
2023-09-16 17:52:17 +03:00
Vitaliy Filippov
1a4ceb420d
Track used blocks, not object versions
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
4181add1f4
Remove creepy "metadata copying" during overwrite
...
Instead of it, just do not verify checksums of currently mutated objects.
When clean data modification during flush runs in parallel to a read request,
that request may read a mix of old and new data. It may even read a mix of
multiple flushed versions if it lasts too long... And attempts to verify it
using temporary copies of metadata make the algorithm too complex and creepy.
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
a8464c19af
Support keeping checksums on disk (not in memory)
...
Definitely beneficial for SSD+HDD setups
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
c5274f655b
...and partially remove the perversion with bitmap inlining
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
45e07d6294
Sadly we have to refcount dyn_data...
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
874a766b62
Rename meta_version to meta_format
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
430994f48a
Fix journal big_write simple reads after checksum changes
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
92c6e16eba
Fix checksum verification in big_write journal reads
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
213a9ccb4d
Verify checksums during journal reads
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
7d532880c3
Implement large csum_block_size support (more than 4k) + refactor blockstore_flush
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
0b0405d115
Implement bitmap-granular (4k) metadata & data checksums
2023-07-29 12:17:18 +03:00
Vitaliy Filippov
a409598b16
Wait for free space again, but count on big_write flushes instead of just flusher activity
2023-05-10 01:51:02 +03:00
Vitaliy Filippov
0fbf4c6a08
Selectively sync nonsynced objects on STABILIZE/ROLLBACK (fix for github issue #51 )
2023-04-08 02:44:02 +03:00
Vitaliy Filippov
d06ed2b0e7
Implement online config update
2023-03-26 19:21:50 +03:00
Vitaliy Filippov
4ebdd02b0f
Remove LIST op limiter
...
It doesn't prevent OSD slow ops but may itself lead to stalls :)
2022-12-26 02:48:48 +03:00
Vitaliy Filippov
552e207d2b
Explicitly print errors about -EAGAIN in io_uring
2022-12-17 15:49:49 +03:00
Vitaliy Filippov
cb437913d3
Never try to wait for free space inside blockstore
2022-12-12 00:27:05 +03:00
Vitaliy Filippov
238037ae31
Make journal trimmer wait until reads are completed when inmemory_journal is false
...
Without this new writes may in theory overwrite journal data being read at that time
2022-11-20 01:49:21 +03:00
Vitaliy Filippov
dfd80626bd
Extract disk opening functions to separate module
2022-07-15 01:38:30 +03:00
Vitaliy Filippov
30907852c2
Use simple std::map for the config
2022-07-15 01:38:30 +03:00
Vitaliy Filippov
73a363bf92
Rename some variables and constants
2022-07-15 01:38:30 +03:00
Vitaliy Filippov
bce357e2a5
Do not read all metadata into memory when dumping
2022-06-13 01:26:30 +03:00
Vitaliy Filippov
839ec9e6e0
Shard clean_db by PGs to speedup listings
2022-02-20 00:21:24 +03:00
Vitaliy Filippov
36c276358b
Attempt to fix "head-of-line blocking" by LIST operations
2022-02-18 01:31:45 +03:00
Vitaliy Filippov
df0cd85352
Fix another part of the "async sqe clear" bug (followup to d9857a5340
)
2022-02-01 01:14:56 +03:00
Vitaliy Filippov
d9857a5340
Check for SQEs, not for completions
...
Should finally fix Assertion `sqe != NULL' failed introduced after journaling
refactor in 0.6.11...
2022-01-31 02:19:10 +03:00
Vitaliy Filippov
7bdd92ca4f
Fix build under clang and some warnings
...
Build problems fixed:
- void* pointer arithmetic which is a GNU extension (works as byte*)
- "variable size object may not be initialized" which is OK under GCC
- nullptr_t related error in json11 (it lacks 'operator <' in clang)
Warnings fixed:
- empty nested struct initializer { 0 } replaced by {}
- removed several unused lambda captures
2022-01-16 00:02:54 +03:00
Vitaliy Filippov
f93491bc6c
Implement journal write batching and slightly refactor journal writes
...
Slightly reduces WA. For example, in 4K T1Q128 replicated randwrite tests
WA is reduced from ~3.6 to ~3.1, in T1Q64 from ~3.8 to ~3.4.
Only effective without no_same_sector_overwrites.
2021-12-16 00:27:17 +03:00
Vitaliy Filippov
2c7556e536
Allow to run with 4k sector size. Natural, but it was forbidden
2021-12-11 22:03:16 +00:00
Vitaliy Filippov
cfe8de9b84
Autosync based on number of unstable ops to prevent journal stalls
2021-10-30 14:26:48 +03:00
Vitaliy Filippov
e74af9745e
Print journal flusher diagnostics on slow ops
2021-07-17 16:13:41 +03:00
Vitaliy Filippov
2a02f3c4c7
Add metadata superblock and check it on start
...
Refuse to start if the superblock is missing or bad version;
zero out the metadata area when initializing superblock.
2021-04-10 22:26:17 +03:00
Vitaliy Filippov
2ab423d4ef
Implement journaled write throttling for the SSD+HDD case
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
f01eea07d3
Add simplified interface to read blockstore bitmaps synchronously
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ab39ce2bbb
Use clean_entry_bitmap_size instead of entry_attr_size back because of changed bitmap handling
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
4ae1b84c67
Report inode space usage statistics to etcd, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
c35963967f
Add inode space usage statistics tracking to blockstore
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6107a4d07b
Add "external" bitmap support to blockstore
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
307c1731c1
Forget all dirty_entries before stable big_write or delete during initialisation
...
This fixes a 'double_alloc' assertion in the following case:
- big_write object #1 v1 to block #100
- big_write object #1 v2 to block #101
- big_write object #2 v1 to block #100
2021-04-07 01:30:38 +03:00
Vitaliy Filippov
54f2353f24
Use bitmap granularity for alignment checks
2021-04-03 14:36:04 +03:00
Vitaliy Filippov
8f8b90be7a
Add min_flusher_count configuration
2021-04-03 00:53:28 +03:00
Vitaliy Filippov
b0ad1e1e6d
Remember writes as "unsynced" only after completing them
...
Previously BS_OP_SYNC could take unfinished writes and add them into the journal before
they were actually completed. This was leading to crashes with the message
"BUG: Unexpected dirty_entry 2000000000001:9f2a0000 v3 unstable state during flush: 338"
2021-03-25 02:41:58 +03:00
Vitaliy Filippov
36c935ace6
Use std::vector for the blockstore submission queue
2021-03-08 17:04:10 +03:00
Vitaliy Filippov
0d8b5e2ef9
Remove unused enqueue_op_first()
2021-03-08 17:04:10 +03:00
Vitaliy Filippov
98f1e2c277
Rework write/sync ordering
...
Make syncs wait for all previous writes because it's the only way
to make sure that OSDs do not receive incomplete writes in LIST results
during peering when some writes are still in progress.
Also simplify blockstore submission queue logic.
2021-03-08 17:04:10 +03:00
Vitaliy Filippov
314b20437b
Do not break subsequent small writes badly when a big write is canceled
2021-03-08 17:04:10 +03:00