Commit Graph

24 Commits (lrc-matrix)

Author SHA1 Message Date
Vitaliy Filippov e950c024d3 Do not sync peer OSDs before listing
Sync before listing was added so that all PG writes possibly left in the queue
by the previous master would finish before the PG is listed.

In fact, however, it may block the cluster when EC is used and some unstable
writes are left in the queue: they block journal flushing, and rollback/stabilize
is required to unblock them, but rollback/stabilize may only happen after the PG
is peered. Peering needs listings, listings are requested only after sync, and
sync itself waits for the blocked writes that are still in the queue.
2023-01-03 00:05:45 +03:00
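The circular wait described in e950c024d3 can be illustrated with a toy wait-for graph. This is only a sketch of the reasoning in the commit message: the node names are descriptive labels and `build_wait_graph` / `is_stuck` are hypothetical helpers, none of which come from the actual OSD code.

```cpp
#include <map>
#include <set>
#include <string>

// Toy "X waits for Y" graph. Node names are illustrative labels only.
using WaitGraph = std::map<std::string, std::string>;

// Builds the chain of dependencies listed in the commit message.
// sync_before_listing = true models the old behaviour, false models the fix.
WaitGraph build_wait_graph(bool sync_before_listing)
{
    WaitGraph g = {
        { "peering", "listing" },                    // peering needs listings
        { "sync", "unstable writes" },               // sync waits for queued writes
        { "unstable writes", "rollback/stabilize" }, // writes need rollback/stabilize
        { "rollback/stabilize", "peering" },         // which needs a peered PG
    };
    if (sync_before_listing)
        g["listing"] = "sync";                       // listing requested only after sync
    return g;
}

// Follows the edges from `start` and reports whether some step is visited twice.
bool is_stuck(const WaitGraph & g, const std::string & start)
{
    std::set<std::string> seen;
    std::string cur = start;
    while (seen.insert(cur).second)
    {
        auto it = g.find(cur);
        if (it == g.end())
            return false; // reached a step that waits for nothing
        cur = it->second;
    }
    return true; // came back to an already visited step: circular wait
}
```

In this model, `is_stuck(build_wait_graph(true), "peering")` reports a cycle, while dropping the sync-before-listing edge lets the chain terminate at the listing step.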
Vitaliy Filippov 71d6d9f868 Fix possible crash on ENOSPC during operation cancel in blockstore 2023-01-03 00:05:45 +03:00
Vitaliy Filippov 4ebdd02b0f Remove LIST op limiter
It doesn't prevent OSD slow ops but may itself lead to stalls :)
2022-12-26 02:48:48 +03:00
Vitaliy Filippov 552e207d2b Explicitly print errors about -EAGAIN in io_uring 2022-12-17 15:49:49 +03:00
Vitaliy Filippov cb437913d3 Never try to wait for free space inside blockstore 2022-12-12 00:27:05 +03:00
Vitaliy Filippov dfd80626bd Extract disk opening functions to separate module 2022-07-15 01:38:30 +03:00
Vitaliy Filippov 73a363bf92 Rename some variables and constants 2022-07-15 01:38:30 +03:00
Vitaliy Filippov 839ec9e6e0 Shard clean_db by PGs to speedup listings 2022-02-20 00:21:24 +03:00
Vitaliy Filippov 36c276358b Attempt to fix "head-of-line blocking" by LIST operations 2022-02-18 01:31:45 +03:00
Vitaliy Filippov df0cd85352 Fix another part of the "async sqe clear" bug (followup to d9857a5340) 2022-02-01 01:14:56 +03:00
Vitaliy Filippov 7bdd92ca4f Fix build under clang and some warnings
Build problems fixed:
- void* pointer arithmetic which is a GNU extension (works as byte*)
- "variable size object may not be initialized" which is OK under GCC
- nullptr_t related error in json11 (it lacks 'operator <' in clang)

Warnings fixed:
- empty nested struct initializer { 0 } replaced by {}
- removed several unused lambda captures
2022-01-16 00:02:54 +03:00
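The first build problem in 7bdd92ca4f (arithmetic on `void*`) is usually avoided by casting to a byte pointer before adding the offset. The following is a minimal sketch of that general technique, not the change made in the commit; `ptr_add` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the project sources): advance an untyped
// buffer pointer by a byte offset without relying on the GNU extension
// that makes `void*` arithmetic behave like byte pointer arithmetic.
static inline void *ptr_add(void *buf, size_t offset)
{
    assert(buf != nullptr);
    return static_cast<uint8_t*>(buf) + offset;
}
```

The explicit cast states the byte-sized stride in standard C++, so the same code compiles without depending on the extension.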
Vitaliy Filippov f93491bc6c Implement journal write batching and slightly refactor journal writes
Slightly reduces write amplification (WA). For example, in 4K T1Q128 replicated randwrite tests
WA is reduced from ~3.6 to ~3.1, in T1Q64 from ~3.8 to ~3.4.

Only effective without no_same_sector_overwrites.
2021-12-16 00:27:17 +03:00
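To put the figures from f93491bc6c in perspective, write amplification (WA) is commonly taken as bytes written to the device divided by bytes written by the client. The sketch below assumes that definition; the function names are illustrative and the commit does not say how the numbers were measured.

```cpp
#include <cassert>
#include <cstdint>

// Write amplification under the assumed definition:
// bytes physically written to the device per byte written by the client.
double write_amplification(uint64_t device_bytes, uint64_t client_bytes)
{
    assert(client_bytes > 0);
    return static_cast<double>(device_bytes) / static_cast<double>(client_bytes);
}

// Relative reduction between two WA values, as a fraction of the old value.
double relative_reduction(double wa_before, double wa_after)
{
    assert(wa_before > 0);
    return (wa_before - wa_after) / wa_before;
}
```

With the approximate values quoted in the commit message, `relative_reduction(3.6, 3.1)` is about 0.14 and `relative_reduction(3.8, 3.4)` is about 0.11, i.e. roughly 14% and 11% fewer device writes.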
Vitaliy Filippov e74af9745e Print journal flusher diagnostics on slow ops 2021-07-17 16:13:41 +03:00
Vitaliy Filippov 2ab423d4ef Implement journaled write throttling for the SSD+HDD case 2021-04-10 17:44:12 +03:00
Vitaliy Filippov d6524670e1 Introduce data distribution locality 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 6909807068 Allow to start the OSD just to flush the journal completely 2021-04-10 17:44:12 +03:00
Vitaliy Filippov 54f2353f24 Use bitmap granularity for alignment checks 2021-04-03 14:36:04 +03:00
Vitaliy Filippov 8f8b90be7a Add min_flusher_count configuration 2021-04-03 00:53:28 +03:00
Vitaliy Filippov c5fb1d5987 Do not duplicate blockstore operations when io_uring fills up
This bug was leading to OSDs dying with "Assertion `fulfilled == read_op->len' failed"
when testing fio -rw=randread -numjobs=8 -iodepth=128
2021-03-16 12:48:26 +03:00
Vitaliy Filippov d1526b415f Correctly resume writes when OSD is full to return an error 2021-03-13 17:19:45 +03:00
Vitaliy Filippov 36c935ace6 Use std::vector for the blockstore submission queue 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 0d8b5e2ef9 Remove unused enqueue_op_first() 2021-03-08 17:04:10 +03:00
Vitaliy Filippov 98f1e2c277 Rework write/sync ordering
Make syncs wait for all previous writes because it's the only way
to make sure that OSDs do not receive incomplete writes in LIST results
during peering when some writes are still in progress.

Also simplify blockstore submission queue logic.
2021-03-08 17:04:10 +03:00
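The ordering rule stated in 98f1e2c277 (a sync waits for all previous writes) can be modelled on a toy queue. This is a simplified sketch under assumed types: `QueuedOp`, `OpType` and `sync_may_start` are hypothetical names, and the real blockstore queue tracks far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class OpType { Write, Sync };

// Simplified queue entry: only the operation type and a completion flag.
struct QueuedOp
{
    OpType type;
    bool completed;
};

// A sync at position sync_pos may start only when every write queued
// before it has completed, so listings never observe a partial write.
bool sync_may_start(const std::vector<QueuedOp> & queue, size_t sync_pos)
{
    assert(sync_pos < queue.size() && queue[sync_pos].type == OpType::Sync);
    for (size_t i = 0; i < sync_pos; i++)
    {
        if (queue[i].type == OpType::Write && !queue[i].completed)
            return false;
    }
    return true;
}
```

Writes queued after the sync are deliberately ignored: only earlier writes hold it back in this model.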
Vitaliy Filippov bf9a175efc Move C/C++ sources to src subdirectory 2021-02-25 23:59:03 +03:00