Commit Graph

74 Commits (5596ad89978e3be27c536ac975a598626b3f3310)

Author SHA1 Message Date
Vitaliy Filippov faa5e1436f Attempt journal trim even without new flushes
This is the new guaranteed unblocking method which replaces old trims
in init and rollback, and also fixes a possible stall when just several
writes in the beginning of the journal are flushed without triggering
a subsequent trim.
2020-10-24 01:28:47 +03:00
Vitaliy Filippov 5fbe36198a Fix journal trimming
1) Update journal's used_start in memory only after updating journal superblock.
Doing the opposite is incorrect because part of the journal will be lost if writers
overwrite its old beginning.

2) Sync journal device after updating the superblock.

3) Do not trim in rollback and init because trimming there would also require
updating the superblock. And the only reason to trim in both those places was
to unblock writers. And a guaranteed unblocking method will follow in the next
commit :)
2020-10-24 01:08:33 +03:00
Vitaliy Filippov 99c45bb5ed Fix debugging output during journal loading 2020-10-24 01:08:33 +03:00
Vitaliy Filippov 0471b09b9c Add license notices to all source code files 2020-09-17 23:07:06 +03:00
Vitaliy Filippov 44973e7f27 Fix replicated pool bugs 2020-09-05 21:45:04 +03:00
Vitaliy Filippov 242d9a42a2 Change object format in prints to %lx:%lx v%lu 2020-09-05 17:44:05 +03:00
Vitaliy Filippov e051db5a73 Check for unsuccessful memory allocations 2020-09-05 01:42:11 +03:00
Vitaliy Filippov ec7acc8f3a Add WRITE_STABLE operation for future replication support 2020-07-05 01:48:02 +03:00
Vitaliy Filippov 416a80b099 Make blockstore object state a combination of type and workflow 2020-07-04 22:20:32 +03:00
Vitaliy Filippov 571be0f380 Make deletions instantly stable
"2-phase" (write->stabilize) process is pointless for deletions because it
doesn't protect us from incomplete objects. This happens because it removes
the version information from metadata after stabilization. Deletions require
"3-phase" process with a potentially very long 3rd phase.

So, deletions will be allowed to generate degraded and incomplete objects,
and for it to not affect users' ability to delete something, the cluster
will allow to delete whole inodes while storing a list of them in etcd.
Proper TRIM will be impossible until the implementation of the aforementioned
"3-phase" process, though.

By the way, this change also fixes a possible write stall after rebalancing
which was caused by the lack of "stabilize delete" operations.
2020-06-02 23:45:22 +03:00
Vitaliy Filippov 985c309d7f Remove duplicate code between blockstore_{rollback,stable} and blockstore_init 2020-06-02 20:37:00 +03:00
Vitaliy Filippov 165c204555 Fix BS_OP_DELETE (the implementation was untested up to this point) 2020-06-02 14:26:01 +03:00
Vitaliy Filippov c22e096943 Output journal offsets in debug trace in hex, add detailed "still waiting" messages 2020-06-01 16:18:19 +03:00
Vitaliy Filippov 21d0b06959 Implement flushing (stabilize/rollback) of unstable entries on start of the PG 2020-03-14 02:49:34 +03:00
Vitaliy Filippov eba053febe Do not start small writes before finishing the last big write to the same object 2020-03-12 02:15:01 +03:00
Vitaliy Filippov 3f522c66e6 Implement immediate commit mode 2020-03-10 01:59:15 +03:00
Vitaliy Filippov c863543bfe Fix possible journal corruption caused by concurrent flushing and writing of the same journal sector 2020-03-08 01:21:19 +03:00
Vitaliy Filippov 844cacd357 Allow incorrectly forbidden BS_OP_LIST in readonly mode 2020-03-06 02:29:39 +03:00
Vitaliy Filippov 1733de2db6 Test & fix single-PG primary OSD
- Add support for benchmarking single primary OSD in fio_sec_osd
- Do not wait for the next event in flushers (return resume_0 back)
- Fix flushing of zero-length writes
- Print PG object count when peering
- Print journal free space when starting and when congested
2020-02-26 19:05:29 +03:00
Vitaliy Filippov 74673c761f Make basic primary-write work 2020-02-25 02:55:58 +03:00
Vitaliy Filippov dcc9e75c63 Wait for write completion before fsync in blockstore_init 2020-01-29 16:40:21 +03:00
Vitaliy Filippov 47663bd1dc Add (empty) osd_primary.cpp, rename osd_read to osd_receive, add FIXMEs for fsync 2020-01-28 22:40:50 +03:00
Vitaliy Filippov 2b09710d6f Implement blockstore rollback operation
Rollback operation is required for the primary OSD to kill unstable
object versions in OSD peers so they don't occupy journal space
2020-01-24 20:18:14 +03:00
Vitaliy Filippov d0ab2a20b2 Make fsync flags separate for data, metadata and journal 2020-01-17 13:41:37 +03:00
Vitaliy Filippov 43f6cfeb73 Extract alignments to options 2020-01-16 00:54:25 +03:00
Vitaliy Filippov 36d8c8724f Fix sparse reads using bitmap, fix journal replay (we could sometimes lose its end) 2020-01-12 23:38:33 +03:00
Vitaliy Filippov cf819eb442 Implement sparse block bitmap to avoid zero-fill 2020-01-12 02:55:32 +03:00
Vitaliy Filippov bf3eecc159 Extract 512 to constants 2020-01-06 14:11:47 +03:00
Vitaliy Filippov a7e74670a5 Split blockstore implementation and interface header 2019-12-15 14:57:18 +03:00
Vitaliy Filippov 76caecf7c7 Inmemory metadata mode 2019-12-02 15:42:42 +03:00
Vitaliy Filippov f4d06ba102 OP_DELETE flushing 2019-12-02 02:41:14 +03:00
Vitaliy Filippov 00eeedae90 Add "fsync disabled" mode 2019-12-01 16:41:07 +03:00
Vitaliy Filippov 76655929c4 Add readonly flag 2019-12-01 16:41:07 +03:00
Vitaliy Filippov 9260cd263a Verify data crc32 when reading journal 2019-11-30 23:32:10 +03:00
Vitaliy Filippov 2039df76a5 Fix journal reading and make it more similar to writing :) 2019-11-30 02:27:31 +03:00
Vitaliy Filippov 40781c67b2 Trim journal on start 2019-11-29 02:13:32 +03:00
Vitaliy Filippov 45f34fb3b2 Fix linear overwrite, make metadata writes ordered, ignore older entries when recovering journal 2019-11-28 22:36:38 +03:00
Vitaliy Filippov b6fff5a77e Fix metadata area size calculation, print free space, wait for free space
FIXME: Now it crashes with -ENOSPC on linear overwrite
2019-11-28 20:23:27 +03:00
Vitaliy Filippov 9fa0d3325f Support inmemory journal 2019-11-28 18:06:50 +03:00
Vitaliy Filippov e1ac4dba23 Fix safe stop procedure 2019-11-28 02:27:17 +03:00
Vitaliy Filippov 35a6ed728d Fix another stall due to bad unstable_writes tracking, do not try to write beyond the end of the journal 2019-11-28 00:28:08 +03:00
Vitaliy Filippov 2630e2e3b9 Fix metadata partition length, fix journal allocation at the end 2019-11-27 19:39:18 +03:00
Vitaliy Filippov 876231d26b no new 2019-11-27 18:14:01 +03:00
Vitaliy Filippov ce5cd13bc8 Use fdatasync (just for testing over an FS) 2019-11-27 02:41:30 +03:00
Vitaliy Filippov ff7469ee91 Make allocator a class 2019-11-27 00:50:57 +03:00
Vitaliy Filippov a6770f619a Fix crash while reading metadata 2019-11-26 12:06:42 +03:00
Vitaliy Filippov 50cf3667fa Track unstable writes 2019-11-25 01:16:34 +03:00
Vitaliy Filippov 82a2b8e7d9 Fix some extra bugs and it seems now it is even able to trim the journal 2019-11-22 12:08:44 +03:00
Vitaliy Filippov 7e87290fca Clear second sector of the journal, init iov for callbacks 2019-11-21 22:06:00 +03:00
Vitaliy Filippov 201eeb8516 Rewrite metadata_init to the same "goto-coroutine" style 2019-11-21 21:51:52 +03:00