Commit Graph

85 Commits (2d9f09dcb686e6dabc27026b0b8571ff72afe1a0)

Author SHA1 Message Date
Vitaliy Filippov e1e01d042e Rename sector_info.usage_count to flush_count 2021-02-02 01:32:23 +03:00
Vitaliy Filippov 9b5d8b9ad4 Fix multiple-sector journal writes, add assertions to not miss any SQEs 2021-02-02 01:29:11 +03:00
Vitaliy Filippov 1018764c91 Fix write->delete->write bugs, add & fix some debugging output 2020-12-04 23:21:58 +03:00
Vitaliy Filippov 44656fbf67 Allow writes with low version numbers after a delete 2020-12-04 11:54:41 +03:00
Vitaliy Filippov 089f138e0c Allow situations where the journal contains a big_write(v1) after delete(v2) and v1 < v2
Fixes a crash in the following scenario:
- client issues a delete request (object version is at least 2)
- OSD has time to flush it to the metadata, but doesn't have time to move the journal start pointer on disk
- client overwrites the same object and it gets the version number 1 again
- OSD is restarted and sees delete(v=2), big_write(v=1) in the journal
- dirty_db sequence gets broken and OSD crashes with assert("Writes and deletes shouldn't happen at the same time")
2020-12-04 11:47:27 +03:00
Vitaliy Filippov f011e0c675 Do not block stabilize by list and list by write 2020-10-22 22:13:40 +00:00
Vitaliy Filippov 0471b09b9c Add license notices to all source code files 2020-09-17 23:07:06 +03:00
Vitaliy Filippov 53832d184a Allow to use lazy sync with replicated pools 2020-09-06 12:08:44 +03:00
Vitaliy Filippov 44973e7f27 Fix replicated pool bugs 2020-09-05 21:45:04 +03:00
Vitaliy Filippov 242d9a42a2 Change object format in prints to %lx:%lx v%lu 2020-09-05 17:44:05 +03:00
Vitaliy Filippov ec7acc8f3a Add WRITE_STABLE operation for future replication support 2020-07-05 01:48:02 +03:00
Vitaliy Filippov 416a80b099 Make blockstore object state a combination of type and workflow 2020-07-04 22:20:32 +03:00
Vitaliy Filippov 571be0f380 Make deletions instantly stable
"2-phase" (write->stabilize) process is pointless for deletions because it
doesn't protect us from incomplete objects. This happens because it removes
the version information from metadata after stabilization. Deletions require
"3-phase" process with a potentially very long 3rd phase.

So, deletions will be allowed to generate degraded and incomplete objects,
and for it to not affect users' ability to delete something, the cluster
will allow to delete whole inodes while storing a list of them in etcd.
Proper TRIM will be impossible until the implementation of the aforementioned
"3-phase" process, though.

By the way, this change also fixes a possible write stall after rebalancing
which was caused by the lack of "stabilize delete" operations.
2020-06-02 23:45:22 +03:00
Vitaliy Filippov 165c204555 Fix BS_OP_DELETE (the implementation was untested up to this point) 2020-06-02 14:26:01 +03:00
Vitaliy Filippov e6a4b634f8 Fix possible write stall
The stall occurred during fio Q=128 random write tests with low flusher_count (4).
It was caused by flushers being unable to flush the beginning of the journal
because it contained older writes to an object that also had writes in the very end
of the journal, after dirty_start.
2020-06-01 16:18:23 +03:00
Vitaliy Filippov c22e096943 Output journal offsets in debug trace in hex, add detailed "still waiting" messages 2020-06-01 16:18:19 +03:00
Vitaliy Filippov 0f43f6d3f6 Fix crashes, print some stats
- fix the `delete op` inside lambda callback crash (it frees the lambda itself
  which results in use-after-free with g++)
- fix stop_client() reenterability
- fix a bug in the blockstore layer which resulted in always returning version=0
  for zero-length reads
- change error codes for blockstore_stabilize
2020-03-31 17:55:31 +03:00
Vitaliy Filippov 46f9bd2a69 Make blockstore list operation return consistent snapshots 2020-03-14 02:10:25 +03:00
Vitaliy Filippov 6982fe1255 Do not block reads by previous unfinished writes 2020-03-13 21:28:49 +03:00
Vitaliy Filippov eba053febe Do not start small writes before finishing the last big write to the same object 2020-03-12 02:15:01 +03:00
Vitaliy Filippov 3f522c66e6 Implement immediate commit mode 2020-03-10 01:59:15 +03:00
Vitaliy Filippov c863543bfe Fix possible journal corruption caused by concurrent flushing and writing of the same journal sector 2020-03-08 01:21:19 +03:00
Vitaliy Filippov 1696446545 Rename min/max _used to _flushed 2020-03-07 16:41:58 +03:00
Vitaliy Filippov 41dddddbf2 Fix some logging 2020-03-07 16:41:53 +03:00
Vitaliy Filippov c71b67f2f7 Move SYNC_STAB_ALL into blockstore implementation 2020-02-23 23:43:57 +03:00
Vitaliy Filippov ffe073473a Remove hardcode of the EC(2+1) scheme, now it supports EC(k+1), fix some bugs 2020-02-13 19:13:17 +03:00
Vitaliy Filippov 2b09710d6f Implement blockstore rollback operation
Rollback operation is required for the primary OSD to kill unstable
object versions in OSD peers so they don't occupy journal space
2020-01-24 20:18:14 +03:00
Vitaliy Filippov 43f6cfeb73 Extract alignments to options 2020-01-16 00:54:25 +03:00
Vitaliy Filippov a3d3949dce Do not overwrite same journal sector multiple times
It doesn't reduce actual WA, but it reduces tail latency (Q=32, 10% / 50% / 90% / 99% / 99.95%):
- write: 766us/979us/1090us/1303us/1729us vs 1074us/1450us/2212us/3261us/4113us
- sync: 701us/881us/1188us/1762us/2540us vs 269us/955us/1663us/2638us/4146us
2020-01-15 02:53:01 +03:00
Vitaliy Filippov 111516381f Add FIXME 2020-01-14 18:41:56 +03:00
Vitaliy Filippov cf819eb442 Implement sparse block bitmap to avoid zero-fill 2020-01-12 02:55:32 +03:00
Vitaliy Filippov 4b05bde3a2 Block writes earlier than sync/stabilize would be blocked, too 2020-01-10 20:05:17 +03:00
Vitaliy Filippov b3f2102f33 Add queue stall tracking 2020-01-10 01:23:46 +03:00
Vitaliy Filippov ba23824561 Allow to disable zero fill 2020-01-06 21:02:15 +03:00
Vitaliy Filippov bf3eecc159 Extract 512 to constants 2020-01-06 14:11:47 +03:00
Vitaliy Filippov 4677ace4cc Allow zero-length overwrites 2019-12-21 19:04:36 +03:00
Vitaliy Filippov e8f7905e08 Allow to set write/delete version explicitly 2019-12-19 19:17:54 +03:00
Vitaliy Filippov d3d21e6e0f Rename OP_ to BS_OP_ 2019-12-19 13:56:26 +03:00
Vitaliy Filippov a7e74670a5 Split blockstore implementation and interface header 2019-12-15 14:57:18 +03:00
Vitaliy Filippov 749ab6e2c6 Rename blockstore_operation to blockstore_op_t 2019-12-15 14:57:18 +03:00
Vitaliy Filippov a7a0946ba8 WIP OP_DELETE 2019-12-01 17:25:59 +03:00
Vitaliy Filippov 45f34fb3b2 Fix linear overwrite, make metadata writes ordered, ignore older entries when recovering journal 2019-11-28 22:36:38 +03:00
Vitaliy Filippov b6fff5a77e Fix metadata area size calculation, print free space, wait for free space
FIXME: Now it crashes with -ENOSPC on linear overwrite
2019-11-28 20:23:27 +03:00
Vitaliy Filippov 9fa0d3325f Support inmemory journal 2019-11-28 18:06:50 +03:00
Vitaliy Filippov d56cb290ee Two FIXMEs 2019-11-28 01:00:22 +03:00
Vitaliy Filippov 35a6ed728d Fix another stall due to bad unstable_writes tracking, do not try to write beyond the end of the journal 2019-11-28 00:28:08 +03:00
Vitaliy Filippov 2630e2e3b9 Fix metadata partition length, fix journal allocation at the end 2019-11-27 19:39:18 +03:00
Vitaliy Filippov 9ba243b3ee Add debug prints 2019-11-27 18:07:51 +03:00
Vitaliy Filippov e2b91968c5 Fix sync confirmations and some pipeline-stall bugs 2019-11-27 18:07:38 +03:00
Vitaliy Filippov 74d8ea2f01 Calculate data crc32c 2019-11-27 02:20:38 +03:00