"2-phase" (write->stabilize) process is pointless for deletions because it
doesn't protect us from incomplete objects. This happens because it removes
the version information from metadata after stabilization. Deletions require
"3-phase" process with a potentially very long 3rd phase.
So, deletions will be allowed to generate degraded and incomplete objects,
and for it to not affect users' ability to delete something, the cluster
will allow to delete whole inodes while storing a list of them in etcd.
Proper TRIM will be impossible until the implementation of the aforementioned
"3-phase" process, though.
By the way, this change also fixes a possible write stall after rebalancing
which was caused by the lack of "stabilize delete" operations.
The stall occurred during fio Q=128 random write tests with low flusher_count (4).
It was caused by flushers being unable to flush the beginning of the journal
because it contained older writes to an object that also had writes in the very end
of the journal, after dirty_start.
- fix the `delete op` inside lambda callback crash (it frees the lambda itself
which results in use-after-free with g++)
- fix stop_client() reenterability
- fix a bug in the blockstore layer which resulted in always returning version=0
for zero-length reads
- change error codes for blockstore_stabilize
It doesn't reduce actual WA, but it reduces tail latency (Q=32, 10% / 50% / 90% / 99% / 99.95%):
- write: 766us/979us/1090us/1303us/1729us vs 1074us/1450us/2212us/3261us/4113us
- sync: 701us/881us/1188us/1762us/2540us vs 269us/955us/1663us/2638us/4146us