CAUTION! This version is not fool proof yet. If you purge data of an OSD by
overwriting the disk with zeroes and restart it then the same data will also
be removed from other replicas :-).
I plan to add protection from this situation before merging it into master.
The idea is to make each OSD store a random "cookie" on disk and remove itself
from history automatically if the cookie doesn't match.
Sync before listing was added to wait for all PG writes possibly left in queue
from the previous master to finish before listing it
But in fact it may block the cluster when EC is used and some unstable writes
are left in the queue - they block journal flushing, rollback/stabilize is
required to unblock them, but rollback/stabilize may only happen after PG is
peered. But peering needs listings, listings are requested only after sync, and
sync itself waits for currently blocked writes waiting in the queue
Required to prevent data loss due to activation of an OSD with older data
when PG OSD set change doesn't occur. I.e. fixes the simplest case:
- Run 2 OSDs with 1 PG
- Start writing into the PG
- Stop OSD 2
- Stop OSD 1
- Start OSD 2
After this change the PG will refuse to start after the last step.
Previously OSDs could commit deletes before writes during recovery or rebalance
in the "lazy fsync" (immediate_commit=off) mode which could result in lost objects