Slightly reduces WA. For example, in 4K T1Q128 replicated randwrite tests
WA is reduced from ~3.6 to ~3.1, in T1Q64 from ~3.8 to ~3.4.
Only effective without no_same_sector_overwrites.
This fixes a 'double_alloc' assertion in the following case:
- big_write object #1 v1 to block #100
- big_write object #1 v2 to block #101
- big_write object #2 v1 to block #100
Previously BS_OP_SYNC could take unfinished writes and add them into the journal before
they were actually completed. This was leading to crashes with the message
"BUG: Unexpected dirty_entry 2000000000001:9f2a0000 v3 unstable state during flush: 338"
Make syncs wait for all previous writes because it's the only way
to make sure that OSDs do not receive incomplete writes in LIST results
during peering when some writes are still in progress.
Also simplify blockstore submission queue logic.