Previously, BS_OP_SYNC could take unfinished writes and add them to the journal before
they had actually completed. This led to crashes with the message
"BUG: Unexpected dirty_entry 2000000000001:9f2a0000 v3 unstable state during flush: 338"
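The intent of the BS_OP_SYNC fix can be illustrated with a small Python sketch. It is not the actual blockstore code: the State enum, the DirtyEntry class and take_for_sync() are hypothetical names, and the real state machine is likely more detailed. The only point shown is that a sync takes writes that have already completed and leaves in-flight ones for a later sync.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IN_FLIGHT = auto()   # write submitted but not yet completed
    WRITTEN = auto()     # write completed, not yet synced


@dataclass
class DirtyEntry:
    # Hypothetical stand-in for a dirty entry (object id + version).
    oid: str
    version: int
    state: State


def take_for_sync(unsynced):
    """Split unsynced writes into those a sync may take now and those left for later."""
    ready = [e for e in unsynced if e.state is State.WRITTEN]
    later = [e for e in unsynced if e.state is not State.WRITTEN]
    return ready, later
```

With this split, an entry that is still IN_FLIGHT never reaches the journal as part of the current sync, so the flusher should not meet it in an unexpected state.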
- Write operations could be 'stabilized' and previous versions could be
purged from OSDs before version_override was removed, so subsequent
reads could hit a different version in EC pools
- Object was marked clean after completing the delete during recovery, so
reads could in theory hit a deleted version and return nothing
The version seems to be stable after this bunch of fixes :)
- Fix delete & write operation ordering during rebalance so that objects are not lost in the immediate_commit=off mode
- Fix a possible crash caused by very high iodepths
- Re-distribute PG primaries over OSDs that come up after a short downtime
- Allow specifying etcd URLs for OSDs with the http:// prefix; do not die with a strange error if the -etcd option is missing for fio
- Fix a journal flushing deadlock which sometimes occurred in the immediate_commit=off mode
- Fix a bug where OSDs could hang if the data device filled up
- Fix an allocator bug where up to the last (n%64) data device blocks could not be allocated
- Fix monitor crash that occurred on removal of some etcd keys
- Fix a bug where PGs could remain incomplete due to incorrect PG history with just zeroes in osd_sets
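The (n%64) allocator bug in the list above is typical of bitmaps stored in 64-bit words: if the word count is rounded down, the trailing n%64 blocks have no bits and can never be handed out. The Python sketch below is a toy, not the real allocator. BitmapAllocator and its methods are hypothetical names. It rounds the word count up and marks the padding bits past the end of the device as used.

```python
class BitmapAllocator:
    """Toy free-block bitmap stored in 64-bit words (hypothetical, for illustration)."""

    WORD_BITS = 64
    FULL = (1 << WORD_BITS) - 1

    def __init__(self, total_blocks: int):
        self.total = total_blocks
        # Round UP so that the trailing (total_blocks % 64) blocks get a word too.
        n_words = (total_blocks + self.WORD_BITS - 1) // self.WORD_BITS
        self.words = [0] * n_words
        # Mark the padding bits past the end of the device as permanently used.
        tail = total_blocks % self.WORD_BITS
        if tail:
            self.words[-1] = ~((1 << tail) - 1) & self.FULL

    def allocate(self) -> int:
        """Return the index of a free block and mark it used, or -1 if none is left."""
        for i, word in enumerate(self.words):
            if word != self.FULL:
                free = ~word & self.FULL
                bit = (free & -free).bit_length() - 1  # index of the lowest free bit
                self.words[i] |= 1 << bit
                return i * self.WORD_BITS + bit
        return -1
```

For a 70-block device this yields two words, and blocks 64 to 69 stay allocatable while bits 70 to 127 are never returned.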
Previously, OSDs could commit deletes before writes during recovery or rebalance
in the "lazy fsync" (immediate_commit=off) mode, which could result in lost objects.
This happened because flusher_count became dynamic and fsync_batch() was comparing the number
of flushers currently ready to do an fsync with the maximum number of flushers. Also, the number
wasn't rechecked on every loop iteration, which was incorrect as well.
Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.
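The flusher_count comparison described above can be modelled with a few lines of Python. This is only one plausible reading of the problem, not the real fsync_batch(): FlusherPool, batch_ready_old(), batch_ready_new() and wait_for_batch() are hypothetical names. The sketch contrasts a check against the maximum flusher count with a check against the currently active count, re-evaluated on every loop iteration.

```python
from dataclasses import dataclass


@dataclass
class FlusherPool:
    max_flushers: int   # configured upper bound
    active: int         # flushers currently running (changes at runtime)
    ready: int = 0      # flushers waiting for a shared fsync


def batch_ready_old(pool: FlusherPool) -> bool:
    # Compares against the maximum: never true while active < max_flushers.
    return pool.ready >= pool.max_flushers


def batch_ready_new(pool: FlusherPool) -> bool:
    # Compares against the flushers that are actually running right now.
    return pool.ready > 0 and pool.ready >= pool.active


def wait_for_batch(pool: FlusherPool, events) -> int:
    """Apply (d_active, d_ready) events, rechecking the condition before each one.

    Returns the number of events consumed before the batch fired, or -1 if it never did.
    """
    for step, (d_active, d_ready) in enumerate(events):
        if batch_ready_new(pool):
            return step
        pool.active += d_active
        pool.ready += d_ready
    return len(events) if batch_ready_new(pool) else -1
```

In this model, a pool with max_flushers=8 but only 2 active flushers, both ready, never satisfies the old check and waits forever, while the new check fires immediately.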
- Fix two monitor bugs which led to objects being "logically lost" (physically
present on some secondary OSDs while the primary doesn't know about them) after multiple
interrupted rebalances
- Implement "no_recovery" and "no_rebalance" flags
Fixes incorrect PG history when a new rebalance is started
before the previous one finishes, which could make primary OSDs unable
to locate some objects on some secondaries.
- Add heartbeats (fixes failover in case of network issues or offline nodes)
- Fix a bug where a PG could incorrectly become listed as 'incomplete' if historical osd_sets
included a set with the PG's primary OSD as the only alive one
- Use osd_out_time = 10 minutes by default instead of 30 minutes
- Make monitors stick to a single etcd URL selected on start instead of picking a random one
on every request; this was leading to etcd interaction errors when some etcds were unavailable
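The etcd URL change in the last item can be sketched in Python as follows. This is an illustration only, not the monitor's code (which is not shown here): StickyEndpoint and url_for_request() are hypothetical names, and the example URLs are placeholders. The URL is chosen once at construction and reused for every request.

```python
import random


class StickyEndpoint:
    """Pick one URL at start and keep using it (hypothetical helper)."""

    def __init__(self, urls, rng=None):
        if not urls:
            raise ValueError("at least one etcd URL is required")
        self.urls = list(urls)
        rng = rng or random.Random()
        # Selected once here, not on every request.
        self.selected = rng.choice(self.urls)

    def url_for_request(self) -> str:
        # Every request goes to the same endpoint.
        return self.selected
```

A possible usage, with placeholder addresses:

```python
ep = StickyEndpoint(["http://10.0.0.1:2379", "http://10.0.0.2:2379"])
first = ep.url_for_request()
assert ep.url_for_request() == first
```

What should happen when the selected endpoint itself goes down is not covered by the note above, so the sketch leaves it out.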