This happened because flusher_count became dynamic and fsync_batch() was comparing the number
of flushers currently ready to do an fsync with the maximum number of flushers. Also the number
wasn't rechecked on every loop which was also incorrect.
Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.
- Fix two monitor bugs which led to objects being "logically lost" (physically
present on some secondary OSDs while primary doesn't know about it) after multiple
interrupted rebalancings
- Implement "no_recovery" and "no_rebalance" flags
Fixes incorrect PG history in case when a new rebalance is started
before the finish of the previous one which could make primary OSDs unable
to locate some objects on some secondaries.
- Add heartbeats (fixes failover in case of network issues or offline nodes)
- Fix a bug where a PG could incorrectly become listed as 'incomplete' if historical osd_sets
included a set with the the PG's primary OSD as the only alive one
- Use osd_out_time = 10 minutes by default instead of 30 minutes
- Make monitors stick to a single selected etcd URL on start and not try to select random ones
on every request - this was leading to etcd interaction errors when some etcds were unavailable
- Fix multiple bugs leading to OSDs sometimes being unable to correctly activate PGs
when a lot of PG peering events occurred in a small amount of time
- Fix a bug where OSDs could list incomplete object versions during peering. The bug
manifested with "local rollback operation failed" messages in OSD logs
- Fix a bug where misplaced chunks for degraded and incomplete objects were not removed
from extra OSDs during recovery
- Fix incorrect PG history configuration resulting in OSDs being unable to find some
of the objects after a PG count change
- Simplify block layer write ordering logic
- Avoid extra data move when a lot of OSDs are first stopped for long time and then restarted
- Fix incorrect degraded & misplaced object statistics after a completed rebalance
- Fix incorrect usage of pg_minsize instead of the minimal possible object chunk count in EC pools
This fixes the bug where OSDs were sometimes trying to report updated PG states
infinitely without luck when PGs transitioned from 'starting' to 'peering' too fast
Make syncs wait for all previous writes because it's the only way
to make sure that OSDs do not receive incomplete writes in LIST results
during peering when some writes are still in progress.
Also simplify blockstore submission queue logic.
- Fix operation statistics
- Fix a rebalance hang introduced in 0.5.5
- Test PG count changes with actual data moving
- Fix a possible 'unexpected pg state: 0' error during PG count change
Also don't check for has_degraded because it's not a bug that objects
are _temporarily_ listed as degraded during PG peering as it's not
required for the new primary to connect to _all_ older peers to start
peering. The test may be improved in the future by temporarily disabling
degraded recovery during it and returning the has_degraded check back.
- Transition to CMake build system
- Fix Monitor being unable to change PG sizes
- Fix PG optimizer not using some OSDs in some cases
- Fix inability to change PG count online
- Improve journal flusher performance
- Add a little better systemd unit generator
- Use w=8 with jerasure (breaking change for EC pools)