Merge pull request #7622 from heyitsanthony/faq-disk-leader

Documentation: add disk latency leader loss question to FAQ
release-3.2
Xiang Li 2017-03-28 19:18:50 -07:00 committed by GitHub
commit 36735d52a4
1 changed files with 4 additions and 0 deletions

@@ -78,6 +78,10 @@ On the other hand, if the downed member is removed from cluster membership first
etcd sets `strict-reconfig-check` in order to reject reconfiguration requests that would cause quorum loss. Abandoning quorum is really risky (especially when the cluster is already unhealthy). Although it may be tempting to disable quorum checking in order to add a new member after quorum loss, this could lead to full-fledged cluster inconsistency. For many applications, this will make the problem even worse ("disk geometry corruption" being a candidate for most terrifying).
### Why does etcd lose its leader from disk latency spikes?
This is intentional; disk latency is part of leader liveness. Suppose the cluster leader takes a minute to fsync a raft log update to disk, but the etcd cluster has a one second election timeout. Even though the leader can process network messages within the election interval (e.g., send heartbeats), it's effectively unavailable because it can't commit any new proposals; it's waiting on the slow disk. If the cluster frequently loses its leader due to disk latencies, try [tuning][tuning] the disk settings or etcd time parameters.
### Performance
#### How should I benchmark etcd?