# Tuning

The default settings in etcd should work well for installations on a local network where the average network latency is low. However, when using etcd across multiple data centers or over networks with high latency, the heartbeat interval and election timeout settings may need tuning.

The network isn't the only source of latency. Each request and response may be impacted by slow disks on both the leader and follower. Each of these timeouts represents the total time from request to successful response from the other machine.

## Time Parameters

The underlying distributed consensus protocol relies on two separate time parameters to ensure that nodes can hand off leadership if one stalls or goes offline.
The first parameter is called the *Heartbeat Interval*.
This is the frequency with which the leader will notify followers that it is still the leader.
As a best practice, the parameter should be set to around the round-trip time between members.
By default, etcd uses a `100ms` heartbeat interval.

The second parameter is the *Election Timeout*.
This timeout is how long a follower node will go without hearing a heartbeat before attempting to become leader itself.
By default, etcd uses a `1000ms` election timeout.

Adjusting these values is a trade-off.
The recommended heartbeat interval is around the maximum of the average round-trip times (RTT) between members, normally 0.5-1.5x the round-trip time.
If the heartbeat interval is too low, etcd will send unnecessary messages that increase the usage of CPU and network resources.
On the other hand, a heartbeat interval that is too high leads to a high election timeout, and a higher election timeout means it takes longer to detect a leader failure.
The easiest way to measure round-trip time (RTT) is to use the [PING utility][ping].
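
As a rough sketch of the measurement step, the helper below pulls the average RTT out of the summary line that `ping` prints on Linux and macOS/BSD.
The function name `avg_rtt_ms` and the peer address in the usage comment are placeholders, not part of etcd, and the parsing assumes the usual `min/avg/max` summary format.

```sh
# Print the average RTT in milliseconds from ping's summary line, e.g.
#   rtt min/avg/max/mdev = 9.512/10.204/11.930/0.731 ms          (Linux)
#   round-trip min/avg/max/stddev = 9.512/10.204/11.930/0.731 ms (macOS/BSD)
# Splitting that line on "/" puts the average in the fifth field.
avg_rtt_ms() {
  awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Usage (10.0.1.11 is a placeholder peer address):
#   ping -c 10 10.0.1.11 | avg_rtt_ms
```

Running this from each member against every other member and taking the largest average gives the figure that the heartbeat interval guidance refers to.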

The election timeout should be set based on the heartbeat interval and average round-trip time between members.
The election timeout must be at least 10 times the round-trip time so that it can account for variance in the network.
For example, if the round-trip time between members is 10ms then the election timeout should be at least 100ms.

The election timeout should also be set to at least 5 to 10 times the heartbeat interval to account for variance in leader replication.
For a heartbeat interval of 50ms, set the election timeout to at least 250ms to 500ms.
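
The two lower bounds above can be combined into one number.
The sketch below is only an illustration of that arithmetic: `min_election_timeout_ms` is a hypothetical helper, it uses the 5x end of the heartbeat range, and it works with whole milliseconds only.

```sh
# Suggest a lower bound for the election timeout (in ms), taking the larger of:
#   10 x round-trip time
#    5 x heartbeat interval
# Arguments: <rtt_ms> <heartbeat_ms>, both whole numbers.
min_election_timeout_ms() {
  rtt_ms=$1
  heartbeat_ms=$2
  by_rtt=$((rtt_ms * 10))
  by_heartbeat=$((heartbeat_ms * 5))
  if [ "$by_rtt" -gt "$by_heartbeat" ]; then
    echo "$by_rtt"
  else
    echo "$by_heartbeat"
  fi
}

# Usage:
#   min_election_timeout_ms 10 50    # RTT 10ms, heartbeat 50ms -> 250
#   min_election_timeout_ms 100 100  # RTT 100ms, heartbeat 100ms -> 1000
```

Whichever bound is larger wins, so on a high-latency link the RTT rule dominates, while on a fast local network the heartbeat rule does.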

The upper limit of the election timeout is 50000ms (50s), which should only be used when deploying a globally distributed etcd cluster.
A reasonable round-trip time for the continental United States is 130ms, and the round-trip time between the US and Japan is around 350-400ms.
If the network has uneven performance or regular packet delays or loss, a couple of retries may be necessary to successfully send a packet, so 5s is a safe upper limit for global round-trip time.
Because the election timeout should be an order of magnitude larger than the broadcast time, a ~5s broadcast time for a globally distributed cluster makes 50 seconds a reasonable maximum.

The heartbeat interval and election timeout values should be the same for all members in one cluster. Setting different values for etcd members may disrupt cluster stability.

The default values can be overridden on the command line or through environment variables:

```sh
# Command line arguments:
$ etcd --heartbeat-interval=100 --election-timeout=500

# Environment variables:
$ ETCD_HEARTBEAT_INTERVAL=100 ETCD_ELECTION_TIMEOUT=500 etcd
```

The values are specified in milliseconds.
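
Before starting a member, a candidate pair of values can be checked against the guidance in this section.
This is a minimal sketch, assuming whole-millisecond inputs; `check_timing` is a hypothetical helper and not an etcd command.

```sh
# Check a heartbeat interval / election timeout pair (both in ms) against the
# rules described above:
#   - election timeout is at least 5 x heartbeat interval
#   - election timeout does not exceed the 50000ms upper limit
# Returns 0 when both hold, 1 otherwise (with a message on stderr).
check_timing() {
  heartbeat_ms=$1
  election_ms=$2
  if [ "$election_ms" -lt $((heartbeat_ms * 5)) ]; then
    echo "election timeout ${election_ms}ms is below 5x heartbeat interval ${heartbeat_ms}ms" >&2
    return 1
  fi
  if [ "$election_ms" -gt 50000 ]; then
    echo "election timeout ${election_ms}ms is above the 50000ms upper limit" >&2
    return 1
  fi
}

# Usage:
#   check_timing 100 500 && etcd --heartbeat-interval=100 --election-timeout=500
```

Running the same check with the same values on every member also helps keep the settings consistent across the cluster.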

## Snapshots

etcd appends all key changes to a log file.
This log grows forever and is a complete linear history of every change made to the keys.
A complete history works well for lightly used clusters, but heavily used clusters would carry around a large log.

To avoid having a huge log, etcd makes periodic snapshots.
These snapshots provide a way for etcd to compact the log by saving the current state of the system and removing old logs.

### Snapshot Tuning

Creating snapshots can be expensive, so they're only created after a given number of changes to etcd.
By default, snapshots will be made after every 10,000 changes.
If etcd's memory usage and disk usage are too high, try lowering the snapshot threshold by setting the following on the command line:

```sh
# Command line arguments:
$ etcd --snapshot-count=5000

# Environment variables:
$ ETCD_SNAPSHOT_COUNT=5000 etcd
```
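
To get a feel for how the threshold affects snapshot frequency, the arithmetic can be sketched as below.
It assumes, as a simplification, that one snapshot is taken each time the number of changes since the last snapshot reaches the threshold; `snapshots_triggered` is a hypothetical helper.

```sh
# Estimate how many snapshots a given number of changes would trigger
# for a given --snapshot-count value (integer division, remainder ignored).
# Arguments: <changes> <snapshot_count>
snapshots_triggered() {
  changes=$1
  snapshot_count=$2
  echo $((changes / snapshot_count))
}

# Usage:
#   snapshots_triggered 25000 10000   # default threshold  -> 2
#   snapshots_triggered 25000 5000    # lowered threshold  -> 5
```

Halving the threshold roughly doubles how often snapshots are written, which trades extra snapshot work for a shorter log between snapshots.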

[ping]: https://en.wikipedia.org/wiki/Ping_(networking_utility)