## Tuning

The default settings in etcd should work well for installations on a local network where the average network latency is low.
However, when using etcd across multiple data centers or over networks with high latency, you may need to tweak the heartbeat interval and election timeout settings.

### Time Parameters

The underlying distributed consensus protocol relies on two separate time parameters to ensure that nodes can hand off leadership if one stalls or goes offline.
The first parameter is called the *Heartbeat Interval*.
This is the frequency with which the leader will notify followers that it is still the leader.
etcd batches commands together for higher throughput, so the heartbeat interval also adds a delay before commands are committed.
By default, etcd uses a `50ms` heartbeat interval.

The second parameter is the *Election Timeout*.
This timeout is how long a follower node will go without hearing a heartbeat before attempting to become leader itself.
By default, etcd uses a `200ms` election timeout.

Adjusting these values is a trade-off.
Lowering the heartbeat interval will cause individual commands to be committed faster, but it will lower the overall throughput of etcd.
If your etcd instances have low utilization, then lowering the heartbeat interval can improve your command response time.

The election timeout should be set based on the heartbeat interval and your network ping time between nodes.
Election timeouts should be at least 10 times your ping time so they can account for variance in your network.
For example, if the ping time between your nodes is 10ms, then you should have at least a 100ms election timeout.

You should also set your election timeout to at least 4 to 5 times your heartbeat interval to account for variance in leader replication.
For a heartbeat interval of 50ms, that means an election timeout of at least 200ms to 250ms.
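
The two rules above can be combined into a single lower bound.
The following is a minimal sketch, not part of etcd: `min_election_timeout` is a hypothetical helper name, and it assumes whole-millisecond inputs and uses the more conservative multiplier of 5 from the "4 to 5 times" guidance.

```sh
# min_election_timeout PING_MS HEARTBEAT_MS
# Hypothetical helper (not shipped with etcd).
# Prints the smallest election timeout, in milliseconds, that satisfies both rules:
#   - at least 10 times the ping time between nodes
#   - at least 5 times the heartbeat interval (upper end of "4 to 5 times")
min_election_timeout() {
    ping_ms=$1
    heartbeat_ms=$2
    by_ping=$((ping_ms * 10))
    by_heartbeat=$((heartbeat_ms * 5))
    # The stricter (larger) of the two bounds wins.
    if [ "$by_ping" -gt "$by_heartbeat" ]; then
        echo "$by_ping"
    else
        echo "$by_heartbeat"
    fi
}

min_election_timeout 10 50   # prints 250 (heartbeat rule dominates)
min_election_timeout 40 50   # prints 400 (ping rule dominates)
```

The result is only a floor; you may still want a larger value on networks with highly variable latency.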

You can override the default values on the command line or through environment variables:

```sh
# Command line arguments:
$ etcd -peer-heartbeat-interval=100 -peer-election-timeout=500

# Environment variables:
$ ETCD_PEER_HEARTBEAT_INTERVAL=100 ETCD_PEER_ELECTION_TIMEOUT=500 etcd
```

Or you can set the values within the configuration file:

```toml
[peer]
heartbeat_interval = 100
election_timeout = 500
```

The values are specified in milliseconds.


### Snapshots

etcd appends all key changes to a log file.
This log grows forever and is a complete linear history of every change made to the keys.
A complete history works well for lightly used clusters, but clusters that are heavily used would carry around a large log.

To avoid having a huge log, etcd makes periodic snapshots.
These snapshots provide a way for etcd to compact the log by saving the current state of the system and removing old logs.
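
The idea of compacting a log into a snapshot can be illustrated with a toy example.
This sketch is only an illustration and does not reflect etcd's actual on-disk format or implementation: the log is assumed to be a text file with one `key value` change per line, and `compact_log` is a hypothetical function name.

```sh
# Toy illustration only: this is NOT etcd's file format or implementation.
# The "log" is a text file with one "key value" change per line.

# compact_log LOG SNAPSHOT
# Folds the previous snapshot (if any) and the log into a new snapshot that
# holds only the latest value of each key, then empties the log.
compact_log() {
    log=$1
    snapshot=$2
    tmp="$snapshot.tmp"
    touch "$snapshot"   # make sure a (possibly empty) previous snapshot exists
    # Replay the old snapshot, then the log; later lines overwrite earlier ones.
    cat "$snapshot" "$log" |
        awk '{ state[$1] = $2 } END { for (k in state) print k, state[k] }' |
        sort > "$tmp"
    mv "$tmp" "$snapshot"
    : > "$log"          # the old log entries are now captured by the snapshot
}
```

After a call such as `compact_log changes.log state.snap`, the snapshot file holds the current state and the log starts over from empty, which is why the log no longer grows without bound.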

### Snapshot Tuning

Creating snapshots can be expensive, so they're only created after a given number of changes to etcd.
By default, snapshots will be made after every 10,000 changes.
If etcd's memory usage and disk usage are too high, you can lower the snapshot threshold on the command line or through environment variables:

```sh
# Command line arguments:
$ etcd -snapshot-count=5000

# Environment variables:
$ ETCD_SNAPSHOT_COUNT=5000 etcd
```

Or you can change the setting in the configuration file:

```toml
snapshot_count = 5000
```

You can also disable snapshotting on the command line or through environment variables:

```sh
# Command line arguments:
$ etcd -snapshot=false

# Environment variables:
$ ETCD_SNAPSHOT=false etcd
```

You can also disable snapshotting within the configuration file:

```toml
snapshot = false
```