Merge pull request #507 from philips/turn-snapshots-on-by-default

feat(*): enable snapshots by default
release-0.4
Brandon Philips 2014-02-05 09:08:43 -08:00
commit 9e43e726a9
5 changed files with 41 additions and 28 deletions

@@ -37,7 +37,7 @@ configuration files.
 * `-peer-ca-file` - The path of the CAFile. Enables client/peer cert authentication when present.
 * `-peer-cert-file` - The cert file of the server.
 * `-peer-key-file` - The key file of the server.
-* `-snapshot` - Open or close snapshot. Defaults to `false`.
+* `-snapshot=false` - Disable log snapshots. Defaults to `true`.
 * `-v` - Enable verbose logging. Defaults to `false`.
 * `-vv` - Enable very verbose logging. Defaults to `false`.
 * `-version` - Print the version and exit.
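The new documentation spells the flag as `-snapshot=false`. Assuming etcd's command line is parsed with Go's standard `flag` package, that spelling matters: a boolean flag only accepts a value through `=`, so with `-snapshot false` the flag is set to `true` and `false` becomes a positional argument that stops flag parsing. A minimal sketch; the `parseSnapshot` helper is hypothetical, and its `true` default mirrors the new default in this commit:

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseSnapshot is a hypothetical helper: it declares a "snapshot" bool flag
// that defaults to true and returns its value plus any leftover arguments.
func parseSnapshot(args []string) (snapshot bool, rest []string, err error) {
	fs := flag.NewFlagSet("etcd", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage/error text off stderr in this sketch
	fs.BoolVar(&snapshot, "snapshot", true, "Disable log snapshots")
	err = fs.Parse(args)
	return snapshot, fs.Args(), err
}

func main() {
	for _, args := range [][]string{
		{},                     // default
		{"-snapshot=false"},    // value attached with "="
		{"-snapshot", "false"}, // "false" is not consumed by the bool flag
	} {
		snap, rest, err := parseSnapshot(args)
		fmt.Println(args, snap, rest, err)
	}
}
```

Under that assumption, the `$ etcd -snapshot false` line in the tuning guide would leave snapshots enabled, while `-snapshot=false` turns them off.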

@@ -47,30 +47,16 @@ election_timeout = 100
 The values are specified in milliseconds.
-### Enabling Snapshots
-By default, the Raft protocol appends all etcd changes to a log file.
-This works well for smaller installations but etcd clusters that are heavily used can see the log grow significantly in size.
-Snapshots provide a way for etcd to compact the log by saving the current state of the system and removing old logs.
-You can enable snapshotting by adding the following to your command line:
-```sh
-# Command line arguments:
-$ etcd -snapshot
-# Environment variables:
-$ ETCD_SNAPSHOT=true etcd
-```
-You can also enable snapshotting within the configuration file:
-```toml
-snapshot = true
-```
-### Additional Snapshot Tuning
+### Snapshots
+etcd appends all key changes to a log file.
+This log grows forever and is a complete linear history of every change made to the keys.
+A complete history works well for lightly used clusters but clusters that are heavily used would carry around a large log.
+To avoid having a huge log etcd makes periodic snapshots.
+These snapshots provide a way for etcd to compact the log by saving the current state of the system and removing old logs.
+### Snapshot Tuning
 Creating snapshots can be expensive so they're only created after a given number of changes to etcd.
 By default, snapshots will be made after every 10,000 changes.
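The rewritten section describes a snapshot as saving the current state and dropping the log entries that state already covers. The sketch below models that idea with a map and a slice; the `store` type and its methods are hypothetical simplifications for illustration, not how etcd or its Raft library actually stores snapshots:

```go
package main

import "fmt"

// entry is one change appended to the log: set key to value.
type entry struct {
	key, value string
}

// store is a hypothetical model of log compaction, not etcd's implementation.
type store struct {
	snapshot      map[string]string // state as of snapshotIndex
	snapshotIndex int               // number of entries folded into the snapshot
	log           []entry           // entries appended after the snapshot
}

func newStore() *store {
	return &store{snapshot: map[string]string{}}
}

// apply appends a change to the log; nothing is ever overwritten in place.
func (s *store) apply(key, value string) {
	s.log = append(s.log, entry{key, value})
}

// state replays the retained log on top of the last snapshot.
func (s *store) state() map[string]string {
	out := make(map[string]string, len(s.snapshot))
	for k, v := range s.snapshot {
		out[k] = v
	}
	for _, e := range s.log {
		out[e.key] = e.value
	}
	return out
}

// takeSnapshot saves the current state and discards the entries it covers.
func (s *store) takeSnapshot() {
	s.snapshot = s.state()
	s.snapshotIndex += len(s.log)
	s.log = nil
}

func main() {
	s := newStore()
	for i := 0; i < 1000; i++ {
		s.apply("counter", fmt.Sprint(i)) // the same key rewritten 1000 times
	}
	fmt.Println(len(s.log), s.state()["counter"]) // 1000 999
	s.takeSnapshot()
	fmt.Println(len(s.log), s.snapshotIndex, s.state()["counter"]) // 0 1000 999
}
```

A heavily rewritten key shows why the log grows while the state does not: 1,000 log entries compact to a single key in the snapshot.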
@@ -78,15 +64,30 @@ If etcd's memory usage and disk usage are too high, you can lower the snapshot t
 ```sh
 # Command line arguments:
-$ etcd -snapshot -snapshot-count=5000
+$ etcd -snapshot-count=5000
 # Environment variables:
-$ ETCD_SNAPSHOT=true ETCD_SNAPSHOT_COUNT=5000 etcd
+$ ETCD_SNAPSHOT_COUNT=5000 etcd
 ```
 Or you can change the setting in the configuration file:
 ```toml
-snapshot = true
 snapshot_count = 5000
 ```
+You can also disable snapshotting by adding the following to your command line:
+```sh
+# Command line arguments:
+$ etcd -snapshot false
+# Environment variables:
+$ ETCD_SNAPSHOT=false etcd
+```
+You can also enable snapshotting within the configuration file:
+```toml
+snapshot = false
+```
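Lowering `snapshot_count` makes snapshots more frequent. The sketch below replays a hypothetical workload through the same strictly-greater-than test that `monitorSnapshot` applies to the commit index in this commit; the `snapshotPoints` helper and the workload numbers are made up for illustration:

```go
package main

import "fmt"

// snapshotPoints returns the commit indexes at which a snapshot would be
// triggered, given the commit index observed at each polling tick.
// Like monitorSnapshot, it snapshots only when more than snapshotCount
// changes have accumulated since the last snapshot.
func snapshotPoints(observed []uint64, snapshotCount uint64) []uint64 {
	var points []uint64
	var lastIndex uint64
	for _, idx := range observed {
		if idx-lastIndex > snapshotCount {
			points = append(points, idx)
			lastIndex = idx
		}
	}
	return points
}

func main() {
	// Hypothetical workload: 1,000 new commits between polls, up to 30,000.
	var observed []uint64
	for idx := uint64(1000); idx <= 30000; idx += 1000 {
		observed = append(observed, idx)
	}
	fmt.Println(snapshotPoints(observed, 10000)) // [11000 22000]
	fmt.Println(snapshotPoints(observed, 5000))  // [6000 12000 18000 24000 30000]
}
```

Because the check is strict and polled, "every 10,000 changes" in practice means at least 10,001 changes between snapshots, and more when many commits land between polls.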

@@ -89,6 +89,7 @@ func NewConfig() *Config {
 	c.MaxClusterSize = 9
 	c.MaxResultBuffer = 1024
 	c.MaxRetryAttempts = 3
+	c.Snapshot = true
 	c.SnapshotCount = 10000
 	c.Peer.Addr = "127.0.0.1:7001"
 	c.Peer.HeartbeatTimeout = defaultHeartbeatTimeout
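With `Snapshot` now defaulting to `true` in `NewConfig`, setting only `ETCD_SNAPSHOT_COUNT` is enough to tune snapshots, which is why the documentation drops `ETCD_SNAPSHOT=true` from its example. The sketch below layers environment overrides on top of those defaults. The trimmed-down `Config`, the `int` type for `SnapshotCount`, the `applyEnv` loader, and the use of `strconv.ParseBool` are all assumptions for illustration, not etcd's actual configuration loader:

```go
package main

import (
	"fmt"
	"strconv"
)

// Config keeps only the two snapshot settings touched by this commit.
type Config struct {
	Snapshot      bool
	SnapshotCount int // assumption: the real field type may differ
}

// newConfig mirrors the snapshot defaults set in NewConfig after this commit.
func newConfig() *Config {
	return &Config{Snapshot: true, SnapshotCount: 10000}
}

// applyEnv is a hypothetical loader that overrides defaults from
// ETCD_SNAPSHOT and ETCD_SNAPSHOT_COUNT when they are set.
// lookup has the same signature as os.LookupEnv, so tests can fake it.
func applyEnv(c *Config, lookup func(string) (string, bool)) error {
	if v, ok := lookup("ETCD_SNAPSHOT"); ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return fmt.Errorf("ETCD_SNAPSHOT: %w", err)
		}
		c.Snapshot = b
	}
	if v, ok := lookup("ETCD_SNAPSHOT_COUNT"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return fmt.Errorf("ETCD_SNAPSHOT_COUNT: %w", err)
		}
		c.SnapshotCount = n
	}
	return nil
}

func main() {
	env := map[string]string{"ETCD_SNAPSHOT_COUNT": "5000"}
	lookup := func(k string) (string, bool) {
		v, ok := env[k]
		return v, ok
	}

	c := newConfig()
	if err := applyEnv(c, lookup); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.Snapshot, c.SnapshotCount) // true 5000
}
```

In a real binary, `os.LookupEnv` would be passed as `lookup`.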

@@ -412,14 +412,25 @@ func (s *PeerServer) recordMetricEvent(event raft.Event) {
 	(*s.metrics).Timer(name).Update(value)
 }
+// logSnapshot logs about the snapshot that was taken.
+func (s *PeerServer) logSnapshot(err error, currentIndex, count uint64) {
+	info := fmt.Sprintf("%s: snapshot of %d events at index %d", s.Config.Name, count, currentIndex)
+	if err != nil {
+		log.Infof("%s attempted and failed: %v", info, err)
+	} else {
+		log.Infof("%s completed", info)
+	}
+}
 func (s *PeerServer) monitorSnapshot() {
 	for {
 		time.Sleep(s.snapConf.checkingInterval)
 		currentIndex := s.RaftServer().CommitIndex()
 		count := currentIndex - s.snapConf.lastIndex
 		if uint64(count) > s.snapConf.snapshotThr {
-			s.raftServer.TakeSnapshot()
+			err := s.raftServer.TakeSnapshot()
+			s.logSnapshot(err, currentIndex, count)
 			s.snapConf.lastIndex = currentIndex
 		}
 	}

@@ -52,7 +52,7 @@ Other Options:
 -max-result-buffer     Max size of the result buffer.
 -max-retry-attempts    Number of times a node will try to join a cluster.
 -max-cluster-size      Maximum number of nodes in the cluster.
--snapshot              Open or close the snapshot.
+-snapshot=false        Disable log snapshots
 -snapshot-count        Number of transactions before issuing a snapshot.
 `