Commit Graph

14736 Commits (c75ba98f8167a131818a9306459cacff1d059f69)

Author SHA1 Message Date
Christian Beneke c75ba98f81 Documentation/etcd-mixin: Fix EtcdInsufficientMembers alerting
Currently the EtcdInsufficientMembers alert fires when more than (X/2)-1
instances are unavailable. This fixes it to fire at the correct limit of (X-1)/2
unavailable instances, and $value now contains the number of available instances
instead of unavailable ones. Adds a unit test for the EtcdInsufficientMembers alert.
2018-10-15 19:23:43 +02:00
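
The two thresholds in the commit above can be compared with a small Go sketch. It assumes the intent is "fire once quorum is lost", reads both limits as "more than" in real arithmetic, and uses hypothetical function names (quorum, oldRuleFires, newRuleFires) that do not come from the mixin itself.

```go
package main

import "fmt"

// quorum is the number of members a cluster of size n needs to make progress.
func quorum(n int) int { return n/2 + 1 }

// oldRuleFires models "more than (X/2)-1 instances are unavailable".
// Both sides are doubled so the comparison stays in integers.
func oldRuleFires(n, available int) bool {
	unavailable := n - available
	return 2*unavailable > n-2
}

// newRuleFires models "more than (X-1)/2 instances are unavailable",
// which is equivalent to available < quorum(n).
func newRuleFires(n, available int) bool {
	unavailable := n - available
	return 2*unavailable > n-1
}

func main() {
	for _, n := range []int{3, 5} {
		for available := n; available >= 0; available-- {
			fmt.Printf("n=%d available=%d quorum=%d old=%v new=%v\n",
				n, available, quorum(n),
				oldRuleFires(n, available), newRuleFires(n, available))
		}
	}
}
```

Under this reading, the old rule already fires for a 3-member cluster with one member down, although two members still form a quorum; the new rule waits until fewer than quorum(n) members are available.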
Xiang Li dac8c6fcc0
Merge pull request #10167 from nvanbenschoten/nvanbenschoten/limitUncommitted
raft: provide protection against unbounded Raft log growth
2018-10-13 23:52:28 -07:00
Nathan VanBenschoten 73c20cc1b7 raft: Fix comment on sendHeartbeat 2018-10-14 00:03:43 -04:00
Nathan VanBenschoten 7be7ac5a5d raft: Fix spelling in doc.go 2018-10-13 23:25:05 -04:00
Nathan VanBenschoten f89b06dc6d raft: provide protection against unbounded Raft log growth
The suggested pattern for Raft proposals is that they be retried
periodically until they succeed. This turns out to be an issue
when a leader cannot commit entries because the leader will continue
to append re-proposed entries to its log without committing anything.
This can result in the uncommitted tail of a leader's log growing
without bound until it is able to commit entries.

This change adds a safeguard to protect against this case, where a
leader's log can grow without bound during loss-of-quorum scenarios.
It does so by introducing a new, optional `MaxUncommittedEntriesSize`
configuration. This config limits the max aggregate size of uncommitted
entries that may be appended to a leader's log. Once this limit
is exceeded, proposals will begin to return ErrProposalDropped
errors.

See cockroachdb/cockroach#27772
2018-10-13 23:25:05 -04:00
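
The safeguard described in the commit above can be illustrated with a toy model in Go. This is a minimal sketch, not the raft package's code: toyLeader, propose, commitAll and errProposalDropped are hypothetical names, a limit of 0 is assumed to mean "no limit", and the real accounting may differ in its details.

```go
package main

import (
	"errors"
	"fmt"
)

// errProposalDropped stands in for raft's ErrProposalDropped in this sketch.
var errProposalDropped = errors.New("proposal dropped")

// toyLeader tracks only what the sketch needs: the log and the byte size
// of its uncommitted tail.
type toyLeader struct {
	maxUncommittedSize int // assumed: 0 disables the limit
	uncommittedSize    int
	log                [][]byte
}

// propose appends data unless doing so would push the uncommitted tail
// past the configured limit.
func (l *toyLeader) propose(data []byte) error {
	if l.maxUncommittedSize > 0 && l.uncommittedSize+len(data) > l.maxUncommittedSize {
		return errProposalDropped
	}
	l.log = append(l.log, data)
	l.uncommittedSize += len(data)
	return nil
}

// commitAll models quorum returning: everything in the log is committed,
// so the uncommitted tail is empty again.
func (l *toyLeader) commitAll() { l.uncommittedSize = 0 }

func main() {
	l := &toyLeader{maxUncommittedSize: 10}
	for i := 0; i < 4; i++ {
		err := l.propose([]byte("abcd"))
		fmt.Printf("proposal %d: err=%v uncommitted=%d bytes\n", i, err, l.uncommittedSize)
	}
	l.commitAll()
	fmt.Println("after commit:", l.propose([]byte("abcd")))
}
```

A caller that retries proposals periodically, as the commit message suggests, would treat the dropped error as a signal to back off and retry later rather than as a permanent failure.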
Joe Betz 3c6c05be8a
Merge pull request #10176 from jpbetz/keepalive-docs
clientv3: Clarify lessor KeepAlive docs
2018-10-12 09:38:51 -07:00
Sam Batschelet e205d09895
Merge pull request #10171 from paulf69487623/master
Documentation: Add the -N option to curl for the watch example to disable buffering
2018-10-12 07:09:14 -04:00
Paul Frieden b3faeb5d86 Documentation: Add the -N option to curl for the watch example to disable buffering 2018-10-11 22:13:43 -05:00
Wenjia 1cab49ef78
Merge pull request #9718 from kchristidis/fix-snap-pub-error
raftexample: Fix publish snapshot error message
2018-10-11 16:45:55 -07:00
Xiang Li 404f7d820c
Merge pull request #10175 from wenjiaswe/fixTestMetricsHealth
integration: fix bug in TestMetricsHealth
2018-10-11 16:07:30 -07:00
Joe Betz 49450aaa60
clientv3: Clarify lessor KeepAlive docs 2018-10-11 15:11:28 -07:00
Wenjia Zhang 69f53e1406 integration: fix bug in TestMetricsHealth 2018-10-11 14:55:39 -07:00
Joe Betz d5c93a7b0b
Merge pull request #10165 from jpbetz/socket-docs
Document unix and unixs URL schemes
2018-10-10 15:21:55 -07:00
Gyuho Lee ef7e9d385b docs/operate.rst: link latest patch releases
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 15:03:51 -07:00
Gyuho Lee 342d53d1b1 docs/metrics: add metrics outputs from patch releases
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 15:03:37 -07:00
Gyuho Lee f0736fe477 CHANGELOG: add Go release versions
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 11:07:10 -07:00
Gyuho Lee 5b0960f664 docs/metrics: document missing metrics from master branch
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:37:41 -07:00
Gyuho Lee d4283b895c CHANGELOG-3.3: update release date for tomorrow
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:30:50 -07:00
Gyuho Lee 0f0919c19c
Merge pull request #10159 from gyuho/version-log
etcdserver: clear message in cluster version decision
2018-10-09 18:10:14 -07:00
Gyuho Lee 3e37052c08 CHANGELOG: updates for v3.4 and patch releases
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 17:40:51 -07:00
Joe Betz 1957d1cedf
Documentation: Document unix and unixs URL schemes 2018-10-09 14:42:56 -07:00
Gyuho Lee d2a0f17b82
Merge pull request #10155 from gyuho/metrics-messages
rafthttp: probe all raft transports
2018-10-09 11:18:31 -07:00
Gyuho Lee ba606bf85e
Merge pull request #10156 from gyuho/metrics-health
etcdserver: add "etcd_server_health_success/failures"
2018-10-09 00:10:57 -07:00
Joe Betz ac4754053d
Merge pull request #10160 from etcd-io/jpbetz-patch-1
Update patch release list to reflect that 3.1 is maintained
2018-10-08 23:39:35 -07:00
Jingyi Hu 0181609402
Merge pull request #10164 from jingyih/update_CHANGELOG
CHANGELOG: update from #10153
2018-10-08 18:47:54 -07:00
Jingyi Hu 4a8693361a CHANGELOG: update from #10153 2018-10-08 17:15:59 -07:00
Gyuho Lee 90c5968ee1
Merge pull request #10157 from gyuho/go
*: use Go 1.11.1 for testing
2018-10-08 16:35:09 -07:00
Gyuho Lee a3ae8df912
Merge pull request #10112 from gyuho/vendor
*: use Go 1.11 module for dependency management, replace "dep"
2018-10-08 16:34:51 -07:00
Gyuho Lee 59dd78dde8 etcdserver: clear message in cluster version decision
Only leader can decide cluster version.
Clarify the logging that this local node is the leader.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-08 16:05:10 -07:00
Xiang Li b046a37256
Merge pull request #10153 from funny-falcon/fix-client-mutex-lock-10111
clientv3/concurrency.Mutex.Lock() - preserve invariant
2018-10-08 15:13:52 -07:00
Joe Betz 7a0647ceb7
Documentation: Update patch release list to reflect that 3.1 is maintained 2018-10-08 13:33:07 -07:00
Gyuho Lee 7c33e3d77b docs/metrics/latest: sync with master
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:52:44 -07:00
Gyuho Lee d28724a530 travis.yml: update Go version to 1.11.1
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:39:49 -07:00
Gyuho Lee 2a8dc72899 Makefile: update default Go version
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:39:19 -07:00
Gyuho Lee 7524cc6f4c integration: add "TestMetricsHealth"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:25:14 -07:00
Gyuho Lee 601d8b4677 etcdserver/api/etcdhttp: remove unused "HandleHealth" function
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:16:18 -07:00
Gyuho Lee 004e04a1d1 etcdserver/api/etcdhttp: add "etcd_server_health_success/failures"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:15:12 -07:00
Gyuho Lee 884a8bd36b etcdserver/api/rafthttp: configure "streamProber" in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee 7b1ef37054 etcdserver/api/rafthttp: probe all Raft messages' RTT
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.

In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but the "etcd_network_peer_round_trip_time_seconds"
metric showed a <1-sec latency distribution, which means the etcd server
was not sampling enough when such latency spikes happened
outside of the snapshot pipeline connection.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee 4a239070c8 etcdserver/api/rafthttp: display roundtripper name in warnings
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee 47cff4dfe5 etcdserver/api/rafthttp: rename to "pipelineProber"
Preliminary work to add prober to "streamRt"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00
yura 64e8b2e905 clientv3: concurrency.Mutex.Lock() - preserve invariant
Convenient invariant:
- if werr == nil then lock is supposed to be locked at the moment.

While we could not be confident in a stronger invariant ('is exactly locked'),
it was inconvenient that the previous code could return `werr == nil` after
Mutex.Unlock.

This could happen when ctx is canceled or times out exactly after waitDeletes
successfully returned werr == nil and before `<-ctx.Done()` is checked.
While such a situation is very rare, it is still possible.

fixes #10111
2018-10-05 14:17:32 +03:00
Jingyi Hu 6976819792
Merge pull request #10148 from jingyih/add_unit_test_for_snapshot_file_integrity
clientv3: add test for snapshot status
2018-10-03 19:52:19 -07:00
Jingyi Hu 87beb8336f clientv3: add test for snapshot status
Add unit test to check if we can correctly identify a corrupted snapshot
backup file.
2018-10-03 18:17:19 -07:00
Jingyi Hu 2654de8a0e
Merge pull request #10152 from jingyih/add_unfreed_to_goword_whitelist
words: whitelist unfreed
2018-10-03 18:16:43 -07:00
Jingyi Hu 57c50b0d8c words: whitelist unfreed
Whitelist the keyword 'unfreed' for goword. It comes from a bbolt error
message.
2018-10-03 18:12:10 -07:00
Gyuho Lee eca5f03cea
Merge pull request #10149 from jingyih/fix_goword_checking_in_clientv3
clientv3: fix goword checking in config.go
2018-10-03 07:42:16 -07:00
Jingyi Hu 7d57ee3427 clientv3: fix goword checking in config.go 2018-10-02 23:02:10 -07:00
Gyuho Lee bfdfaf5333 words: whitelist PermitWithoutStream
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-02 13:53:02 -07:00
Gyuho Lee 1d1f509e98
Merge pull request #10146 from spzala/clientpermitwithoutstream
clientv3: let etcd client use all available keepalive ClientParams
2018-10-02 13:31:47 -07:00