Christian Beneke
c75ba98f81
Documentation/etcd-mixin: Fix EtcdInsufficientMembers alerting
...
Currently the EtcdInsufficientMembers alert fires when more than (X/2)-1
instances are unavailable. This fixes it to fire at the correct limit of (X-1)/2
unavailable instances; $value now contains the number of available instances
instead of unavailable ones. Also adds a unit test for the EtcdInsufficientMembers alert.
2018-10-15 19:23:43 +02:00
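The threshold arithmetic in the EtcdInsufficientMembers fix above can be sketched in Go. This is a minimal illustration, assuming the alert is meant to fire once the cluster no longer has a majority of members available; the function names are hypothetical and the mixin's actual alert expression is not shown in this log.

```go
package main

// quorum returns the smallest number of members that forms a majority
// in a cluster of n members (hypothetical helper, not part of etcd-mixin).
func quorum(n int) int {
	return n/2 + 1
}

// insufficientMembers reports whether fewer than a majority of the
// total members are available, i.e. more than (total-1)/2 are unavailable.
func insufficientMembers(total, available int) bool {
	return available < quorum(total)
}
```

For a 3-member cluster this fires with 1 member available (2 unavailable) but not with 2 available, which matches the (X-1)/2 limit on unavailable instances stated in the commit message.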
Xiang Li
dac8c6fcc0
Merge pull request #10167 from nvanbenschoten/nvanbenschoten/limitUncommitted
...
raft: provide protection against unbounded Raft log growth
2018-10-13 23:52:28 -07:00
Nathan VanBenschoten
73c20cc1b7
raft: Fix comment on sendHeartbeat
2018-10-14 00:03:43 -04:00
Nathan VanBenschoten
7be7ac5a5d
raft: Fix spelling in doc.go
2018-10-13 23:25:05 -04:00
Nathan VanBenschoten
f89b06dc6d
raft: provide protection against unbounded Raft log growth
...
The suggested pattern for Raft proposals is that they be retried
periodically until they succeed. This turns out to be an issue
when a leader cannot commit entries because the leader will continue
to append re-proposed entries to its log without committing anything.
This can result in the uncommitted tail of a leader's log growing
without bound until it is able to commit entries.
This change adds a safeguard to protect against the case where a
leader's log can grow without bound during loss-of-quorum scenarios.
It does so by introducing a new, optional `MaxUncommittedEntriesSize`
configuration. This config limits the max aggregate size of uncommitted
entries that may be appended to a leader's log. Once this limit
is exceeded, proposals will begin to return ErrProposalDropped
errors.
See cockroachdb/cockroach#27772
2018-10-13 23:25:05 -04:00
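The safeguard described in f89b06dc6d can be illustrated with a toy model. This is only a sketch under stated assumptions: `toyLeader`, `propose`, `commit`, and `errProposalDropped` are hypothetical names, sizes are counted as payload bytes, and a limit of 0 is treated as "no limit". It is not the raft package's implementation.

```go
package main

import "errors"

// errProposalDropped stands in for raft's ErrProposalDropped (hypothetical here).
var errProposalDropped = errors.New("proposal dropped")

// toyLeader tracks only the aggregate size of its uncommitted log tail.
type toyLeader struct {
	maxUncommittedSize uint64 // assumed: 0 means no limit in this sketch
	uncommittedSize    uint64 // bytes appended but not yet committed
}

// propose appends an entry unless doing so would push the uncommitted
// tail past the configured limit, in which case the proposal is dropped.
func (l *toyLeader) propose(data []byte) error {
	s := uint64(len(data))
	if l.maxUncommittedSize > 0 && l.uncommittedSize+s > l.maxUncommittedSize {
		return errProposalDropped
	}
	l.uncommittedSize += s
	return nil
}

// commit marks size bytes of previously proposed entries as committed,
// freeing room for new proposals.
func (l *toyLeader) commit(size uint64) {
	if size > l.uncommittedSize {
		size = l.uncommittedSize
	}
	l.uncommittedSize -= size
}
```

With a limit of 10 bytes, a 6-byte proposal succeeds, a second 6-byte proposal is dropped, and it succeeds again on retry once the first entry is committed. This is the retry-until-success pattern the commit message refers to, with the log tail kept bounded while quorum is lost.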
Joe Betz
3c6c05be8a
Merge pull request #10176 from jpbetz/keepalive-docs
...
clientv3: Clarify lessor KeepAlive docs
2018-10-12 09:38:51 -07:00
Sam Batschelet
e205d09895
Merge pull request #10171 from paulf69487623/master
...
Documentation: Add the -N option to curl for the watch example to disable buffering
2018-10-12 07:09:14 -04:00
Paul Frieden
b3faeb5d86
Documentation: Add the -N option to curl for the watch example to disable buffering
2018-10-11 22:13:43 -05:00
Wenjia
1cab49ef78
Merge pull request #9718 from kchristidis/fix-snap-pub-error
...
raftexample: Fix publish snapshot error message
2018-10-11 16:45:55 -07:00
Xiang Li
404f7d820c
Merge pull request #10175 from wenjiaswe/fixTestMetricsHealth
...
integration: fix bug in TestMetricsHealth
2018-10-11 16:07:30 -07:00
Joe Betz
49450aaa60
clientv3: Clarify lessor KeepAlive docs
2018-10-11 15:11:28 -07:00
Wenjia Zhang
69f53e1406
integration: fix bug in TestMetricsHealth
2018-10-11 14:55:39 -07:00
Joe Betz
d5c93a7b0b
Merge pull request #10165 from jpbetz/socket-docs
...
Document unix and unixs URL schemes
2018-10-10 15:21:55 -07:00
Gyuho Lee
ef7e9d385b
docs/operate.rst: link latest patch releases
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 15:03:51 -07:00
Gyuho Lee
342d53d1b1
docs/metrics: add metrics outputs from patch releases
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 15:03:37 -07:00
Gyuho Lee
f0736fe477
CHANGELOG: add Go release versions
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 11:07:10 -07:00
Gyuho Lee
5b0960f664
docs/metrics: document missing metrics from master branch
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:37:41 -07:00
Gyuho Lee
d4283b895c
CHANGELOG-3.3: update release date for tomorrow
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:30:50 -07:00
Gyuho Lee
0f0919c19c
Merge pull request #10159 from gyuho/version-log
...
etcdserver: clear message in cluster version decision
2018-10-09 18:10:14 -07:00
Gyuho Lee
3e37052c08
CHANGELOG: updates for v3.4 and patch releases
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 17:40:51 -07:00
Joe Betz
1957d1cedf
Documentation: Document unix and unixs URL schemes
2018-10-09 14:42:56 -07:00
Gyuho Lee
d2a0f17b82
Merge pull request #10155 from gyuho/metrics-messages
...
rafthttp: probe all raft transports
2018-10-09 11:18:31 -07:00
Gyuho Lee
ba606bf85e
Merge pull request #10156 from gyuho/metrics-health
...
etcdserver: add "etcd_server_health_success/failures"
2018-10-09 00:10:57 -07:00
Joe Betz
ac4754053d
Merge pull request #10160 from etcd-io/jpbetz-patch-1
...
Update patch release list to reflect that 3.1 is maintained
2018-10-08 23:39:35 -07:00
Jingyi Hu
0181609402
Merge pull request #10164 from jingyih/update_CHANGELOG
...
CHANGELOG: update from #10153
2018-10-08 18:47:54 -07:00
Jingyi Hu
4a8693361a
CHANGELOG: update from #10153
2018-10-08 17:15:59 -07:00
Gyuho Lee
90c5968ee1
Merge pull request #10157 from gyuho/go
...
*: use Go 1.11.1 for testing
2018-10-08 16:35:09 -07:00
Gyuho Lee
a3ae8df912
Merge pull request #10112 from gyuho/vendor
...
*: use Go 1.11 module for dependency management, replace "dep"
2018-10-08 16:34:51 -07:00
Gyuho Lee
59dd78dde8
etcdserver: clear message in cluster version decision
...
Only the leader can decide the cluster version.
Clarify in the logging that this local node is the leader.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-08 16:05:10 -07:00
Xiang Li
b046a37256
Merge pull request #10153 from funny-falcon/fix-client-mutex-lock-10111
...
clientv3/concurrency.Mutex.Lock() - preserve invariant
2018-10-08 15:13:52 -07:00
Joe Betz
7a0647ceb7
Documentation: Update patch release list to reflect that 3.1 is maintained
2018-10-08 13:33:07 -07:00
Gyuho Lee
7c33e3d77b
docs/metrics/latest: sync with master
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:52:44 -07:00
Gyuho Lee
d28724a530
travis.yml: update Go version to 1.11.1
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:39:49 -07:00
Gyuho Lee
2a8dc72899
Makefile: update default Go version
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:39:19 -07:00
Gyuho Lee
7524cc6f4c
integration: add "TestMetricsHealth"
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:25:14 -07:00
Gyuho Lee
601d8b4677
etcdserver/api/etcdhttp: remove unused "HandleHealth" function
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:16:18 -07:00
Gyuho Lee
004e04a1d1
etcdserver/api/etcdhttp: add "etcd_server_health_success/failures"
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:15:12 -07:00
Gyuho Lee
884a8bd36b
etcdserver/api/rafthttp: configure "streamProber" in tests
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee
7b1ef37054
etcdserver/api/rafthttp: probe all Raft messages' RTT
...
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.
In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but the "etcd_network_peer_round_trip_time_seconds"
metric showed a <1-sec latency distribution, which means the etcd server
was not sampling enough when such latency spikes happened
outside of the snapshot pipeline connection.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee
4a239070c8
etcdserver/api/rafthttp: display roundtripper name in warnings
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee
47cff4dfe5
etcdserver/api/rafthttp: rename to "pipelineProber"
...
Preliminary work to add prober to "streamRt"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00
yura
64e8b2e905
clientv3: concurrency.Mutex.Lock() - preserve invariant
...
Convenient invariant:
- if werr == nil then lock is supposed to be locked at the moment.
While we cannot be confident in the stronger invariant ('is exactly locked'),
it was inconvenient that the previous code could return `werr == nil` after
Mutex.Unlock.
This could happen when ctx is canceled or times out exactly after waitDeletes
successfully returned werr == nil and before `<-ctx.Done()` is checked.
While such a situation is very rare, it is still possible.
fixes #10111
2018-10-05 14:17:32 +03:00
Jingyi Hu
6976819792
Merge pull request #10148 from jingyih/add_unit_test_for_snapshot_file_integrity
...
clientv3: add test for snapshot status
2018-10-03 19:52:19 -07:00
Jingyi Hu
87beb8336f
clientv3: add test for snapshot status
...
Add unit test to check if we can correctly identify a corrupted snapshot
backup file.
2018-10-03 18:17:19 -07:00
Jingyi Hu
2654de8a0e
Merge pull request #10152 from jingyih/add_unfreed_to_goword_whitelist
...
words: whitelist unfreed
2018-10-03 18:16:43 -07:00
Jingyi Hu
57c50b0d8c
words: whitelist unfreed
...
Whitelist the keyword 'unfreed' for goword. It comes from the bbolt error
message.
2018-10-03 18:12:10 -07:00
Gyuho Lee
eca5f03cea
Merge pull request #10149 from jingyih/fix_goword_checking_in_clientv3
...
clientv3: fix goword checking in config.go
2018-10-03 07:42:16 -07:00
Jingyi Hu
7d57ee3427
clientv3: fix goword checking in config.go
2018-10-02 23:02:10 -07:00
Gyuho Lee
bfdfaf5333
words: whitelist PermitWithoutStream
...
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-02 13:53:02 -07:00
Gyuho Lee
1d1f509e98
Merge pull request #10146 from spzala/clientpermitwithoutstream
...
clientv3: let etcd client use all available keepalive ClientParams
2018-10-02 13:31:47 -07:00