Commit Graph

14745 Commits (b42b39446bc1b563ba58aceda53b6ecad87d73f9)

Author SHA1 Message Date
Tobias Schottdorf b42b39446b
Merge pull request #10199 from tschottdorf/fix-max-uncommitted-size
raft: fix bug in unbounded log growth prevention mechanism
2018-10-22 21:30:01 +02:00
Xiang Li a27a73e448
Merge pull request #10193 from johncming/master
mvcc/backend: code format optimization
2018-10-22 11:50:53 -07:00
Tobias Schottdorf ad49c8fd98 raft: fix bug in unbounded log growth prevention mechanism
The previous code was using the proto-generated `Size()` method to
track the size of an incoming proposal at the leader. This includes
the Index and Term, which were mutated after the call to `Size()`
when appending to the log. Additionally, it was not taking into
account that an ignored configuration change would ignore the
original proposal and append an empty entry instead.

As a result, a fully committed Raft group could end up with a non-
zero tracked uncommitted Raft log counter that would eventually hit
the ceiling and drop all future proposals indiscriminately. It would
also immediately imply that proposals exceeding the threshold alone
would get refused (as the "first uncommitted proposal" gets special
treatment and is always allowed in).

Track only the size of the payload actually appended to the Raft log
instead.

For context, see:
https://github.com/cockroachdb/cockroach/issues/31618#issuecomment-431374938
2018-10-22 11:28:39 +02:00
Gyuho Lee 8c80efb886 CHANGELOG: highlight minimum recommended version, change github org URLs
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-18 16:39:01 -07:00
Gyuho Lee 88e0830560 CHANGELOG-3.4: add "raft.Config.MaxUncommittedEntriesSize" change
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-18 11:47:50 -07:00
caoming cf309757d6 mvcc/backend: code format optimization 2018-10-17 14:18:09 +08:00
Xiang Li 7a759c18d2
Merge pull request #10178 from johncming/master
bugfix: use the backend create by snapshot instead of origin one in tests
2018-10-15 10:54:34 -07:00
Gyuho Lee 1ded5aaf4d
Merge pull request #10150 from cbeneke/fix/mixin-insufficent-member-alert
etcd-mixin: Fix EtcdInsufficientMembers alerting
2018-10-15 10:27:59 -07:00
Christian Beneke c75ba98f81 Documentation/etcd-mixin: Fix EtcdInsufficientMembers alerting
Currently the EtcdInsufficientMembers alert fires when more than (X/2)-1
instances are unavailable. This fixes it to fire at the correct limit of (X-1)/2
unavailable instances, and $value now contains the number of available instances
instead of unavailable ones. Added a unit test for the EtcdInsufficientMembers alert.
2018-10-15 19:23:43 +02:00
caoming bf49b9a145 mvcc/backend: fix to use the backend create by snapshot instead of origin one. 2018-10-15 09:35:20 +08:00
Xiang Li dac8c6fcc0
Merge pull request #10167 from nvanbenschoten/nvanbenschoten/limitUncommitted
raft: provide protection against unbounded Raft log growth
2018-10-13 23:52:28 -07:00
Nathan VanBenschoten 73c20cc1b7 raft: Fix comment on sendHeartbeat 2018-10-14 00:03:43 -04:00
Nathan VanBenschoten 7be7ac5a5d raft: Fix spelling in doc.go 2018-10-13 23:25:05 -04:00
Nathan VanBenschoten f89b06dc6d raft: provide protection against unbounded Raft log growth
The suggested pattern for Raft proposals is that they be retried
periodically until they succeed. This turns out to be an issue
when a leader cannot commit entries because the leader will continue
to append re-proposed entries to its log without committing anything.
This can result in the uncommitted tail of a leader's log growing
without bound until it is able to commit entries.

This change adds a safeguard to protect against this case where a
leader's log can grow without bound during loss of quorum scenarios.
It does so by introducing a new, optional `MaxUncommittedEntriesSize`
configuration. This config limits the max aggregate size of uncommitted
entries that may be appended to a leader's log. Once this limit
is exceeded, proposals will begin to return ErrProposalDropped
errors.

See cockroachdb/cockroach#27772
2018-10-13 23:25:05 -04:00
Joe Betz 3c6c05be8a
Merge pull request #10176 from jpbetz/keepalive-docs
clientv3: Clarify lessor KeepAlive docs
2018-10-12 09:38:51 -07:00
Sam Batschelet e205d09895
Merge pull request #10171 from paulf69487623/master
Documentation: Add the -N option to curl for the watch example to disable buffering
2018-10-12 07:09:14 -04:00
Paul Frieden b3faeb5d86 Documentation: Add the -N option to curl for the watch example to disable buffering 2018-10-11 22:13:43 -05:00
Wenjia 1cab49ef78
Merge pull request #9718 from kchristidis/fix-snap-pub-error
raftexample: Fix publish snapshot error message
2018-10-11 16:45:55 -07:00
Xiang Li 404f7d820c
Merge pull request #10175 from wenjiaswe/fixTestMetricsHealth
integration: fix bug in TestMetricsHealth
2018-10-11 16:07:30 -07:00
Joe Betz 49450aaa60
clientv3: Clarify lessor KeepAlive docs 2018-10-11 15:11:28 -07:00
Wenjia Zhang 69f53e1406 integration: fix bug in TestMetricsHealth 2018-10-11 14:55:39 -07:00
Joe Betz d5c93a7b0b
Merge pull request #10165 from jpbetz/socket-docs
Document unix and unixs URL schemes
2018-10-10 15:21:55 -07:00
Gyuho Lee ef7e9d385b docs/operate.rst: link latest patch releases
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 15:03:51 -07:00
Gyuho Lee 342d53d1b1 docs/metrics: add metrics outputs from patch releases
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 15:03:37 -07:00
Gyuho Lee f0736fe477 CHANGELOG: add Go release versions
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-10 11:07:10 -07:00
Gyuho Lee 5b0960f664 docs/metrics: document missing metrics from master branch
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:37:41 -07:00
Gyuho Lee d4283b895c CHANGELOG-3.3: update release date for tomorrow
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 18:30:50 -07:00
Gyuho Lee 0f0919c19c
Merge pull request #10159 from gyuho/version-log
etcdserver: clear message in cluster version decision
2018-10-09 18:10:14 -07:00
Gyuho Lee 3e37052c08 CHANGELOG: updates for v3.4 and patch releases
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-09 17:40:51 -07:00
Joe Betz 1957d1cedf
Documentation: Document unix and unixs URL schemes 2018-10-09 14:42:56 -07:00
Gyuho Lee d2a0f17b82
Merge pull request #10155 from gyuho/metrics-messages
rafthttp: probe all raft transports
2018-10-09 11:18:31 -07:00
Gyuho Lee ba606bf85e
Merge pull request #10156 from gyuho/metrics-health
etcdserver: add "etcd_server_health_success/failures"
2018-10-09 00:10:57 -07:00
Joe Betz ac4754053d
Merge pull request #10160 from etcd-io/jpbetz-patch-1
Update patch release list to reflect that 3.1 is maintained
2018-10-08 23:39:35 -07:00
Jingyi Hu 0181609402
Merge pull request #10164 from jingyih/update_CHANGELOG
CHANGELOG: update from #10153
2018-10-08 18:47:54 -07:00
Jingyi Hu 4a8693361a CHANGELOG: update from #10153 2018-10-08 17:15:59 -07:00
Gyuho Lee 90c5968ee1
Merge pull request #10157 from gyuho/go
*: use Go 1.11.1 for testing
2018-10-08 16:35:09 -07:00
Gyuho Lee a3ae8df912
Merge pull request #10112 from gyuho/vendor
*: use Go 1.11 module for dependency management, replace "dep"
2018-10-08 16:34:51 -07:00
Gyuho Lee 59dd78dde8 etcdserver: clear message in cluster version decision
Only leader can decide cluster version.
Clarify the logging that this local node is the leader.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-08 16:05:10 -07:00
Xiang Li b046a37256
Merge pull request #10153 from funny-falcon/fix-client-mutex-lock-10111
clientv3/concurrency.Mutex.Lock() - preserve invariant
2018-10-08 15:13:52 -07:00
Joe Betz 7a0647ceb7
Documentation: Update patch release list to reflect that 3.1 is maintained 2018-10-08 13:33:07 -07:00
Gyuho Lee 7c33e3d77b docs/metrics/latest: sync with master
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:52:44 -07:00
Gyuho Lee d28724a530 travis.yml: update Go version to 1.11.1
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:39:49 -07:00
Gyuho Lee 2a8dc72899 Makefile: update default Go version
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:39:19 -07:00
Gyuho Lee 7524cc6f4c integration: add "TestMetricsHealth"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:25:14 -07:00
Gyuho Lee 601d8b4677 etcdserver/api/etcdhttp: remove unused "HandleHealth" function
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:16:18 -07:00
Gyuho Lee 004e04a1d1 etcdserver/api/etcdhttp: add "etcd_server_health_success/failures"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:15:12 -07:00
Gyuho Lee 884a8bd36b etcdserver/api/rafthttp: configure "streamProber" in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee 7b1ef37054 etcdserver/api/rafthttp: probe all Raft messages' RTT
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.

In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but the "etcd_network_peer_round_trip_time_seconds"
metric showed a <1-sec latency distribution, which means the etcd server
was not sampling enough when such latency spikes happened
outside of the snapshot pipeline connection.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee 4a239070c8 etcdserver/api/rafthttp: display roundtripper name in warnings
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee 47cff4dfe5 etcdserver/api/rafthttp: rename to "pipelineProber"
Preliminary work to add prober to "streamRt"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00