Useful for deciding when to terminate the unhealthy follower.
If the follower is receiving a leader snapshot, operator may wait.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).
pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.
ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
This is the first (maybe not last) step in cleaning up the bootstrap
code around StartNode.
Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved
today is unclean: The app is supposed to pass peers to StartNode(),
we add configuration changes for them to the log, immediately pretend
that they are applied, but actually leave them unapplied (to give the
app a chance to observe them, though if the app did decide to not apply
them things would really go off the rails), and then return control to
the app. The app will then process the initial Readys and as a result
the configuration will be persisted to disk; restarts of the node then
use RestartNode which doesn't take any peers.
The code that did this lived awkwardly in two places fairly deep down
the callstack, though it was really only necessary in StartNode(). This
commit refactors things to make this more obvious: only StartNode does
this dance now. In particular, RawNode does not support this at all any
more; it expects the app to set up its Storage correctly.
Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since
the Storage interface doesn't provide the right accessors for this
purpose. Briefly speaking, we want to make sure that a non-bootstrapped
node can never catch up via the log so that we can implicitly use one
of the "skipped" log entries to represent the configuration change into
the bootstrap configuration. This is an invasive change that affects
all consumers of raft, and it is of lower urgency since the code (post
this commit) already encapsulates the complexity sufficiently.
Mechanically extract `progressTracker`, `Progress`, and `inflights`
to their own package named `tracker`. Add lots of comments in the
progress, and take the opportunity to rename and clarify various
fields.
Update to Go 1.12.5 testing. Remove deprecated unused and gosimple
pacakges, and mask staticcheck 1006. Also, fix unconvert errors related
to unnecessary type conversions and following staticcheck errors:
- remove redundant return statements
- use for range instead of for select
- use time.Since instead of time.Now().Sub
- omit comparison to bool constant
- replace T.Fatal and T.Fatalf in tests with T.Error and T.Fatalf respectively because the goroutine calls T.Fatal must be called in the same goroutine as the test
- fix error strings that should not be capitalized
- use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...))
- use he status code of Canceled instead of grpc.ErrClientConnClosing which is deprecated
- use use status.Errorf instead of grpc.Errorf which is deprecated
Related #10528#10438
This commit adds a feature for creating a user without password. The
purpose of the feature is reducing attack surface by configuring bad
passwords (CN based auth will be allowed for the user).
The feature can be used with `--no-password` of `etcdctl user add`
command.
Fix https://github.com/coreos/etcd/issues/9590
Update TestMemberPromote to include both learner not-ready and learner
ready test cases.
Removed unit test TestPromoteMember, it requires underlying raft node to
be started and running. The member promote is covered by the integration
test.
1. Maintenance API MoveLeader() returns ErrBadLeaderTransferee if
transferee does not exist or is raft learner.
2. etcdserver TransferLeadership() only choose voting member as
transferee.
Added IsLearner field to etcdserver internal Member type. Routed
learner MemberAdd request from server API to raft. Apply learner
MemberAdd result to server after the request is passed through Raft.
- Added isLearner flag to MemberAddRequest in Cluster API.
- Added isLearner field to StatusResponse in Maintenance API.
- Added MemberPromote rpc to Cluster API.
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy were not returning metrics of the etcd members servers but of the proxy itself.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Add a bunch of tests on fundamental operation of Create/Set/Delete.
Context: I want to use this package to run the discovery service against
the v3 storage backend. The limited functionality already implemented
should be sufficient to do this.
https://github.com/coreos/discovery.etcd.io/issues/52
When using the embed package to embed etcd, sometimes the storage prefix
being used might be full. In this case, this code path triggers, causing
an: `etcdserver: create wal error: no space left on device` error, which
causes a fatal. A fatal differs from a panic in that it also calls
os.Exit(1). In this situation, the calling program that embeds the etcd
server will be abruptly killed, which prevents it from cleaning up
safely, and giving a proper error message. Depending on what the calling
program is, this can cause corruption and data loss.
This patch switches the fatal to a panic. Ideally this would be a
regular error which would get propagated upwards to the StartEtcd
command, but in the meantime at least this can be caught with recover().
This fixes the most common fatal that I've experienced, but there are
surely more that need looking into. If possible, the errors should be
threaded down into the code path so that embedding etcd can be more
robust.
Fixes: https://github.com/etcd-io/etcd/issues/10588
Once chk(ai) fails with auth.ErrAuthOldRevision it will always do,
regardless how many times you retry. So the error is better be returned
to fail the pending request and make the client re-authenticate.
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.
In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but "etcd_network_peer_round_trip_time_seconds"
metrics shows <1-sec latency distribution, which means etcd server
was not sampling enough while such latency spikes happen
outside of snapshot pipeline connection.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
"read index" doesn't tell much about the root cause.
Most likely, the local follower node is having slow
network, thus timing out waiting to receive read
index response from leader.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Distribution would be:
0.1 second or more
...
25.6 seconds or more
51.2 seconds or more
etcd_network_snapshot_send_success
etcd_network_snapshot_send_failures
etcd_network_snapshot_send_total_duration_seconds
etcd_network_snapshot_receive_success
etcd_network_snapshot_receive_failures
etcd_network_snapshot_receive_total_duration_seconds
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
```
etcd_server_id{server_id="8e9e05c52164694d"} 1
```
Useful for automating membership change operations,
no need to run "endpoint status" or "member list"
command to get member IDs.
Also, useful for "etcd_network" metrics with "To/From" labels.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
etcd server
To improve debuggability of etcd v3. Added a grpc interceptor to log
info on incoming requests to etcd server. The log output includes
remote client info, request content (with value field redacted), request
handling latency, response size, etc. Uses zap logger if available,
otherwise uses capnslog.
Also did some clean up on the chaining of grpc interceptors on server
side.
Fix
=== RUN TestEmbedEtcd
==================
WARNING: DATA RACE
Write at 0x000001df86d0 by goroutine 711:
github.com/coreos/etcd/embed.(*Config).setupLogging.func1()
/go/src/github.com/coreos/etcd/vendor/google.golang.org/grpc/grpclog/loggerv2.go:68 +0x16c
sync.(*Once).Do()
/usr/local/go/src/sync/once.go:44 +0xe1
github.com/coreos/etcd/embed.(*Config).setupLogging()
in gRPC proxy tests.
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
{"level":"warn","ts":1527101858.6985068,"caller":"etcdserver/util.go:115","msg":"apply request took too long","took":0.114101529,"expected-duration":0.1,"prefix":"","request":"header:<ID:1029181977902852337> put:<key:\"\\000\\000...
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
{"level":"warn","ts":1527101858.4149103,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.025771662}
{"level":"warn","ts":1527101858.4149644,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.034015766}
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
When a new file is created under an non existent directly,
the v2 API automatically create the parent directly.
This commit aligns the behaviour of v2v3 emulation to comply with the v2
API.