Fix https://github.com/coreos/etcd/issues/7865.
It is also possible to have mis-matched key file
while renaming directories.
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Connection pausing added another exit condition in the listener
path, causing the bridge to leak connections instead of closing
them when signalled to close. Also adds some additional Close
paranoia.
Fixes#7823
Weaken TestV3ElectionObserve so it only checks that it observes a strictly
monotonically ascending leader transition sequence following the first
observed leader. First, the Observe will issue the leader channel before
getting a response for its first get; the election revision is only bound
after returning the channel. So, Observe can't be expected to always
return the leader at the time it was started. Second, Observe fetches
the current leader based on its create revision, but begins watching on its
ModRevision; this is important so that elections still work in case the
leader issues proclamations following a compaction that exceeds its
creation revision. So, Observe can't be expected to return the entire
proclamation sequence for a single leader.
Fixes#7749
This commit adds jwt token support in v3 auth API.
Remaining major ToDos:
- Currently token type isn't hidden from etcdserver. In the near
future the information should be completely invisible from
etcdserver package.
- Configurable expiration of token. Currently tokens can be valid
until keys are changed.
How to use:
1. generate keys for signing and verfying jwt tokens:
$ openssl genrsa -out app.rsa 1024
$ openssl rsa -in app.rsa -pubout > app.rsa.pub
2. add command line options to etcd like below:
--auth-token-type jwt \
--auth-jwt-pub-key app.rsa.pub --auth-jwt-priv-key app.rsa \
--auth-jwt-sign-method RS512
3. launch etcd cluster
Below is a performance comparison of serializable read w/ and w/o jwt
token. Every (3) etcd node is executed on a single machine. Signing
method is RS512 and key length is 1024 bit. As the results show, jwt
based token introduces a performance overhead but it would be
acceptable for a case that requires authentication.
w/o jwt token auth (no auth):
Summary:
Total: 1.6172 secs.
Slowest: 0.0125 secs.
Fastest: 0.0001 secs.
Average: 0.0002 secs.
Stddev: 0.0004 secs.
Requests/sec: 6183.5877
Response time histogram:
0.000 [1] |
0.001 [9982] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.003 [1] |
0.004 [1] |
0.005 [0] |
0.006 [0] |
0.008 [6] |
0.009 [0] |
0.010 [1] |
0.011 [5] |
0.013 [3] |
Latency distribution:
10% in 0.0001 secs.
25% in 0.0001 secs.
50% in 0.0001 secs.
75% in 0.0001 secs.
90% in 0.0002 secs.
95% in 0.0002 secs.
99% in 0.0003 secs.
w/ jwt token auth:
Summary:
Total: 2.5364 secs.
Slowest: 0.0182 secs.
Fastest: 0.0002 secs.
Average: 0.0003 secs.
Stddev: 0.0005 secs.
Requests/sec: 3942.5185
Response time histogram:
0.000 [1] |
0.002 [9975] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.004 [0] |
0.006 [1] |
0.007 [11] |
0.009 [2] |
0.011 [4] |
0.013 [5] |
0.015 [0] |
0.016 [0] |
0.018 [1] |
Latency distribution:
10% in 0.0002 secs.
25% in 0.0002 secs.
50% in 0.0002 secs.
75% in 0.0002 secs.
90% in 0.0003 secs.
95% in 0.0003 secs.
99% in 0.0004 secs.
The issue is caused by leader loss even after waitLeader() returns
which can happen if the test machine is flaky which triggers a leader loss
or the killed node is the leader since waitLeader() only scans followers in
TestRestartMember() and they can have the same older leader.
In those cases, clusterMustProgress() proceeds with no leader which triggers
the no leader error.
To get around that, use linearizable get in waitLeader() to ensure leader is up
and retries on kapi.create() in clusterMustProgress() to ensure it proceeds with
a leader.
FIX#7258
Getting gosimple suggestion while running test script, so this PR is for fixing gosimple S1019 check.
raft/node_test.go:456:40: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:457:49: should use make([]raftpb.Entry, 1) instead (S1019)
raft/node_test.go:458:43: should use make([]raftpb.Message, 1) instead (S1019)
Refer https://github.com/dominikh/go-tools/blob/master/cmd/gosimple/README.md#checks for more information.
This change is needed to handle process restarts with elections. When the
leader process is restarted, it should be able to hang on to the leadership
by using the existing lease.
Fixes#7166
Common pattern was defer cancel(), but clus.Terminate() at the end of
the test. This appears to lead to a deadlock that is only released
once the context times out, causing inflated test times.
suppose a lease granting request from a follower goes through and followed by a lease look up or renewal, the leader might not apply the lease grant request locally. So the leader might not find the lease from the lease look up or renewal request which will result lease not found error. To fix this issue, we force the leader to apply its pending commited index before looking up lease.
FIX#6978
Giving Renew() the default request timeout causes TestV3LeaseFailover
to miss its timing constraints. Since it only needs to wait until the
leader recognizes the leader is lost, use RequireLeader to cancel the
keepalive stream before the request times out.
Proxy client layer ignores call options so Put is always FailFast;
this can lead to connection errors when trying to issue the Put
following restarting the client's target server.
Use the system page size to set the test quota size. Also, change
a comment related to setting the node quota to be more clear.
Signed-off-by: Geoff Levand <geoff@infradead.org>
When the non Leader etcd server receives a LeaseTimeToLive on a nonexistent lease, it responds with a nil resp and a nil error The invoking function parses the nil resp and results a segmentation fault.
I fix the bug by making sure the lease not found error is returned so that the invoking function parses the the error message instead.
fix#6537
Do not restart the killed member immediately.
The member will advance its election timeout after restart
So it will have a better chance to become the leader again.
When using the embed functionality, you can't call the Server.Stop()
function until StartEtcd returns, which can block until there is a call
to Server.Stop() in error situations. Since we have a catch-22, the
ReadyNotify() can be called manually by the user if they wish to wait
for the server startup, or in parallel with a timeout if they wish to
cancel it after some time.
Chzz pointed out that this is also more consistent with the
etcdserver.Start() behaviour too.
purpleidea pointed out that this is actually more correct too, because
we can now register the stop interrupt handler before we block on
startup.
After winning an election or obtaining a lock, we
auto-append a slash after the provided key prefix.
This avoids the previous deadlock due to waiting
on the wrong key.
Fixes#6278