suppose a lease granting request from a follower goes through and followed by a lease look up or renewal, the leader might not apply the lease grant request locally. So the leader might not find the lease from the lease look up or renewal request which will result lease not found error. To fix this issue, we force the leader to apply its pending commited index before looking up lease.
FIX#6978
This commit protects membership change operations with auth. Only
users that have root role can issue the operations.
Implements https://github.com/coreos/etcd/issues/6899
Would retry a few times before returning a not primary error that
the client should never see. Instead, use proper timeouts and
then return a request timeout error on failure.
Fixes#6922
This exists to prevent sending too many requests that
would lead into applier falling behind Raft accepting-proposal.
Based on recent benchmarks, etcd was able to process high workloads
(2 million writes with 1K concurrent clients).
The limit 1000 is too conservative to test those high workloads.
Address https://github.com/coreos/etcd/issues/6754.
In case there are network errors or unexpected EOF errors
in TimeToLive http requests to leader, we translate that into
ErrNoLeader, and expects the client to retry its request.
The cost of bcrypt password checking is quite high (almost 100ms on a
modern machine) so executing it in apply loop will be
problematic. This commit exclude the checking mechanism to the API
layer. The password checking is validated with the OCC like way
similar to the auth of serializable get.
This commit also removes a unit test of Authenticate RPC from
auth/store_test.go. It is because the RPC now accepts an auth request
unconditionally and delegates the checking functionality to
authStore.CheckPassword() (so a unit test for CheckPassword() is
added). The combination of the two functionalities can be tested by
e2e (e.g. TestCtlV3AuthWriteKey).
Fixes https://github.com/coreos/etcd/issues/6530
When running test suites for a client locally I'm getting spammed by log
lines such as:
etcdserver: apply entries took too long [14.226771ms for 1 entries]
The comments in #6278 mention there were future plans of increasing the
threshold for logging these warnings, but it hadn't been done yet.
If we promote the lessor before finish applying all
entries from the last term, we might incorrectly renew
the already revoked leases.
Here is an example:
- Term 1: revoke lease A accepted by raft
- Old leader failed, new election happened
- Term 2: promote
- Term 2: keep alive A succeed. A now has 10 seconds TTL
- Term 2: revoke lease A from Term 1 got committed and applied
- Term 2: the lease A with 10 seconds TTL is revoked
To solve this, the new leader MUST apply all entries from old term
before promote its lessor to start accept renew requests.
When the non Leader etcd server receives a LeaseTimeToLive on a nonexistent lease, it responds with a nil resp and a nil error The invoking function parses the nil resp and results a segmentation fault.
I fix the bug by making sure the lease not found error is returned so that the invoking function parses the the error message instead.
fix#6537
When 1000 leases expired at the same time, etcd takes more than 5 seconds to clean them. This means that even after the leases have expired, keys associated with leases are still accessible. I increase the deletion throughput by parallelizing leases deletion process.
All outstanding goroutines now go into the etcdserver waitgroup. goroutines are
shutdown with a "stopping" channel which is closed when the run() goroutine
shutsdown. The done channel will only close once the waitgroup is totally cleared.
Added support into the v2 API to fix an issue (6433) where if there is a semicolon
and fields after it the API would return an "invalid Content-type" message even
if the content type was actually correct
If a user upgrades etcd from 2.3.x to 3.0 and shutdown the
cluster immediately without triggering any new backend writes,
then the consistent index in backend would be zero.
The user cannot restart etcdserver due to today's strick index
match checking. We now have to lose this a bit for this case.
kv.commit updates the consistent index in backend. When
executing in parallel with apply, it might grab tx lock
after apply update the consistent index and before apply
starts to execute the opeartion. If the server dies right
after kv.commit, the consistent is updated but the opeartion
is not executed. If we restart etcd server, etcd will skip
the operation. :(
There are a few other places that we need to take care of,
but let us fix this first.
moved the code for perparing and sorting of advertising peer urls and
sorting of peer urls only when strict verification needs to be done.
This is done to avoid this processing when strict verification is not
required like in case of VerifyJoinExisting function.
#6165
If a server isn't serving txn requests from a client, the server
doesn't need the result of range requests in the txn.
This is a succeeding commit of
https://github.com/coreos/etcd/pull/5689
This commit introduces revision of authStore. The revision number
represents a version of authStore that is incremented by updating auth
related information.
The revision is required for avoiding TOCTOU problems. Currently there
are two types of the TOCTOU problems in v3 auth.
The first one is in ordinal linearizable requests with a sequence like
below ():
1. Request from client CA is processed in follower FA. FA looks up the
username (let it U) for the request from a token of the request. At
this time, the request is authorized correctly.
2. Another request from client CB is processed in follower FB. CB
is for changing U's password.
3. FB forwards the request from CB to the leader before FA. Now U's
password is updated and the request from CA should be rejected.
4. However, the request from CA is processed by the leader because
authentication is already done in FA.
For avoiding the above sequence, this commit lets
etcdserverpb.RequestHeader have a member revision. The member is
initialized during authentication by followers and checked in a
leader. If the revision in RequestHeader is lower than the leader's
authStore revision, it means a sequence like above happened. In such a
case, the state machine returns auth.ErrAuthRevisionObsolete. The
error code lets nodes retry their requests.
The second one, a case of serializable range and txn, is more
subtle. Because these requests are processed in follower directly. The
TOCTOU problem can be caused by a sequence like below:
1. Serializable request from client CA is processed in follower FA. At
first, FA looks up the username (let it U) and its permission
before actual access to KV.
2. Another request from client CB is processed in follower FB and
forwarded to the leader. The cluster including FA now commits a log
entry of the request from CB. Assume the request changed the
permission or password of U.
3. Now the serializable request from CA is accessing to KV. Even if
the access is allowed at the point of 1, now it can be invalid
because of the change introduced in 2.
For avoiding the above sequence, this commit lets the functions of
serializable requests (EtcdServer.Range() and EtcdServer.Txn())
compare the revision in the request header with the latest revision of
authStore after the actual access. If the saved revision is lower than
the latest one, it means the permission can be changed. Although it
would introduce false positives (e.g. changing other user's password),
it prevents the TOCTOU problem. This idea is an implementation of
Anthony's comment:
https://github.com/coreos/etcd/pull/5739#issuecomment-228128254
Most fields accessed with sync/atomic functions are 64bit aligned, but a couple
are not. This makes comments out of date and therefore misleading.
Affected fields reordered, comments scrubbed and updated.
This commit lets etcdserver skip needless log entry applying. If the
result of log applying isn't required by the node (client that issued
the request isn't talking with the node) and the operation has no side
effects, applying can be skipped.
It would contribute to reduce disk I/O on followers and be useful for
a cluster that processes much serializable get.
Before
2016-07-01 14:57:50.927170 I | api: enabled capabilities for version 3.0.0
After
2016-07-01 14:57:50.927170 I | api: enabled capabilities for version 3.0
Now user can filter events with types. The API is also extensible.
It might make sense for the proxy to filter out events based on
more expensive/customized filter.
Currently the user can't list only the keys in a prefix search. In
order to support such operations the filtering will be done on the
server side to reduce the encoding and network transfer costs.
This commit expands RPCs for getting user and role and support list up
all users and roles. etcdctl v3 is now support getting all users and
roles with the newly added option --all e.g. etcdctl user get --all
Currently auth tokens are generated in the replicated state machine
layer randomly. It means one auth token generated in node A cannot be
used for node B. It is problematic for load balancing and fail
over. This commit moves the token generation logic from the state
machine to API layer (before raft) and let all nodes share a single
token.
Log index of Raft is also added to a token for ensuring uniqueness of
the token and detecting activation of the token in the cluster (some
nodes can receive the token before generating and installing the token
in its state machine).
This commit also lets authStore have simple token related things. It
is required because of unit test. The test requires cleaning of the
state of the simple token things after one test (succeeding test can
create duplicated token and it causes panic).
This commit adds a feature for granting and revoking range of keys,
not a single key.
Example:
$ ETCDCTL_API=3 bin/etcdctl role grant r1 readwrite k1 k3
Role r1 updated
$ ETCDCTL_API=3 bin/etcdctl role get r1
Role r1
KV Read:
[a, b)
[k1, k3)
[k2, k4)
KV Write:
[a, b)
[k1, k3)
[k2, k4)
$ ETCDCTL_API=3 bin/etcdctl --user u1:p get k1 k4
k1
v1
$ ETCDCTL_API=3 bin/etcdctl --user u1:p get k1 k5
Error: etcdserver: permission denied
Currently the auth mechanism doesn't support permissions of range
request. It just checks exact matching of key names even for range
queries. This commit adds a mechanism for setting permission to range
queries. Range queries are allowed if a range of the query is [begin1,
end1) and the user has a permission of reading [begin2, range2) and
[begin1, end2) is a subset of [begin2, range2). Range delete requests
will follow the same rule.
This commit implements RoleGet() RPC of etcdserver and adds a new
subcommand "role get" to etcdctl v3. It will list up permissions that
are granted to a given role.
$ ETCDCTL_API=3 bin/etcdctl role get r1
Role r1
KV Read:
b
d
KV Write:
a
c
d
This commit adds a new subcommand "user get" to etcdctl v3. It will
list up roles that are granted to a given user.
Example:
$ ETCDCTL_API=3 bin/etcdctl user get u1
User: u1
Roles: r1 r2 r3
This commit also modifies the layout of InternalRaftRequest for
frequent update of auth related members.
Current permission checking mechanism doesn't return its error code
well. The internal error (code = 13) is returned to client and the
retry mechanism doesn't work well. This commit fixes the problem.
This commit lets etcdserver check permission during its log applying
phase. With this change, permission checking of operations is
supported.
Currently, put and range are supported. In addition, multi key
permission check of range isn't supported yet.
Current etcdserver tries to return result.resp even if result.err is
not nil. A situation of result.resp == nil and result.err != nil can
happen and it results an error like below:
18:49:57 etcd1 | interface conversion: proto.Message is nil, not *etcdserverpb.PutResponse
This commit lets the functions return result.err if it is not nil.
Migrate command accepts a datadir and an optional user-provided
transformer function that transform v2 keys to v2 keys.
Migrate command then builds a v3 backend state based on the existing
v2 keys and the output of the transformer function.
There was a bug in protodoc.
This changes git SHA to use the latest protodoc.
And make the letter casing consistent with original
Protocol Buffer. Go capitalizes the member variables,
but the protocol buffer documentation should be same as
original proto files.