Previously, we used only 8 bits of the member ID in ID generation. The
conflict rate is 1 - A(256,3)/256^3, which is around 1%. Now we use 16 bits
to reduce the rate to about 0.005%.
We can attach the full member ID to ID generation if needed.
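A minimal sketch of the idea under assumed names and field widths (the real layout in the ID generator may differ): 16 bits of the member ID become the fixed prefix of every generated ID, and a timestamp-plus-counter suffix fills the remaining bits.

```
// Illustrative sketch only: the real generator's field layout may differ.
package main

import (
	"fmt"
	"time"
)

type generator struct {
	prefix uint64 // 16 bits derived from the member ID (previously 8)
	suffix uint64 // timestamp + counter filling the remaining 48 bits
}

func newGenerator(memberID uint64) *generator {
	return &generator{
		prefix: (memberID & 0xffff) << 48,
		suffix: uint64(time.Now().UnixNano()),
	}
}

// next is not concurrency-safe; it only illustrates how the prefix and
// suffix combine into one 64-bit ID.
func (g *generator) next() uint64 {
	g.suffix++
	return g.prefix | (g.suffix & ((1 << 48) - 1))
}

func main() {
	g := newGenerator(0xbeefcafe)
	fmt.Printf("%016x\n", g.next())
}
```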
On the latest master branch, etcdserver/etcdserverpb/etcdserver.pb.go
changes whenever scripts/genproto.sh is executed. The only changes are to
comments, so they are not important, but the churn is annoying when we
update the proto file.
For https://github.com/coreos/etcd/issues/4338.
Also, resp.Header.Revision should come from the revision in storage
when we only do a range, because there is no change to storage data.
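A hedged sketch with stand-in types (kvStore and rangeResponse are hypothetical): for a read-only range the header revision is copied straight from the store's current revision.

```
package main

import "fmt"

// Hypothetical stand-ins for the storage KV and the RPC response types.
type kvStore struct{ rev int64 }

func (s *kvStore) Rev() int64 { return s.rev }

type responseHeader struct{ Revision int64 }
type rangeResponse struct{ Header responseHeader }

// A plain range changes nothing in storage, so the response header simply
// reports the store's current revision.
func fillRangeHeader(s *kvStore, resp *rangeResponse) {
	resp.Header.Revision = s.Rev()
}

func main() {
	s := &kvStore{rev: 7}
	resp := &rangeResponse{}
	fillRangeHeader(s, resp)
	fmt.Println(resp.Header.Revision)
}
```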
When raft broadcasts a Grant to all nodes, all nodes must
agree on the same lease ID. Otherwise, attaching a key to
a lease will fail since the lease ID is node-dependent.
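A compressed sketch with made-up types: the lease ID is chosen once by the node that proposes the grant and carried inside the raft entry, so every node applies the same ID.

```
package lease

// leaseGrantRequest is a hypothetical shape for the proposal carried
// through raft; the important part is that the ID is chosen before the
// broadcast, not during apply.
type leaseGrantRequest struct {
	ID  int64
	TTL int64
}

type lessor struct {
	leases map[int64]int64 // lease ID -> TTL
}

// apply runs on every node for the same raft entry; since the ID is part of
// the entry, all nodes build an identical lease table and a later attach of
// a key to that lease ID succeeds everywhere.
func (l *lessor) apply(r leaseGrantRequest) {
	l.leases[r.ID] = r.TTL
}
```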
This commit adds a new option for activating pprof-based profiling in the
etcd process.
- -enable-pprof: boolean option that activates profiling
For example, if a client URL is http://localhost:12379, users and
developers can access the heap profiler at:
http://localhost:12379/debug/pprof/heap
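A rough sketch of how such a flag could be wired up with the standard library; the flag name below mirrors the option described above, and net/http/pprof's handlers are mounted on the client-facing mux.

```
package main

import (
	"flag"
	"log"
	"net/http"
	"net/http/pprof"
)

func main() {
	enablePprof := flag.Bool("enable-pprof", false, "serve profiling data under /debug/pprof/")
	flag.Parse()

	mux := http.NewServeMux()
	if *enablePprof {
		// Standard library pprof handlers; /debug/pprof/heap and friends
		// are served by the Index handler.
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
		mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	}
	log.Fatal(http.ListenAndServe("localhost:12379", mux))
}
```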
Basic support for lease operations like create and revoke (a minimal sketch
follows this list). We still need to:
1. attach keys to leases in the KV implementation if the lease field is set
2. have the leader periodically remove expired leases
3. have the leader serve keepAlive requests, with followers forwarding
keepAlive requests to the leader.
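A minimal sketch under assumed names of the create/revoke support described above; key attachment, expiry, and keepAlive handling are still missing here, matching the TODO list.

```
package lease

import "errors"

var ErrLeaseNotFound = errors.New("lease: not found")

type LeaseID int64

type Lease struct {
	ID  LeaseID
	TTL int64 // seconds
}

type Lessor struct {
	leases map[LeaseID]*Lease
}

func NewLessor() *Lessor {
	return &Lessor{leases: make(map[LeaseID]*Lease)}
}

// Grant creates a lease with the given ID and TTL.
func (l *Lessor) Grant(id LeaseID, ttl int64) *Lease {
	le := &Lease{ID: id, TTL: ttl}
	l.leases[id] = le
	return le
}

// Revoke deletes the lease; detaching the keys bound to it is the missing
// piece tracked by item 1 above.
func (l *Lessor) Revoke(id LeaseID) error {
	if _, ok := l.leases[id]; !ok {
		return ErrLeaseNotFound
	}
	delete(l.leases, id)
	return nil
}
```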
We want the KV to support recovering from the backend to avoid an
additional pointer swap. Otherwise we have to coordinate between
etcdserver and the API layer, since the API layer might hold the
kv pointer and use a closed kv.
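A small sketch of the shape this implies, with hypothetical names: the store restores itself from the backend in place, so API-layer callers holding the same pointer never touch a closed kv.

```
package storage

// Backend is a simplified stand-in for the real backend interface.
type Backend interface{}

type store struct {
	b Backend
	// ...in-memory index, revision counters, etc...
}

// Restore re-initializes the existing store from the backend in place
// instead of closing it and swapping in a brand-new KV pointer.
func (s *store) Restore(b Backend) error {
	s.b = b
	// rebuild the in-memory index from b here
	return nil
}
```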
Provides two implementations of Recorder: one that is non-blocking
like the original version, and one that provides a blocking channel
to avoid busy waiting or racing in tests when no other synchronization
is available.
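A sketch of the two flavors under assumed names; the buffered version mirrors the original non-blocking behavior, while the stream version hands each action to a channel so a test can block until it arrives.

```
package testutil

import "sync"

// Action is the unit recorded by both implementations.
type Action struct {
	Name string
}

// recorderBuffered never blocks the code under test.
type recorderBuffered struct {
	mu      sync.Mutex
	actions []Action
}

func (r *recorderBuffered) Record(a Action) {
	r.mu.Lock()
	r.actions = append(r.actions, a)
	r.mu.Unlock()
}

func (r *recorderBuffered) Action() []Action {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]Action(nil), r.actions...)
}

// recorderStream pushes actions onto an unbuffered channel, giving tests a
// blocking handle instead of a busy wait.
type recorderStream struct {
	ch chan Action
}

func NewRecorderStream() *recorderStream {
	return &recorderStream{ch: make(chan Action)}
}

func (r *recorderStream) Record(a Action) { r.ch <- a }

// Wait blocks until n actions have been recorded.
func (r *recorderStream) Wait(n int) []Action {
	acts := make([]Action, 0, n)
	for len(acts) < n {
		acts = append(acts, <-r.ch)
	}
	return acts
}
```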
storage/storagepb: remove watchID from Event
storage: add WatchResponse to watcher.go to wrap events, watchID
storage: watchableStore to use WatchResponse
storage: kv_test with WatchResponse
etcdserver/api/v3rpc: watch to receive storage.WatchResponse type
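A rough sketch of the wrapper this list describes, with assumed field names: events from one notification travel together with the watch ID, so Event itself no longer carries one.

```
package storage

// Event stands in for storagepb.Event, which no longer carries a watch ID.
type Event struct {
	Key, Value []byte
}

// WatchResponse wraps the events of a single notification together with the
// ID of the watching that produced them.
type WatchResponse struct {
	WatchID int64
	Events  []Event
}
```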
Watcher vs Watching in the storage pkg is confusing. Watcher should be named
watchStream since it contains a channel used as a stream to send out events.
Then we can rename watching to watcher, which actually watches on a key
and sends watched events through the watchStream.
This commit renames watcher to watchStream.
This commit fixes the issue of creating the member dir before validating
the configuration. An existing member dir indicates that the local etcd
process is a valid etcd member, so we should only create the member dir
after we finish configuration validation, joining validation, or
discovery validation.
The raft loop would block on the applier's done channel after
persisting the raft messages; the latency could cause dropped network
messages. Instead, asynchronously notify the applier with a buffered
channel when the raft writes complete.
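A condensed sketch of the pattern with hypothetical names: after persisting, the raft loop drops a notification into a buffered channel and immediately moves on to sending messages, while the applier drains the channel at its own pace.

```
package main

import "fmt"

type apply struct {
	entries []string
}

func main() {
	// Buffer of one lets the raft loop hand off the notification without
	// waiting for the applier to pick it up.
	applyc := make(chan apply, 1)
	done := make(chan struct{})

	go func() { // applier goroutine
		for ap := range applyc {
			fmt.Println("applying", len(ap.entries), "entries")
		}
		close(done)
	}()

	// raft loop: entries are already persisted at this point; notify the
	// applier asynchronously and continue with network sends right away.
	applyc <- apply{entries: []string{"a", "b"}}

	close(applyc)
	<-done
}
```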
raft's applyc writes block on the server loop's database IO since
the next applyc read must wait on the db operation to finish.
Instead, stream applyc to a run queue outside the server loop.
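A minimal sketch of that decoupling, assuming a hypothetical FIFO run queue: the server loop only enqueues apply jobs, and a dedicated goroutine performs the database IO, so reads from applyc are never held up by storage.

```
package main

import (
	"fmt"
	"sync"
)

// fifo is a tiny unbounded run queue; jobs run in order on one goroutine.
type fifo struct {
	mu   sync.Mutex
	cond *sync.Cond
	jobs []func()
}

func newFIFO() *fifo {
	f := &fifo{}
	f.cond = sync.NewCond(&f.mu)
	go f.run()
	return f
}

// Schedule enqueues a job and returns immediately.
func (f *fifo) Schedule(job func()) {
	f.mu.Lock()
	f.jobs = append(f.jobs, job)
	f.cond.Signal()
	f.mu.Unlock()
}

func (f *fifo) run() {
	for {
		f.mu.Lock()
		for len(f.jobs) == 0 {
			f.cond.Wait()
		}
		job := f.jobs[0]
		f.jobs = f.jobs[1:]
		f.mu.Unlock()
		job()
	}
}

func main() {
	q := newFIFO()
	done := make(chan struct{})
	// The server loop just enqueues; the db write happens off-loop.
	q.Schedule(func() {
		fmt.Println("apply entries to the db")
		close(done)
	})
	<-done
}
```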
In this case, we know we are waiting for an action to happen on storage,
so we can busy-wait instead of calling waitSchedule.
The test previously failed on CI with no observed actions.
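A short sketch of the busy wait in test form, with a hypothetical recorder interface: poll until the storage action shows up, with a deadline so a real failure still ends the test.

```
package storage_test

import (
	"testing"
	"time"
)

// recorder is a stand-in for the test double that records storage actions.
type recorder interface {
	Action() []string
}

// waitForAction spins until the store has recorded at least one action,
// instead of calling waitSchedule once and hoping the goroutine has run.
func waitForAction(t *testing.T, r recorder) []string {
	deadline := time.Now().Add(time.Second)
	for time.Now().Before(deadline) {
		if acts := r.Action(); len(acts) > 0 {
			return acts
		}
	}
	t.Fatal("no action observed on storage")
	return nil
}
```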
When a watch stream closes, both watcher.Chan and closec
will be closed.
If watcher.Chan is closed, we should not send out the empty event;
sending it is wrong and wastes a lot of CPU.
Instead we should just return.
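A compressed sketch of the fix with assumed names: check the second value of the receive, and return once the channel is closed instead of looping and forwarding zero-value events.

```
package main

import "fmt"

type event struct{ key string }

func forward(watcherCh <-chan event, closec <-chan struct{}, out chan<- event) {
	for {
		select {
		case ev, ok := <-watcherCh:
			if !ok {
				// The stream is closed: return instead of spinning and
				// sending empty events.
				return
			}
			out <- ev
		case <-closec:
			return
		}
	}
}

func main() {
	in := make(chan event)
	closec := make(chan struct{})
	out := make(chan event, 1)
	go forward(in, closec, out)

	in <- event{key: "foo"}
	fmt.Println((<-out).key)

	close(in)
	close(closec)
}
```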
We should open a real txn for applying txn requests. Otherwise the
intermediate state might be observed by readers.
This also fixes #3803. Using the same consistent (raft) index for multiple
independent operations confuses the consistentStore.
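A hedged sketch with a made-up slice of the KV API: every operation of one txn request runs inside a single storage transaction, so readers never observe a half-applied txn (rollback handling is omitted here).

```
package etcdserver

// txnKV is a hypothetical subset of the storage API, for illustration only.
type txnKV interface {
	TxnBegin() int64
	TxnPut(txnID int64, key, val []byte) error
	TxnEnd(txnID int64) error
}

type putOp struct {
	key, val []byte
}

// applyTxn performs all writes of a txn request inside one real storage
// transaction instead of issuing independent operations, which would expose
// intermediate state and reuse the same consistent (raft) index.
func applyTxn(kv txnKV, ops []putOp) error {
	id := kv.TxnBegin()
	for _, op := range ops {
		if err := kv.TxnPut(id, op.key, op.val); err != nil {
			return err // rollback/cleanup omitted in this sketch
		}
	}
	return kv.TxnEnd(id)
}
```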
We have a structure called InternalRaftRequest. Shortening the function
name to processInternalRaftReq seems arbitrary and reduces readability,
so we just use the full name.
This commit fixes an error log caused by the strict reconfig checking
option.
Before:
14:21:38 etcd2 | 2015-11-05 14:21:38.870356 E | etcdhttp: got unexpected response error (etcdserver: re-configuration failed due to not enough started members)
After:
13:27:33 etcd2 | 2015-11-05 13:27:33.089364 E | etcdhttp: etcdserver: re-configuration failed due to not enough started members
The error is not unexpected, so the old message was incorrect.
This moves the code that creates the listener and roundTripper for raft
communication into the same place, and uses explicit functions to build them.
This prevents possible development errors in the future.
This pairs with remote timeout listeners.
etcd uses a timeout listener and times out accepted connections
when there is no activity, so idle connections may time out easily.
Because the timeout transport doesn't reuse connections, it avoids
reusing a timed-out connection.
This fixes the problem that etcd fails to get the version of its peers.
This attempts to decouple password-related functions, which previously
existed both in the Store and User structs, by splitting them out into a
separate interface, PasswordStore. This means that they can be more
easily swapped out during testing.
This also changes the relevant tests to use mock password functions
instead of the bcrypt-backed implementations; as a result, the tests are
much faster.
Before:
```
github.com/coreos/etcd/etcdserver/auth 31.495s
github.com/coreos/etcd/etcdserver/etcdhttp 91.205s
```
After:
```
github.com/coreos/etcd/etcdserver/auth 1.207s
github.com/coreos/etcd/etcdserver/etcdhttp 1.207s
```
When a slow follower receives the snapshot sent from the leader, it
should rename the snapshot file to the default KV file path and
restore the KV snapshot.
I have tested this manually and it works well.
When the snapshot store requests a raft snapshot from the etcdserver apply
loop, it may block on the channel for some time, or wait some time for the
KV to snapshot. This is unexpected because the raft state machine should
not be blocked.
Even worse, this block may lead to deadlock:
1. raft state machine waits on getting snapshot from raft memory storage
2. raft memory storage waits snapshot store to get snapshot
3. snapshot store requests raft snapshot from apply loop
4. apply loop is applying entries, and waits raftNode loop to finish
messages sending
5. raftNode loop waits peer loop in Transport to send out messages
6. peer loop in Transport waits for raft state machine to process message
Fix it by changing getSnap to create the snapshot asynchronously.
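A condensed sketch of the asynchronous shape with hypothetical types: getSnap hands a request to the apply loop and returns immediately, and the apply loop fulfills it whenever the KV snapshot is ready, so the raft state machine never waits on the apply loop.

```
package main

import "fmt"

type snapshot struct {
	index uint64
}

// snapRequest carries a channel the apply loop fills once the KV snapshot
// has actually been created.
type snapRequest struct {
	ready chan snapshot
}

func main() {
	reqc := make(chan snapRequest, 1)

	// apply loop: creates the snapshot whenever it gets to it.
	go func() {
		req := <-reqc
		req.ready <- snapshot{index: 42}
	}()

	// getSnap: hand off the request without blocking the raft state machine.
	req := snapRequest{ready: make(chan snapshot, 1)}
	reqc <- req

	// ...raft continues; the result is consumed later, outside the raft loop.
	fmt.Println("snapshot created at index", (<-req.ready).index)
}
```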
Use snapshotSender to send v3 snapshot messages. It puts the raft snapshot
message and the v3 snapshot into the request body, then sends it to the
target peer. When it receives http.StatusNoContent, it knows the message
has been received and processed successfully.
As the receiver, snapHandler saves the v3 snapshot, then processes the raft
snapshot message and responds with http.StatusNoContent.
rafthttp has different requirements for the connections created by the
transport for different usages, and this is hard to achieve with a single
http.RoundTripper. Instead, pass the data needed to build transports into
the pkg, and let rafthttp build its own transports.