* coreos/master:
rafthttp: fix import
raft: should not decrease match and next when handling out of order msgAppResp
Fix migration to allow snapshots to have the right IDs
add snapshotted integration test
fix test import loop
fix import loop, add set to types, and fix comments
etcdserver: autodetect v0.4 WALs and upgrade them to v0.5 automatically
wal: add a bench for write entry
rafthttp: add streaming server and client
dep: use vendored imports in codegangsta/cli
dep: bump golang.org/x/net/context
Conflicts:
etcdserver/server.go
etcdserver/server_test.go
migrate/snapshot.go
* coreos/master:
scripts: build-docker tag and use ENTRYPOINT
scripts: build-release add etcd-migrate
create .godir
raft: optimistically increase the next if the follower is already matched
raft: add handleHeartbeat handleHeartbeat commits to the commit index in the message. It never decreases the commit index of the raft state machine.
rafthttp: send takes raft message instead of bytes
*: add rafthttp pkg into test list
raft: include commitIndex in heartbeat
rafthttp: move server stats in raftHandler to etcdserver
*: etcdhttp.raftHandler -> rafthttp.RaftHandler
etcdserver: rename sender.go -> sendhub.go
*: etcdserver.sender -> rafthttp.Sender
Conflicts:
raft/log.go
raft/raft_paper_test.go
Compaction is now treated as an implementation detail of Storage
implementations; Node.Compact() and related functionality have been
removed. Ready.Snapshot is now used only for incoming snapshots.
A return value has been added to ApplyConfChange to allow applications
to track the node information that must be stored in the snapshot.
raftpb.Snapshot has been split into Snapshot and SnapshotMetadata, to
allow the full snapshot data to be read from disk only when needed.
raft.Storage has new methods Snapshot, ApplySnapshot, HardState, and
SetHardState. The Snapshot and HardState parameters have been removed
from RestartNode() and will now be loaded from Storage instead.
The only remaining difference between StartNode and RestartNode is that
the former bootstraps an initial list of Peers.
This is useful since we want to pipeline the appendEntry requests. Without
enabling optimistic increasing, the second pipelining appendEntry request
will include the entries the first one has already sent out. We decrease
the next directly to match if the leader receives a rejection for a matched
follower. This happens if one pipelining request get lost and following ones
arrives at the follower.
* coreos/master: (21 commits)
etcdserver: refactor ValidateClusterAndAssignIDs
integration: add integration test for remove member
integration: add test for member restart
version: bump to alpha.3
etcdserver: add buffer to the sender queue
*: gracefully stop etcdserver
Fix up migration tool, add snapshot migration
etcd4: migration from v0.4 -> v0.5
etcdserver: export Member.StoreKey
etcdserver: recover cluster when receiving newer snapshot
etcdserver: check and select committed entries to apply
etcdserver: recover from snapshot before applying requests
raft: not set applied when restored from snapshot
sender: support elegant stop
etcdserver: add StopNotify
etcdserver: fix TestDoProposalStopped test
etcdserver: minor cleanup
etcdserver: validate new node is not registered before in best effort
etcdserver: fix server.Stop()
*: print out configuration when necessary
...
Conflicts:
etcdserver/server.go
etcdserver/server_test.go
raft/log.go
The first entry in the log is a dummy which is used for matchTerm
but may not have an actual payload. This change permits Storage
implementations to treat this term value specially instead of
storing it as a dummy Entry.
Storage.FirstIndex() no longer includes the term-only entry.
This reverses a recent decision to create entry zero as initially
unstable; Storage implementations are now required to make
Term(0) == 0 and the first unstable entry is now index 1.
stableTo(0) is no longer allowed.
* coreos/master:
etcdserver: add sender tests
raft: Only call stableTo when we have ready entries or a snapshot.
etcdserver: add ID() function to the Server interface.
sender: use RoundTripper instead of Client in sender
The first Ready after RestartNode (with no snapshot) will have no
unstable entries, so we don't have the correct prevLastUnstablei
when Advance is called. This would cause raftLog.unstable to move
backwards and previously-stable entries would be returned to
the application again.
This should have been caught by the "unexpected Ready" portion of
TestNodeRestart, but it went unnoticed because the Node's goroutine
takes some time to read from advancec and prepare the write to read to
readyc. Added a small (1ms) delay to all such tests to ensure that the
goroutine has time to enter its select wait.
* coreos/master: (27 commits)
pkg/wait: move wait to pkg/wait
etcdserver: do not add/remove/update local member to/from sender hub
etcdserver: not record attributes when add member
raft: add a test for proposeConfChange
raft: block Stop() on n.done, support idempotency
raft: add a test for node proposal
integration: add increase cluster size test
integration: remove unnecessary t.Testing argument
raft: stop the node synchronously
integration: fix test to propagate NewServer errors
etcdserver: move peer URLs check to config
etcdserver: ensure initial-advertise-peer-urls match initial-cluster
raft: add a test for node.Tick
raft: add comment string for TestNodeStart
etcdserver: use member instead of node at etcd level
raft: nodes return sorted ids
raft: update unstable when calling stableTo with 0
*: support updating advertise-peer-url Users might want to update the peerurl of the etcd member in several cases. For example, if the IP address of the physical machine etcd running on is changed, user need to update the adversite-pee-rurl accordingly. This commit makes etcd support updating the advertise-peer-url of its members.
transport: create a tls listener only if the tlsInfo is not empty and the scheme is HTTPS
etcdserver: use member pointer for all tests
...
Conflicts:
etcdserver/server.go
raft/log.go
raft/log_test.go
raft/node.go
This entry is now persisted through the normal flow instead of appearing
in the stored log at creation time. This is how things worked before
the Storage interface was introduced. (see coreos/etcd#1689)
Callers must in general have a reference to their Storage objects to
transfer entries from Ready to Storage, so it doesn't make sense to
create a hidden Storage for them.
By explicitly creating Storage objects in tests we can remove a
few casts of raftLog's storage field.
Users might want to update the peerurl of the etcd member in several cases.
For example, if the IP address of the physical machine etcd running on is
changed, user need to update the adversite-pee-rurl accordingly.
This commit makes etcd support updating the advertise-peer-url of its members.
This change splits the raftLog.entries array into an in-memory
"unstable" list and a pluggable interface for retrieving entries that
have been persisted to disk. An in-memory implementation of this
interface is provided which behaves the same as the old version;
in a future commit etcdserver could replace the MemoryStorage with
one backed by the WAL.
Node set the applied to committed right after it sends out Ready to application. This is not
correct since the application has not actually applied the entries at that point. We add a
Advance interface to Node. Application needs to call Advance to tell raft Node its progress.
Also this change can avoid unnecessary copying when application is still applying entires but
there are more entries to be applied.
As explained in #1366, the leader will fail to transmit the missed
logs if the leader receives a hearbeat response from a follower
that is not yet matched in the leader. In other words, there are
append responses that do not explicitly reject an append but
implied a gap.
This commit is based on @xiangli-cmu's idea. We should only acknowledge
upto the index of logs in the append message. This way responses to
heartbeats would never interfer with the log synchronization because
their log index is always 0.
Fixes#1366
This adds the remaining two stats endpoints: `/v2/stats/self`, for
various statistics on the EtcdServer, and `/v2/stats/leader`, for
statistics on a leader's followers.
By and large most of the stats code is copied across from 0.4.x, updated
where necessary to integrate with the new decoupling of raft from
transport.
This does not satisfactorily resolve the question of name vs ID. In the
old world, names were unique in the cluster and transmitted over the
wire, so they could be used safely in all statistics. In the new world,
a given EtcdServer only knows its own name, and it is instead IDs that
are communicated among the cluster members. Hence in most places here we
simply substitute a string-encoded ID in place of name, and only where
possible do we retain the actual given name of the EtcdServer.
Since these are arrays instead of maps, the "keys" here are actually
(useless) goto labels. What really matters is that the ordering is
the same between the constant declarations and the array.
Before this commit, compact always compact log at current appliedindex of raft.
This prevents us from doing non-blocking snapshot since we have to make snapshot
and compact atomically. To prepare for non-blocking snapshot, this commit make
compact supports index and nodes parameters. After completing snapshot, the applier
should call compact with the snapshot index and the nodes at snapshot index to do
a compaction at snapsohot index.
send should not attach current term to msgProp. Send should simply do proxy for msgProp without
changing its term. msgProp has a special term 0, which indicates that it is a local message.
The usage of removed:
1. tell removed node about its removal explicitly using msgDenied
2. prevent removed node disrupt cluster progress by launching leader election
It is set when apply node removal, or receive msgDenied.