vitalif/etcd - etcd

Commit Graph

Author	SHA1	Message	Date
dvonthenen	25cdf4ed92	*: expose Raft Applied Index through to "etcdctl endpoint status" Fixed based on feedback Fixed spacing Fix gofmt	2018-01-22 07:37:21 -08:00
Gyu-Ho Lee	75110dd839	*: fix naked returns Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-11-10 18:46:15 -08:00
Anthony Romano	dcf52bbfac	etcdserver, embed, integration: don't use pointer for ServerConfig ServerConfig is owned by etcdserver and unshared, so don't pass or store by pointer. Also removes duplicated field 'snapCount'.	2017-06-15 13:02:13 -07:00
fanmin shi	8b7b7222dd	etcdserver: renaming db happens after snapshot persists to wal and snap files In the case that a follower receives a snapshot from the leader and crashes before renaming xxx.snap.db to db but after the snapshot has persisted to .wal and .snap, restarting the follower results in loading the old db, new .wal, and new .snap. This causes an index mismatch between the snap metadata index and the consistent index from the db. This PR forces an ordering where saving/renaming the db must happen after the snapshot is persisted to the wal and snap files. This guarantees that the wal and snap files are newer than the db. On server restart, the etcd server checks if snap index > db consistent index. If yes, the etcd server attempts to load xxx.snap.db, where xxx=snap index, if there is any, and panics otherwise. FIXES #7628	2017-05-09 14:00:12 -07:00
Gyu-Ho Lee	327f09fcb4	etcdserver: do not block on raft stopping Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 13:35:43 -07:00
Gyu-Ho Lee	91f6aee4f2	etcdserver: ensure waitForApply sync with applyAll Problem is: `Step1`: `etcdserver/raft.go`'s `Ready` process routine sends config-change entries via `r.applyc <- ap` (https://github.com/coreos/etcd/blob/master/etcdserver/raft.go#L193-L203) `Step2`: `etcdserver/server.go`'s `*EtcdServer.run` routine receives this via `ap := <-s.r.apply()` (https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L735-L738) `StepA`: `Step1` proceeds without sync, right after sending `r.applyc <- ap`. `StepB`: `Step2` proceeds without sync, right after `sched.Schedule(s.applyAll(&ep,&ap))`. `StepC`: `etcdserver` tries to sync with `s.applyAll(&ep,&ap)` by calling `rh.waitForApply()`. `rh.waitForApply()` waits for all pending jobs to finish on the `pkg/schedule` side. However, the order of `StepA`,`StepB`,`StepC` is not guaranteed. It is possible that `StepC` happens first and proceeds without waiting on apply. The restarting member then comes back as a leader in a single-node cluster, because there is no synchronization between the apply layer and the config-change Raft entry apply. Confirmed with more debugging lines below, only reproducible with a slow CPU VM (~2 vCPU). ``` ~:24.005397 I | etcdserver: starting server... 
[version: 3.2.0+git, cluster version: to_be_decided] ~:24.011136 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply before ~:24.011194 I | etcdserver: [DEBUG] 29b2d24047a277df starts wait for 0 pending jobs ~:24.011234 I | etcdserver: [DEBUG] 29b2d24047a277df finished wait for 0 pending jobs (current pending 0) ~:24.011268 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply after ~:24.011348 I | etcdserver: [DEBUG] [0] 29b2d24047a277df is scheduling conf change on 29b2d24047a277df ~:24.011396 I | etcdserver: [DEBUG] [1] 29b2d24047a277df is scheduling conf change on 5edf80e32a334cf0 ~:24.011437 I | etcdserver: [DEBUG] [2] 29b2d24047a277df is scheduling conf change on e32e31e76c8d2678 ~:24.011477 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 29b2d24047a277df ~:24.011509 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 5edf80e32a334cf0 ~:24.011545 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on e32e31e76c8d2678 ~:24.012500 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df before ~:24.013014 I | etcdserver/membership: added member 29b2d24047a277df [unix://127.0.0.1:2100515039] to cluster 9250d4ae34216949 ~:24.013066 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after ~:24.013113 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after trigger ~:24.013158 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 5edf80e32a334cf0 before ~:24.013666 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 11.964739ms) ~:24.013709 W | etcdserver: server is likely overloaded ~:24.013750 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 12.057265ms) ~:24.013775 W | etcdserver: server is likely overloaded ~:24.013950 I | raft: 29b2d24047a277df is starting a new election at term 4 ~:24.014012 I | raft: 29b2d24047a277df became candidate at term 5 
~:24.014051 I | raft: 29b2d24047a277df received MsgVoteResp from 29b2d24047a277df at term 5 ~:24.014107 I | raft: 29b2d24047a277df became leader at term 5 ~:24.014146 I | raft: raft.node: 29b2d24047a277df elected leader 29b2d24047a277df at term 5 ``` I printed the number of pending jobs before calling `sched.WaitFinish(0)`; there were no pending jobs, so it returned immediately (before `applyAll` was scheduled). This is the root cause of: - https://github.com/coreos/etcd/issues/7595 - https://github.com/coreos/etcd/issues/7739 - https://github.com/coreos/etcd/issues/7802 `sched.WaitFinish(0)` doesn't work when `len(f.pendings)==0` and `f.finished==0`. Config-change is the first job to apply, so `f.finished` is 0 in this case. `f.finished` monotonically increases, so we need `WaitFinish(finished+1)`, where `finished` must be read before calling `Schedule`. This is safe because `Schedule(applyAll)` is the only place adding jobs to `sched`. The scheduler then waits on the single `applyAll` job, by getting the current number of finished jobs before calling `Schedule`. Alternatively, block until the `applyAll` routine triggers on the config-change job. This patch removes `waitForApply` and signals `raftDone` to wait until `applyAll` finishes applying entries. 
Confirmed that it fixes the issue, as below: ``` ~:43.198354 I | rafthttp: started streaming with peer 36cda5222aba364b (stream MsgApp v2 reader) ~:43.198740 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply before ~:43.198836 I | etcdserver: [DEBUG] 3988bc20c2b2e40c starts wait for 0 pending jobs, 1 finished jobs ~:43.200696 I | integration: launched 3169361310155633349 () ~:43.201784 I | etcdserver: [DEBUG] [0] 3988bc20c2b2e40c is scheduling conf change on 36cda5222aba364b ~:43.201884 I | etcdserver: [DEBUG] [1] 3988bc20c2b2e40c is scheduling conf change on 3988bc20c2b2e40c ~:43.201965 I | etcdserver: [DEBUG] [2] 3988bc20c2b2e40c is scheduling conf change on cf5d6cbc2a121727 ~:43.202070 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 36cda5222aba364b ~:43.202139 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 3988bc20c2b2e40c ~:43.202204 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on cf5d6cbc2a121727 ~:43.202444 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) before ~:43.204486 I | etcdserver/membership: added member 36cda5222aba364b [unix://127.0.0.1:2100913646] to cluster 425d73f1b7b01674 ~:43.204588 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after ~:43.204703 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after trigger ~:43.204791 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) before ~:43.205689 I | etcdserver/membership: added member 3988bc20c2b2e40c [unix://127.0.0.1:2101113646] to cluster 425d73f1b7b01674 ~:43.205783 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after ~:43.205929 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after trigger ~:43.206056 I | etcdserver: [DEBUG] 3988bc20c2b2e40c 
applyConfChange on cf5d6cbc2a121727 (request ID: 0) before ~:43.207353 I | etcdserver/membership: added member cf5d6cbc2a121727 [unix://127.0.0.1:2100713646] to cluster 425d73f1b7b01674 ~:43.207516 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after ~:43.207619 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after trigger ~:43.207710 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 36cda5222aba364b ~:43.207781 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 3988bc20c2b2e40c ~:43.207843 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on cf5d6cbc2a121727 ~:43.207951 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished wait for 0 pending jobs (current pending 0, finished 1) ~:43.208029 I | rafthttp: started HTTP pipelining with peer cf5d6cbc2a121727 ~:43.210339 I | rafthttp: peer 3988bc20c2b2e40c became active ~:43.210435 I | rafthttp: established a TCP streaming connection with peer 3988bc20c2b2e40c (stream MsgApp v2 reader) ~:43.210861 I | rafthttp: started streaming with peer 3988bc20c2b2e40c (writer) ~:43.211732 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply after ``` Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 10:22:27 -07:00
Anthony Romano	714b48a4b4	etcdserver: initialize raftNode with constructor raftNode was being initialized in start(), which was causing hangs when trying to stop the etcd server since the stop channel would not be initialized in time for the stop call. Instead, setup non-configurable bits in a constructor. Fixes #7668	2017-04-18 09:33:59 -07:00
Gyu-Ho Lee	04354f32ab	etcdserver: wait apply on conf change Raft entry When the apply layer sees a configuration change entry in raft.Ready.CommittedEntries, the server should not proceed until that entry is applied. Otherwise, the follower's raft layer advances, possibly hits its election timeout, and becomes the leader in a single-node cluster before the add-node conf changes of the other nodes are applied. Fix https://github.com/coreos/etcd/issues/7595. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-13 15:59:24 -07:00
Xiang	7f0733cf46	etcdserver: candidate should wait for applying all configuration changes	2017-03-14 17:20:20 -07:00
Gyu-Ho Lee	3d75395875	*: remove never-used vars, minor lint fix Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-03-06 14:59:12 -08:00
fanmin shi	2a1bae0c2a	etcdserver: consistent naming in raftReadyHandler	2016-12-29 11:27:16 -08:00
fanmin shi	2faf72f47c	etcdserver: rework update committed index logic	2016-12-27 10:11:40 -08:00
Gyu-Ho Lee	3fd1d951f8	etcdserver: time out when readStateC is blocking Otherwise, it will block forever when the server is overloaded. Fix https://github.com/coreos/etcd/issues/6891.	2016-12-05 15:34:46 -08:00
Gyu-Ho Lee	6ec03d3f7c	etcdserver: move 'EtcdServer.send' to raft.go Clear 'TODO'	2016-10-26 16:26:00 -07:00
Gyu-Ho Lee	e011ea25ca	etcdserver: separate EtcdServer from raftNode	2016-10-07 13:18:39 -07:00
Xiang Li	0f0c048e29	etcdserver: fix early lessor promotion issue If we promote the lessor before we finish applying all entries from the last term, we might incorrectly renew the already revoked leases. Here is an example: - Term 1: revoke lease A accepted by raft - Old leader failed, new election happened - Term 2: promote - Term 2: keep alive A succeeds. A now has 10 seconds TTL - Term 2: revoke lease A from Term 1 got committed and applied - Term 2: the lease A with 10 seconds TTL is revoked To solve this, the new leader MUST apply all entries from the old term before promoting its lessor to start accepting renew requests.	2016-10-05 14:41:47 -07:00
Xiang Li	e3e3993022	etcdserver: support read index Use read index to achieve l-read.	2016-09-27 13:41:40 +08:00
Anthony Romano	de68818f03	etcdserver: add some failpoints	2016-06-21 14:43:20 -07:00
Xiang Li	9c78cda088	etcdserver: save state before save snapshot	2016-06-15 22:00:33 -07:00
Gyu-Ho Lee	32d766d749	etcdserver: preallocate slice	2016-06-15 13:03:10 -07:00
Gyu-Ho Lee	abb4cd5646	etcdserver: update LICENSE header	2016-05-12 20:49:40 -07:00
Xiang Li	9c103dd0de	*: cancel required leader streams when member lost its leader	2016-05-12 19:42:21 -07:00
Anthony Romano	dcb3b7aecf	*: scrub legacy ports from code and scripts	2016-05-11 13:46:30 -07:00
Xiang Li	ab11415d25	*: add proposalsCommitted metrics	2016-05-10 10:56:25 -07:00
Xiang Li	824478be5f	*: add has leader metrics	2016-05-06 13:59:19 -07:00
Xiang Li	76d073a2b5	*: add leader changes to metrics	2016-05-06 13:12:20 -07:00
Xiang Li	3c0ac9d600	etcdserver: set backend to cluster	2016-04-08 21:46:45 -07:00
Xiang Li	bf2289ae00	etcdserver: move membership related code to membership pkg	2016-04-07 14:21:37 -07:00
Xiang Li	70a9391378	*: enable v3 by default	2016-03-23 17:01:36 -07:00
Anthony Romano	bd832e5b0a	*: migrate Godeps to vendor/	2016-03-22 17:10:28 -07:00
Xiang Li	2a28ac7ad4	etcdserver: leader should stepdown when lose quorum for v3	2016-03-15 23:23:26 -07:00
Xiang Li	e9a0a103e5	*: refresh the lease TTL correctly when a leader is elected. The new leader needs to refresh with an extended TTL to gracefully handle the potential concurrent leader issue. Clients might still send keep-alives to the old leader until the old leader itself gives up leadership, at most after an election timeout.	2016-03-15 22:40:03 -07:00
Xiang Li	0f9d04237c	etcdserver: leader latency optimization	2016-03-12 22:51:13 -08:00
Xiang Li	d6520303c6	etcdserver: detect raft starvation caused by contention	2016-02-29 17:06:57 -08:00
Xiang Li	d265fe000c	*: support time based auto compaction. Fix https://github.com/coreos/etcd/issues/3906. We will soon have extensive docs explaining what compaction and auto compaction are.	2016-02-25 16:02:03 -08:00
Anthony Romano	20461ab11a	*: fix many typos	2016-01-31 21:42:39 -08:00
Xiang Li	f5fa9b5384	*: expose Lessor Promote and Demote interface	2016-01-07 18:18:20 -08:00
Anthony Romano	aca0c466ed	etcdserver: asynchronously notify applier when raft writes finish The raft loop would block on the applier's done channel after persisting the raft messages; the latency could cause dropped network messages. Instead, asynchronously notify the applier with a buffered channel when the raft writes complete.	2015-12-22 14:15:14 -08:00
Gyu-Ho Lee	40b11038f2	etcdserver: fixes shadowed variables for go tool vet Fix for https://github.com/coreos/etcd/issues/3954.	2015-12-12 09:13:12 -08:00
Xiang Li	23bd60ccce	*: rewrite snapshot sending	2015-12-08 18:21:21 -08:00
Xiang Li	a8e6e71bf9	*: fix various data races detected by race detector	2015-10-26 20:49:37 -07:00
Yicheng Qin	8c94ae0ee3	etcdserver: get existing snapshot instead of requesting one This fixes the problem that proposals cannot be applied. When starting the etcdserver.run loop, it expects to get the latest existing snapshot. It should not attempt to request one because the loop is the entity that creates the snapshot.	2015-10-05 14:32:16 -07:00
Yicheng Qin	2276328720	etcdserver: add snapshotStore and raftStorage snapshotStore is the store of snapshots, and it supports getting the latest snapshot and saving an incoming snapshot. raftStorage supports getting the latest snapshot when v3demo is open.	2015-10-01 19:00:59 -07:00
Xiang Li	1bcaa9f4a1	etcdserver: ignore confChangeUpdateNode in getIDs	2015-08-31 09:36:39 -07:00
Yicheng Qin	2d5b95c49f	etcdserver: use ReqTimeout only We cannot infer the RTT value from the heartbeat interval, so CommitTimeout is invalid. Remove it and use ReqTimeout instead.	2015-08-17 14:54:25 -07:00
Yicheng Qin	27170e67b9	etcdserver: specify timeout caused by leader election Before this PR, the timeout caused by leader election returns: ``` 14:45:37 etcd2 | 2015-08-12 14:45:37.786349 E | etcdhttp: got unexpected response error (etcdserver: request timed out) ``` After this PR: ``` 15:52:54 etcd1 | 2015-08-12 15:52:54.389523 E | etcdhttp: etcdserver: request timed out, possibly due to leader down ```	2015-08-12 16:53:18 -07:00
Yicheng Qin	5a91937367	etcdserver: adjust commit timeout based on config It uses heartbeat interval and election timeout to estimate the commit timeout for internal requests. This PR helps etcd survive under high roundtrip-time environment, e.g., globally-deployed cluster.	2015-08-11 21:09:03 -07:00
Brandon Philips	fb1951204c	etcdserver: move atomics to make etcd work on arm64 Follow the simple rule in the atomic package: "On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a global variable or in an allocated struct or slice can be relied upon to be 64-bit aligned." Tested on a system with /proc/cpuinfo reporting: processor : 0 model name : ARMv7 Processor rev 1 (v7l) Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x0 CPU part : 0xc0d CPU revision : 1	2015-08-08 18:11:41 -07:00
Yicheng Qin	f03f048232	Merge pull request #3184 from yichengq/fast-bootstrap etcdserver: tick ElectionTicks before starting when bootstrap new cluster	2015-08-06 15:54:40 -07:00
Yicheng Qin	21f5b885f2	etcdserver: fast election timeout when bootstrap cluster This behavior accelerates the first-time leader election, so the cluster can elect its leader quickly. Technically, it could help to reduce `electionMs - heartbeatMs` wait time for the first leader election. Main usage: 1. Quick start for the local cluster when setting a little longer election timeout 2. Quick start for the global cluster, which sets election timeout to its maximum 50s.	2015-08-06 15:44:26 -07:00
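The scheduling bug described in commit 91f6aee4f2 comes down to `WaitFinish(0)` returning immediately when no job has finished yet. Below is a minimal Go sketch of the counting idea that commit message explains: read the finished count before `Schedule`, then wait for `finished+1`. The `fifo` type and its methods are a hypothetical, simplified stand-in, not etcd's `pkg/schedule` code, and the sketch assumes a single goroutine schedules jobs (the commit notes this holds for `Schedule(applyAll)`). The patch itself ended up removing `waitForApply` in favor of a `raftDone` signal instead.

```go
package main

import (
	"fmt"
	"sync"
)

// fifo is a hypothetical, simplified stand-in for a FIFO job scheduler (not
// etcd's pkg/schedule). Jobs run one at a time on a background goroutine, and
// finished counts completed jobs; it only ever increases. There is no Stop
// method, for brevity.
type fifo struct {
	mu       sync.Mutex
	cond     *sync.Cond
	jobs     chan func()
	finished int
}

func newFifo() *fifo {
	f := &fifo{jobs: make(chan func(), 16)}
	f.cond = sync.NewCond(&f.mu)
	go func() {
		for job := range f.jobs {
			job()
			f.mu.Lock()
			f.finished++
			f.cond.Broadcast() // wake every WaitFinish caller to recheck
			f.mu.Unlock()
		}
	}()
	return f
}

// Schedule enqueues a job and returns without waiting for it to run.
func (f *fifo) Schedule(job func()) { f.jobs <- job }

// Finished reports how many jobs have completed so far.
func (f *fifo) Finished() int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.finished
}

// WaitFinish blocks until at least n jobs have finished. Because finished
// starts at 0, WaitFinish(0) returns immediately even if a job was just
// scheduled, which is the failure mode described in the commit message.
func (f *fifo) WaitFinish(n int) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for f.finished < n {
		f.cond.Wait()
	}
}

func main() {
	f := newFifo()
	var mu sync.Mutex
	applied := false // hypothetical stand-in for "conf change applied"

	// Read the finished count BEFORE scheduling, then wait for one more job.
	// This is only exact if no other goroutine schedules jobs concurrently.
	before := f.Finished()
	f.Schedule(func() {
		mu.Lock()
		applied = true
		mu.Unlock()
	})
	f.WaitFinish(before + 1)

	mu.Lock()
	fmt.Println("applied:", applied) // prints "applied: true"
	mu.Unlock()
}
```

Waiting on a snapshot of the counter plus one, rather than on "no pending jobs", ties the wait to the specific job just scheduled, so it cannot return before that job has run.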


66 Commits (4dfd8ab2fce555e13d950aebb7e63df577e638dc)