vitalif/etcd - etcd

Commit Graph

Author	SHA1	Message	Date
Gyuho Lee	9149565cb3	*: move to "etcdserver/api/membership" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-21 10:31:16 -07:00
Gyuho Lee	955fd99bc9	Merge pull request #9746 from gyuho/raft-logger etcdserver: set default Raft logger with zap.Logger	2018-05-18 16:32:48 -07:00
Gyuho Lee	58ae15bd29	etcdserver: set default Raft logger with zap.Logger Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-18 15:38:39 -07:00
Gyuho Lee	49d672ff9b	etcdserver: rename "SnapshotCount", add "SnapshotCatchUpEntries" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-05-18 14:37:50 -07:00
Gyuho Lee	3ea7a5d0bd	etcdserver: add "LoggerCore" field for Raft logger Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-25 10:16:54 -07:00
Maciej Borsz	46bc966aa7	etcdserver: add is_leader prometheus metric that is 1 on the leader. Before this change, we had no way to find a leader using the /metrics endpoint. This commit adds a metric to do that.	2018-04-19 11:47:40 +02:00
Gyuho Lee	d0847f4f25	*: clean up/fix server structured logs Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-18 12:54:43 -07:00
Gyuho Lee	bdbed26f64	etcdserver: support structured logging Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-16 17:36:00 -07:00
Gyuho Lee	041b9069a2	*: configure server logger - Add/Document "logger" to support structured logging. - This makes functional tests run easier, since zap logger provides built-in log redirect to files. - "etcd --logger-option=zap" to enable structured logging. - Current "capnslog" will still be used as "default". - We may switch the default or deprecate "capnslog" in v3.5. - Either way, will clearly be documented. Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-04-16 17:36:00 -07:00
Gyuho Lee	4f754c1850	etcdserver: clean up with "RaftStatusGetter" Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-15 19:30:08 -04:00
Gyuho Lee	9680b8a157	etcdserver: adjust election ticks on restart Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-10 19:09:38 -08:00
Gyuho Lee	edec229e10	etcdserver: make "advanceTicks" method Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-10 18:50:50 -08:00
Gyuho Lee	78918848bd	etcdserver: support Raft Pre-Vote Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-03-06 09:55:55 -08:00
Gyuho Lee	69357adf33	etcdserver: enable "CheckQuorum" when starting with "ForceNewCluster" We enable "raft.Config.CheckQuorum" by default in other Raft initial starts. So should start with "ForceNewCluster". Signed-off-by: Gyuho Lee <gyuhox@gmail.com>	2018-02-23 00:26:42 -08:00
dvonthenen	25cdf4ed92	*: expose Raft Applied Index through to "etcdctl endpoint status" Fixed based on feedback Fixed spacing Fix gofmt	2018-01-22 07:37:21 -08:00
Gyu-Ho Lee	75110dd839	*: fix naked returns Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-11-10 18:46:15 -08:00
Anthony Romano	dcf52bbfac	etcdserver, embed, integration: don't use pointer for ServerConfig ServerConfig is owned by etcdserver and unshared, so don't pass or store by pointer. Also removes duplicated field 'snapCount'.	2017-06-15 13:02:13 -07:00
fanmin shi	8b7b7222dd	etcdserver: renaming db happens after snapshot persists to wal and snap files In the case that a follower receives a snapshot from the leader and crashes before renaming xxx.snap.db to db but after the snapshot has persisted to .wal and .snap, restarting the follower results in loading the old db, the new .wal, and the new .snap. This causes an index mismatch between the snap metadata index and the consistent index from db. This PR forces an ordering where saving/renaming db must happen after the snapshot is persisted to the wal and snap files. This guarantees that the wal and snap files are newer than db. On server restart, etcd server checks if snap index > db consistent index. If yes, etcd server attempts to load xxx.snap.db, where xxx=snap index, if there is any, and panics otherwise. FIXES #7628	2017-05-09 14:00:12 -07:00
Gyu-Ho Lee	327f09fcb4	etcdserver: do not block on raft stopping Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 13:35:43 -07:00
Gyu-Ho Lee	91f6aee4f2	etcdserver: ensure waitForApply sync with applyAll
Problem is:
`Step1`: `etcdserver/raft.go`'s `Ready` process routine sends config-change entries via `r.applyc <- ap` (https://github.com/coreos/etcd/blob/master/etcdserver/raft.go#L193-L203)
`Step2`: `etcdserver/server.go`'s `*EtcdServer.run` routine receives this via `ap := <-s.r.apply()` (https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L735-L738)
`StepA`: `Step1` proceeds without sync, right after sending `r.applyc <- ap`.
`StepB`: `Step2` proceeds without sync, right after `sched.Schedule(s.applyAll(&ep,&ap))`.
`StepC`: `etcdserver` tries to sync with `s.applyAll(&ep,&ap)` by calling `rh.waitForApply()`.
`rh.waitForApply()` waits for all pending jobs to finish in `pkg/schedule` side. However, the order of `StepA`,`StepB`,`StepC` is not guaranteed. It is possible that `StepC` happens first, and proceeds without waiting on apply. And the restarting member comes back as a leader in single-node cluster, when there is no synchronization between apply-layer and config-change Raft entry apply.
Confirmed with more debugging lines below, only reproducible with slow CPU VM (~2 vCPU).
```
~:24.005397 I | etcdserver: starting server... [version: 3.2.0+git, cluster version: to_be_decided]
~:24.011136 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply before
~:24.011194 I | etcdserver: [DEBUG] 29b2d24047a277df starts wait for 0 pending jobs
~:24.011234 I | etcdserver: [DEBUG] 29b2d24047a277df finished wait for 0 pending jobs (current pending 0)
~:24.011268 I | etcdserver: [DEBUG] 29b2d24047a277df waitForApply after
~:24.011348 I | etcdserver: [DEBUG] [0] 29b2d24047a277df is scheduling conf change on 29b2d24047a277df
~:24.011396 I | etcdserver: [DEBUG] [1] 29b2d24047a277df is scheduling conf change on 5edf80e32a334cf0
~:24.011437 I | etcdserver: [DEBUG] [2] 29b2d24047a277df is scheduling conf change on e32e31e76c8d2678
~:24.011477 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 29b2d24047a277df
~:24.011509 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on 5edf80e32a334cf0
~:24.011545 I | etcdserver: [DEBUG] 29b2d24047a277df scheduled conf change on e32e31e76c8d2678
~:24.012500 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df before
~:24.013014 I | etcdserver/membership: added member 29b2d24047a277df [unix://127.0.0.1:2100515039] to cluster 9250d4ae34216949
~:24.013066 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after
~:24.013113 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 29b2d24047a277df after trigger
~:24.013158 I | etcdserver: [DEBUG] 29b2d24047a277df applyConfChange on 5edf80e32a334cf0 before
~:24.013666 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 11.964739ms)
~:24.013709 W | etcdserver: server is likely overloaded
~:24.013750 W | etcdserver: failed to send out heartbeat on time (exceeded the 10ms timeout for 12.057265ms)
~:24.013775 W | etcdserver: server is likely overloaded
~:24.013950 I | raft: 29b2d24047a277df is starting a new election at term 4
~:24.014012 I | raft: 29b2d24047a277df became candidate at term 5
~:24.014051 I | raft: 29b2d24047a277df received MsgVoteResp from 29b2d24047a277df at term 5
~:24.014107 I | raft: 29b2d24047a277df became leader at term 5
~:24.014146 I | raft: raft.node: 29b2d24047a277df elected leader 29b2d24047a277df at term 5
```
I am printing out the number of pending jobs before we call `sched.WaitFinish(0)`, and there were no pending jobs, so it returned immediately (before we schedule `applyAll`).
This is the root cause of:
- https://github.com/coreos/etcd/issues/7595
- https://github.com/coreos/etcd/issues/7739
- https://github.com/coreos/etcd/issues/7802
`sched.WaitFinish(0)` doesn't work when `len(f.pendings)==0` and `f.finished==0`. Config-change is the first job to apply, so `f.finished` is 0 in this case. `f.finished` monotonically increases, so we need `WaitFinish(finished+1)`. And `finished` must be the one before calling `Schedule`. This is safe because `Schedule(applyAll)` is the only place adding jobs to `sched`. Then scheduler waits on the single job of `applyAll`, by getting the current number of finished jobs before sending `Schedule`. Or just make it be blocked until `applyAll` routine triggers on the config-change job.
This patch just removes `waitForApply`, and signals `raftDone` to wait until `applyAll` finishes applying entries.
Confirmed that it fixes the issue, as below:
```
~:43.198354 I | rafthttp: started streaming with peer 36cda5222aba364b (stream MsgApp v2 reader)
~:43.198740 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply before
~:43.198836 I | etcdserver: [DEBUG] 3988bc20c2b2e40c starts wait for 0 pending jobs, 1 finished jobs
~:43.200696 I | integration: launched 3169361310155633349 ()
~:43.201784 I | etcdserver: [DEBUG] [0] 3988bc20c2b2e40c is scheduling conf change on 36cda5222aba364b
~:43.201884 I | etcdserver: [DEBUG] [1] 3988bc20c2b2e40c is scheduling conf change on 3988bc20c2b2e40c
~:43.201965 I | etcdserver: [DEBUG] [2] 3988bc20c2b2e40c is scheduling conf change on cf5d6cbc2a121727
~:43.202070 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 36cda5222aba364b
~:43.202139 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on 3988bc20c2b2e40c
~:43.202204 I | etcdserver: [DEBUG] 3988bc20c2b2e40c scheduled conf change on cf5d6cbc2a121727
~:43.202444 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) before
~:43.204486 I | etcdserver/membership: added member 36cda5222aba364b [unix://127.0.0.1:2100913646] to cluster 425d73f1b7b01674
~:43.204588 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after
~:43.204703 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 36cda5222aba364b (request ID: 0) after trigger
~:43.204791 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) before
~:43.205689 I | etcdserver/membership: added member 3988bc20c2b2e40c [unix://127.0.0.1:2101113646] to cluster 425d73f1b7b01674
~:43.205783 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after
~:43.205929 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on 3988bc20c2b2e40c (request ID: 0) after trigger
~:43.206056 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) before
~:43.207353 I | etcdserver/membership: added member cf5d6cbc2a121727 [unix://127.0.0.1:2100713646] to cluster 425d73f1b7b01674
~:43.207516 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after
~:43.207619 I | etcdserver: [DEBUG] 3988bc20c2b2e40c applyConfChange on cf5d6cbc2a121727 (request ID: 0) after trigger
~:43.207710 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 36cda5222aba364b
~:43.207781 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on 3988bc20c2b2e40c
~:43.207843 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished scheduled conf change on cf5d6cbc2a121727
~:43.207951 I | etcdserver: [DEBUG] 3988bc20c2b2e40c finished wait for 0 pending jobs (current pending 0, finished 1)
~:43.208029 I | rafthttp: started HTTP pipelining with peer cf5d6cbc2a121727
~:43.210339 I | rafthttp: peer 3988bc20c2b2e40c became active
~:43.210435 I | rafthttp: established a TCP streaming connection with peer 3988bc20c2b2e40c (stream MsgApp v2 reader)
~:43.210861 I | rafthttp: started streaming with peer 3988bc20c2b2e40c (writer)
~:43.211732 I | etcdserver: [DEBUG] 3988bc20c2b2e40c waitForApply after
```
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-25 10:22:27 -07:00
Anthony Romano	714b48a4b4	etcdserver: initialize raftNode with constructor raftNode was being initialized in start(), which was causing hangs when trying to stop the etcd server since the stop channel would not be initialized in time for the stop call. Instead, setup non-configurable bits in a constructor. Fixes #7668	2017-04-18 09:33:59 -07:00
Gyu-Ho Lee	04354f32ab	etcdserver: wait apply on conf change Raft entry When apply-layer sees configuration change entry in raft.Ready.CommittedEntries, the server should not proceed until that entry is applied. Otherwise, follower's raft layer advances, possibly election-timeouts, and becomes the leader in single-node cluster, before add-node conf change of other nodes is applied. Fix https://github.com/coreos/etcd/issues/7595. Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-04-13 15:59:24 -07:00
Xiang	7f0733cf46	etcdserver: candidate should wait for applying all configuration changes	2017-03-14 17:20:20 -07:00
Gyu-Ho Lee	3d75395875	*: remove never-used vars, minor lint fix Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>	2017-03-06 14:59:12 -08:00
fanmin shi	2a1bae0c2a	etcdserver: consistent naming in raftReadyHandler	2016-12-29 11:27:16 -08:00
fanmin shi	2faf72f47c	etcdserver: rework update committed index logic	2016-12-27 10:11:40 -08:00
Gyu-Ho Lee	3fd1d951f8	etcdserver: time out when readStateC is blocking Otherwise, it will block forever when the server is overloaded. Fix https://github.com/coreos/etcd/issues/6891.	2016-12-05 15:34:46 -08:00
Gyu-Ho Lee	6ec03d3f7c	etcdserver: move 'EtcdServer.send' to raft.go Clear 'TODO'	2016-10-26 16:26:00 -07:00
Gyu-Ho Lee	e011ea25ca	etcdserver: separate EtcdServer from raftNode	2016-10-07 13:18:39 -07:00
Xiang Li	0f0c048e29	etcdserver: fix early lessor promotion issue If we promote the lessor before we finish applying all entries from the last term, we might incorrectly renew already revoked leases. Here is an example: - Term 1: revoke lease A accepted by raft - Old leader failed, new election happened - Term 2: promote - Term 2: keep alive A succeeds. A now has 10 seconds TTL - Term 2: revoke lease A from Term 1 got committed and applied - Term 2: the lease A with 10 seconds TTL is revoked To solve this, the new leader MUST apply all entries from the old term before promoting its lessor to start accepting renew requests.	2016-10-05 14:41:47 -07:00
Xiang Li	e3e3993022	etcdserver: support read index Use read index to achieve l-read.	2016-09-27 13:41:40 +08:00
Anthony Romano	de68818f03	etcdserver: add some failpoints	2016-06-21 14:43:20 -07:00
Xiang Li	9c78cda088	etcdserver: save state before save snapshot	2016-06-15 22:00:33 -07:00
Gyu-Ho Lee	32d766d749	etcdserver: preallocate slice	2016-06-15 13:03:10 -07:00
Gyu-Ho Lee	abb4cd5646	etcdserver: update LICENSE header	2016-05-12 20:49:40 -07:00
Xiang Li	9c103dd0de	*: cancel required leader streams when member lost its leader	2016-05-12 19:42:21 -07:00
Anthony Romano	dcb3b7aecf	*: scrub legacy ports from code and scripts	2016-05-11 13:46:30 -07:00
Xiang Li	ab11415d25	*: add proposalsCommitted metrics	2016-05-10 10:56:25 -07:00
Xiang Li	824478be5f	*: add has leader metrics	2016-05-06 13:59:19 -07:00
Xiang Li	76d073a2b5	*: add leader changes to metrics	2016-05-06 13:12:20 -07:00
Xiang Li	3c0ac9d600	etcdserver: set backend to cluster	2016-04-08 21:46:45 -07:00
Xiang Li	bf2289ae00	etcdserver: move membership related code to membership pkg	2016-04-07 14:21:37 -07:00
Xiang Li	70a9391378	*: enable v3 by default	2016-03-23 17:01:36 -07:00
Anthony Romano	bd832e5b0a	*: migrate Godeps to vendor/	2016-03-22 17:10:28 -07:00
Xiang Li	2a28ac7ad4	etcdserver: leader should stepdown when lose quorum for v3	2016-03-15 23:23:26 -07:00
Xiang Li	e9a0a103e5	*: refresh the lease TTL correctly when a leader is elected. The new leader needs to refresh with an extended TTL to gracefully handle the potential concurrent leader issue. Clients might still send keep alive to old leader until the old leader itself gives up leadership at most after an election timeout.	2016-03-15 22:40:03 -07:00
Xiang Li	0f9d04237c	etcdserver: leader latency optimization	2016-03-12 22:51:13 -08:00
Xiang Li	d6520303c6	etcdserver: detect raft starvation caused by contention	2016-02-29 17:06:57 -08:00
Xiang Li	d265fe000c	*: support time based auto compaction. Fix https://github.com/coreos/etcd/issues/3906. We will have extensive doc to talk about what is compaction and what is auto compaction soon.	2016-02-25 16:02:03 -08:00
Anthony Romano	20461ab11a	*: fix many typos	2016-01-31 21:42:39 -08:00


80 Commits (2dd555c9834984116b6ce9c2a21224f0327cbd8c)
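
The message of commit 91f6aee4f2 above argues that waiting for zero finished jobs returns immediately when nothing has been scheduled yet, and that capturing the finished count before `Schedule` and waiting for `finished+1` closes that gap. The following is a minimal sketch of that reasoning only. `toyScheduler`, `waitBeforeSchedule`, and `waitForNextJob` are hypothetical names invented for illustration; this is not etcd's `pkg/schedule` code, and the actual patch took a different route (it removed `waitForApply`).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyScheduler is a hypothetical stand-in for a job scheduler. It only
// mirrors the two counters the commit message talks about: the number of
// pending jobs and a monotonically increasing finished count.
type toyScheduler struct {
	mu       sync.Mutex
	cond     *sync.Cond
	pending  int
	finished int
}

func newToyScheduler() *toyScheduler {
	s := &toyScheduler{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Schedule runs job on its own goroutine and bumps finished when it returns.
func (s *toyScheduler) Schedule(job func()) {
	s.mu.Lock()
	s.pending++
	s.mu.Unlock()
	go func() {
		job()
		s.mu.Lock()
		s.pending--
		s.finished++
		s.mu.Unlock()
		s.cond.Broadcast()
	}()
}

// Finished reports how many jobs have completed so far.
func (s *toyScheduler) Finished() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.finished
}

// WaitFinish blocks until at least n jobs have finished and none are pending.
func (s *toyScheduler) WaitFinish(n int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for s.finished < n || s.pending > 0 {
		s.cond.Wait()
	}
}

// waitBeforeSchedule models the interleaving where the wait runs first.
// WaitFinish(0) has nothing to wait for, so it returns before the apply job
// is even scheduled. The result says whether the job had been applied when
// the wait returned.
func waitBeforeSchedule() bool {
	s := newToyScheduler()
	applied := false
	s.WaitFinish(0) // returns at once: 0 pending, 0 finished
	sawApplied := applied
	s.Schedule(func() { applied = true })
	s.WaitFinish(1) // drain the job so the goroutine does not outlive the call
	return sawApplied
}

// waitForNextJob captures the finished count before scheduling and waits for
// one more job than that, so the wait cannot return before the apply job ran,
// even though the job is scheduled late.
func waitForNextJob() bool {
	s := newToyScheduler()
	applied := false
	before := s.Finished() // must be read before Schedule is called
	go func() {
		time.Sleep(10 * time.Millisecond) // scheduling lags behind the waiter
		s.Schedule(func() { applied = true })
	}()
	s.WaitFinish(before + 1)
	return applied
}

func main() {
	fmt.Println("WaitFinish(0) saw applied job:", waitBeforeSchedule())
	fmt.Println("WaitFinish(finished+1) saw applied job:", waitForNextJob())
}
```

In this sketch the first call reports `false` and the second reports `true`. The second result relies on the assumption stated in the commit message: the waiter's job is the only one being added to the scheduler, so `finished+1` can only be reached by that job.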