Commit Graph

108 Commits (805959833200bbb769d9b02ef443d478a48da766)

Author SHA1 Message Date
Rob Strong 494d2c67aa fix(peer_server) set content type to application/json in admin 2014-06-21 13:13:10 -04:00
Yicheng Qin 25e69d9659 fix(multi_node_kill_all_and_recovery_test): ensure cluster is up 2014-06-02 14:43:51 -07:00
Yicheng Qin e04a188358 fix(remove_node_test): remove unnecessary cluster configuration
The cluster configuration operation is originally to make sure
the instance won't be added back automatically between removal and
check for the number of existing peer-mode instances. But this
could make some node removed before the removal command.

Use longer sync interval instead to avoid this problem.
2014-06-02 13:30:19 -07:00
Yicheng Qin 7cb126967c fix(simple_snapshot_test): enlarge reasonable index range 2014-05-31 10:42:31 -07:00
Yicheng Qin 444e017c05 fix(remove_node_test): ensure cluster config is activated 2014-05-31 10:32:03 -07:00
Yicheng Qin 356675b70f fix(multi_node_kill_all_and_recovery_test): ensure cluster running 2014-05-31 10:15:03 -07:00
Yicheng Qin 37796ed84c tests: add TestMultiNodeKillAllAndRecorveryAndRemoveLeader
This one breaks because it doesn't set joinIndex correctly.
2014-05-31 10:01:45 -07:00
Yicheng Qin ca29691543 tests(standby_test): comments 2014-05-30 18:36:23 -07:00
Yicheng Qin 4bebb538eb fix(standby_server): able to join the cluster containing itself
Standby server will switch to peer server if it finds that
it has been contained in the cluster.
2014-05-30 14:03:49 -07:00
Xiang Li aaedf32c04 fix(test/remove_node_test.go) fix a deadlock in the test
The go-etcd client waits for the response from the paused node. And the test waits for the reponse to continue.
Actually we do not even need that small test, since we will check the machine status afterwards.
2014-05-20 14:34:59 -07:00
Yicheng Qin 9e5b12f591 tests(remove_node): add TestRemovePausedNode 2014-05-20 11:01:14 -07:00
Yicheng Qin 71679bcf56 feat(standby_server): make atomic move for file
to avoid the risk of writing out a corrupted file.
2014-05-16 01:00:07 -04:00
Yicheng Qin b7d9fdbd39 feat(standby_server): write cluster info to disk
For better fault tolerance and availability.
2014-05-15 07:47:15 -04:00
Yicheng Qin fc77b3e9e6 fix(simple_snapshot_test): enlarge reasonable index range 2014-05-13 22:28:28 -04:00
Yicheng Qin 403f709ebd chore(cluster_config): set default timeout to 5s
Or the leader death could let the standbys down for a rather long time.
2014-05-13 16:13:44 -04:00
Yicheng Qin 5367c1c998 chore(standby): minor changes based on comments 2014-05-09 15:38:03 -07:00
Yicheng Qin 6d4f018887 chore(cluster_config): rename SyncClusterInterval to SyncInterval
for better naming
2014-05-09 13:28:21 -07:00
Yicheng Qin baadf63912 feat: implement standby mode
Change log:
1. PeerServer
- estimate initial mode from its log through removedInLog variable
- refactor FindCluster to return the estimation
- refactor Start to call FindCluster explicitly
- move raftServer start and cluster init from FindCluster to Start
- remove stopNotify from PeerServer because it is not used anymore
2. Etcd
- refactor Run logic to fit the specification
3. ClusterConfig
- rename promoteDelay to removeDelay for better naming
- add SyncClusterInterval field to ClusterConfig
- commit command to set default cluster config when cluster is created
- store cluster config info into key space for consistency
- reload cluster config when reboot
4. add StandbyServer
5. Error
- remove unused EcodePromoteError
2014-05-09 01:56:55 -07:00
Yicheng Qin 04f09d2fd0 feat(peer_server): add State field to machineMessage
State field indicates the state of each machine.
For now, its value could be follower or leader.
2014-05-08 10:25:39 -07:00
Xiang Li b56aa62bcc Merge pull request #773 from unihorn/82
tests(snapshot): expand reasonable range for index
2014-05-07 13:02:41 -04:00
Yicheng Qin c4cd86e094 tests(snapshot): expand reasonable range for index
snapshot file was createed with name '0_503.ss' and '0_1010.ss' when testing.
2014-05-07 09:41:36 -07:00
Yicheng Qin 17e299995c refactor(peer_server): remove standby mode in peer server 2014-05-07 09:10:09 -07:00
Yicheng Qin ece25833aa Merge pull request #738 from unihorn/68
feat(peer_server): forbid rejoining with different name
2014-04-18 11:49:36 -07:00
Yicheng Qin 000e3ba651 chore(rejoin_test): rewrite some printout 2014-04-18 10:48:14 -07:00
Yicheng Qin b742af56ec Merge pull request #723 from unihorn/63
tests: add TestJoinThroughFollower
2014-04-17 19:44:46 -07:00
Yicheng Qin b17703a9e4 chore(tests/join): adjust output 2014-04-17 19:28:58 -07:00
Yicheng Qin 0c95e1eabb feat(peer_server): forbid rejoining with different name
Or it will confuse the cluster, especially the heartbeat between nodes.
2014-04-17 15:46:33 -07:00
Yicheng Qin 732fb7c160 tests(rejoin): add TestReplaceWithDifferentPeerAddress
The functionality has not been implemented yet.
2014-04-17 10:17:26 -07:00
Yicheng Qin 273c293645 fix(server): rejoin cluster with different ip 2014-04-17 10:16:30 -07:00
Yicheng Qin 67600603c5 chore: rename proxy mode to standby mode
It makes the name more reasonable.
2014-04-17 08:04:42 -07:00
Yicheng Qin adf4acf947 chore: gofmt go files 2014-04-15 09:42:25 -07:00
Yicheng Qin d88b52c5f3 fix(tests/v1_migration): correct HTTP response
The bug is introduced in 03839ca8 due to the mistake.
2014-04-15 09:25:14 -07:00
Yicheng Qin 8bcfb2ecaf Merge pull request #707 from unihorn/62
fix(peer_server): recover from outage with discovery
2014-04-14 13:58:43 -07:00
Yicheng Qin 03839ca806 fix(peer_server): recover from outage with discovery
This patch also contains the refactor of find cluster process.
It is changed based on @xiangli-cmu 's commits in 627 issue.
2014-04-14 13:56:47 -07:00
Yicheng Qin de9c318436 tests: add TestJoinThroughFollower 2014-04-14 13:41:45 -07:00
Xiang Li bc70cdc242 tests(snapshot_test) loose the timing assumption for snapshot test
Test run slowly on drone after open race option.
2014-04-11 19:49:57 -04:00
Xiang Li fc84da29e8 fix(internal_version_test.go) protect the checkedVersion by a lock 2014-04-10 23:35:55 -04:00
Xiang Li 2817baf3f8 fix(discovery_test.go) protect the garbageHandler by a lock 2014-04-10 23:28:40 -04:00
Yicheng Qin 1c40f327be test(snapshot): test restart with snapshot 2014-04-04 16:45:02 -07:00
Yicheng Qin 0a4b6570e1 chore(tests): start TLS cluster slowly to evade problem 2014-04-04 10:57:11 -07:00
Xiang Li 5b4a473f14 Merge pull request #667 from unihorn/53
chore(tests): test TestTLSMultiNodeKillAllAndRecovery now
2014-03-27 22:40:09 -04:00
Yicheng Qin 4e747d24dd chore(tests): test TestTLSMultiNodeKillAllAndRecovery now
It is fixed.
2014-03-27 17:06:18 -07:00
Tomás Senart b6053d6a86 Making code formatting consistent.
$ gofmt -s -w  && goimports -w
2014-03-27 14:19:08 +01:00
Blake Mizerany 4bce3e4810 test(tests/functional): skip TestTLSMultiNodeKillAllAndRecovery until fixed 2014-03-26 19:22:59 -07:00
Ben Johnson 62b89a128a Merge branch 'master' of https://github.com/coreos/etcd into proxy
Conflicts:
	config/config.go
	server/peer_server.go
	server/transporter.go
	tests/server_utils.go
2014-03-24 15:30:14 -07:00
Ben Johnson 174b9ff343 bump(github.com/goraft/raft): 6bf34b9
Move from coreos/raft to goraft/raft and update to latest.
2014-03-24 15:09:47 -07:00
Ben Johnson 7d4fda550d Machine join/remove v2 API. 2014-03-18 16:25:21 -06:00
Yicheng Qin 50d9e6a7fd chore(fixtures/ca): make all certificates generated by etcd-ca 2014-03-17 12:32:55 -07:00
Ben Johnson c0a59b3a27 Add minimum active size and promote delay. 2014-03-10 14:44:04 -06:00
Ben Johnson 3fff1a8dcd Add /machines and /machines/:name endpoints. 2014-03-06 15:11:31 -07:00