Commit Graph

2020 Commits (bc9e433ca26be0314a37c8c6ead66429a584bbc9)

Author SHA1 Message Date
yoyinzyc 073bc22d35 etcdserver: add downgrade api in maintenance server. 2020-03-22 22:35:08 -07:00
yoyinzyc d8b9b54348 etcdserver: add downgrade rpc proto api. 2020-03-20 17:37:26 -07:00
Gyuho Lee 92f180c574 *: log server-side /health checks
To make it easier to root-cause /health check failures.
For example, we use a load balancer to health-check
each etcd instance, and when one etcd node gets terminated,
it's hard to tell whether the etcd "server" was really failing
or the client (or load balancer) failed to reach the etcd cluster,
which also counts as a failure in the load balancer health check.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-03-18 11:14:05 -07:00
Gyuho Lee 33907477dd *: add "etcd_server_client_requests_total", tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-03-17 12:16:11 -07:00
Gyuho Lee 58ba322bb4 clientv3: embed API version
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-03-17 12:16:11 -07:00
Grigorii Sokolik a33e1b5fae etcdserver/api/etcdhttp: checkHealth refactoring
Small refactoring of the
`go.etcd.io/etcd/etcdserver/api/etcdhttp/metrics.go.checkHealth`
function, just to avoid annoying repetitions of `if h.Health == "true" {`
2020-03-17 15:37:11 +02:00
Jingyi Hu 193e60ebd9
Merge pull request #11670 from tangcong/optimize-auth-store-log
*: optimize auth/etcdserver logs to facilitate troubleshooting data inconsistency
2020-03-16 00:15:14 +08:00
tangcong 5a17367923 etcdserver: print warn log when failed to apply request 2020-03-10 23:13:53 +08:00
shawwang 15eeb2c4ae etcdserver: add auth revision to AuthStatus to improve observability and testability 2020-03-04 22:37:24 +08:00
tangcong 06ad53321e *: fix auth revision corruption bug 2020-02-29 13:31:37 +08:00
Rafael Fernández López 6991f619f2
etcdserver: fix quorum calculation when promoting a learner member
When promoting a learner member we should not count already a voting
member, but take only into account the number of existing voting
members and their current status (started, unstarted) when taking the
decision whether a learner member can be promoted.

Before this change, it was impossible to grow from a quorum of N to N+1
by promoting a learner member.

Fixes: #11633
2020-02-20 12:06:35 +01:00
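A minimal sketch of the quorum check described in the commit above, not the actual etcd code. The `member` type and `canPromote` function are hypothetical, and it is an assumption of this sketch that the learner under promotion is counted as one additional started voter while all other learners are ignored.

```go
package main

import "fmt"

// member is a hypothetical, simplified view of a cluster member.
type member struct {
	started bool // the member has joined and is running
	learner bool // the member is a non-voting learner
}

// canPromote reports whether promoting one learner keeps a started quorum.
// Assumption: the learner being promoted counts as one additional started
// voter, and any other learners are ignored.
func canPromote(members []member) bool {
	voters, started := 1, 1 // the learner under promotion
	for _, m := range members {
		if m.learner {
			continue // learners do not vote, so they do not affect quorum
		}
		voters++
		if m.started {
			started++
		}
	}
	quorum := voters/2 + 1
	return started >= quorum
}

func main() {
	// One started voter plus one learner: the cluster can grow from 1 to 2 voters.
	fmt.Println(canPromote([]member{{started: true}, {learner: true, started: true}}))
	// Three voters of which only one is started: promotion is refused.
	fmt.Println(canPromote([]member{{started: true}, {}, {}, {learner: true, started: true}}))
}
```

Under these assumptions a single-voter cluster can promote its learner, while a cluster whose voters are mostly unstarted cannot.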
Jingyi Hu d6a3c995cf
Merge pull request #11621 from jingyih/corruption_check_tls
etcdserver: make corruption check work under peer mTLS
2020-02-18 18:55:13 +08:00
Ted Yu 7e0e6bf497 mvcc/backend: remove db.tmp regardless of logger presence 2020-02-14 12:10:24 -08:00
jingyih c031b27491 etcdserver: corruption check via http
During corruption check, get peer's hashKV via http call.
2020-02-14 06:03:15 -08:00
Joe Betz 213f7f7877
mvcc/backend: Delete orphaned db.tmp files before defrag 2020-02-12 22:40:28 -08:00
Jingyi Hu 61f279454e
etcdserver/api: remove capnslog (#11606)
* etcdserver/api/rafthttp: remove capnslog

* etcdserver/api/membership: remove capnslog

* etcdserver/api/v2auth: remove capnslog

* etcdserver/api/v2discovery: remove capnslog

* etcdserver/api/v2stats: remove capnslog

* etcdserver/api/v2http: remove capnslog

* etcdserver/api/v3rpc: remove capnslog

* etcdserver/api: remove capnslog

Remove capnslog from etcdserver/api. Note that capnslog was
already removed in some packages under etcdserver/api in
previous commits.
2020-02-11 13:51:25 -08:00
Jingyi Hu c94782cd55
etcdserver: remove capnslog (#11611)
remove capnslog from etcdserver pkg, except etcdserver/api.
2020-02-11 08:54:14 -08:00
Hitoshi Mitake fda8d38bd4 etcdserver: mark AuthStatus as no side effect request 2020-02-11 23:26:50 +09:00
Sahdev Zala 4c25efc1f8
Discovery: do not allow passing negative cluster size (#11608)
When an etcd instance attempts to perform service discovery, if a
cluster size with a negative value is provided, the etcd instance
will panic without recovery because of
2020-02-10 10:43:41 -08:00
Daniel Lipovetsky bc4adb8b5c
etcdserver: populate ResponseHeader in Alarm method (#11600)
When no Alarms are found, the response has no header. The header should always
be populated. Some components, like fields printer used by etcdctl, break when
the header is not populated.

Fixes #11581

Signed-off-by: Daniel Lipovetsky <dlipovetsky@d2iq.com>
2020-02-08 23:07:58 -08:00
Vern Burton 071e70cdc4
*: add a new API and command for checking auth status (#11536)
These changes start at etcdctl under auth.go and stub out everything down into the internal raft. The .proto files were changed and regenerated so that the local version builds successfully.
2020-02-05 19:27:42 -08:00
Gyuho Lee a924600700
Merge pull request #11590 from alrs/fix-dropped-test-error
etcdserver/api/v2v3: fix dropped test error
2020-02-05 08:41:07 -08:00
Lars Lehtonen f2c3bcd086
etcdserver/api/v2v3: fix dropped test error
etcdserver/api/v2v3: use testing.T instead of log in tests
2020-02-05 07:20:12 -08:00
Jingyi Hu 7395ed8e5d
Merge pull request #11578 from jingyih/set_zap_as_default_logger
*: set zap as default logger, remove capnslog
2020-02-04 22:58:45 -08:00
jingyih 725e09023a *: set zap as default logger, remove capnslog
Set zap as default logger. Remove capnslog and deprecated logging
flags.
2020-02-04 04:57:49 -08:00
sfzhu93 cad92706cf
in multiple packages: fixed goroutine leak bugs in tests (#11569) 2020-01-30 10:45:59 -08:00
yoyinzyc 7784ca8997 etcdserver: remove v2 version set; e2e: fix tests. 2019-12-09 13:08:00 -08:00
Jingyi Hu ed5a01a48d etcdserver: recover cluster version from backend 2019-12-05 16:25:13 -08:00
Jingyi Hu 5cd2502ab1 etcdserver: use v3 to publish member attr 2019-12-05 16:25:13 -08:00
yoyinzyc 0c3401fa76 etcdserver: use V3 to update cluster version 2019-12-05 16:25:13 -08:00
Jingyi Hu dcd622b2c7 etcdserver: add v3 request type for cluster attr
Added ClusterVersionSetRequest for setting cluster version via v3 apply.

Added ClusterMemberAttrSetRequest for setting cluster member attributes
via v3 apply.
2019-12-05 16:25:13 -08:00
jianli cdc28507ef etcdserver: fix append object to a new allocated sized slice 2019-11-01 10:19:45 +08:00
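The commit above does not show the code it changed. The sketch below only illustrates the general Go pitfall its title appears to name: appending to a slice that was allocated with a non-zero length. The function names `copyWrong` and `copyRight` are hypothetical.

```go
package main

import "fmt"

// copyWrong allocates a slice that already holds len(in) zero values and then
// appends, so the result is twice as long as intended.
func copyWrong(in []string) []string {
	out := make([]string, len(in))
	for _, s := range in {
		out = append(out, s)
	}
	return out
}

// copyRight allocates length 0 with capacity len(in), so append fills the
// slice from the start without reallocating.
func copyRight(in []string) []string {
	out := make([]string, 0, len(in))
	for _, s := range in {
		out = append(out, s)
	}
	return out
}

func main() {
	in := []string{"a", "b"}
	fmt.Println(len(copyWrong(in)), len(copyRight(in)))
}
```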
Jingyi Hu c447955d93 etcdserver: wait purge file loop during shutdown
To prevent the purge file loop from accidentally acquiring the file lock
and removing the files during server shutdown.
2019-10-30 14:21:08 -07:00
Hitoshi Mitake 84e2788c2e
Merge pull request #10468 from jingyih/remove_auth_loop
etcdserver: remove infinite loop for auth in raftRequest
2019-10-29 00:11:40 +09:00
yoyinzyc 80a177292e rafthttp: add test stream support for current version. 2019-10-21 09:45:00 -07:00
Jingyi Hu 5dc12f2725
Merge pull request #11274 from YoyinZyc/fix-upgrade-failure
rafthttp: add 3.4.0,3.5.0 stream type
2019-10-20 19:20:06 -07:00
yoyinzyc a0e528e4b1 rafthttp: add 3.4.0,3.5.0 stream type 2019-10-17 14:25:56 -07:00
Jingyi Hu 444bfdff59 etcdserver: strip patch version in metrics
Strip patch version in cluster version metrics during node restart.
2019-10-16 12:39:17 -07:00
Jingyi Hu 1333abc606 etcdserver: strip patch version in cluster version
Strip patch version in cluster version metrics.
2019-10-14 16:59:09 -07:00
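As a rough illustration of what stripping the patch version means in the two commits above, here is a small sketch. `stripPatch` is a hypothetical helper, not the function etcd uses, and it assumes plain dotted version strings such as "3.4.2".

```go
package main

import (
	"fmt"
	"strings"
)

// stripPatch keeps only the major and minor components of a version string.
// Inputs with fewer than two components are returned unchanged.
func stripPatch(v string) string {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return v
	}
	return parts[0] + "." + parts[1]
}

func main() {
	fmt.Println(stripPatch("3.4.2"))
}
```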
Jingyi Hu 9c4194f6ef etcdserver: unset old cluster version in metrics 2019-10-11 22:25:03 -07:00
Gyuho Lee 340f0ac797
Merge pull request #11179 from YoyinZyc/trace
Add tracing to range request in etcd server.
2019-10-08 13:23:53 -07:00
yoyinzyc 57aa68af5a etcdserver: trace compaction request; add return parameter 'trace' to applierV3.Compaction()
mvcc: trace compaction request; add input parameter 'trace' to KV.Compact()
2019-10-07 09:55:27 -07:00
yoyinzyc 3a3eb24c69 etcdserver: trace raft requests. 2019-10-01 15:38:52 -07:00
yoyinzyc 401df4bb8e etcdserver: add put request steps.
mvcc: add put request steps; add trace to KV.Write() as input parameter.
2019-10-01 14:08:06 -07:00
yoyinzyc 1d6ef8370e pkg: use zap logger to format the structure log output. 2019-09-30 13:11:21 -07:00
yoyinzyc 3830b3ef11 pkg: add field to record additional detail of trace; add stepThreshold
to reduce log volume.
2019-09-30 13:11:21 -07:00
yoyinzyc f4e7fc56a7 pkg: create package traceutil for tracing. mvcc: add tracing
steps: range from the in-memory index tree; range from boltdb.
etcdserver: add tracing steps: agreement among raft nodes before
linearized reading; authentication; filter and sort kv pairs; assemble
the response.
2019-09-30 13:06:02 -07:00
Xiang Li 589ab747f7
Merge pull request #11014 from dbavatar/peervalidation
etcdserver: Fix PeerURL validation
2019-09-13 17:42:39 -07:00
Debabrata Banerjee 0dd10cf6b8 etcdserver: Fix PeerURL validation
In case of URLs that are synonyms, the current lexicographic sorting
and compare of the URLs fails with frustrating errors. Make sure to do
a full comparison between every set of PeerURLs before failing.

Fixes #11013
2019-09-13 17:53:40 -04:00
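The difference between a sort-then-compare check and a full comparison can be sketched as follows. This is an illustration only: the `aliases` table is a hypothetical stand-in for name resolution, and `sameURLSet` is not the function used in etcd.

```go
package main

import "fmt"

// aliases is a hypothetical lookup table standing in for DNS resolution:
// it maps a synonym URL to the URL it is equivalent to.
var aliases = map[string]string{
	"http://z.example:2380": "http://a.example:2380",
}

// canonical resolves a URL through the alias table.
func canonical(u string) string {
	if c, ok := aliases[u]; ok {
		return c
	}
	return u
}

// sameURLSet compares every URL in a against every unused URL in b, so the
// result does not depend on how the two lists happen to sort.
func sameURLSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	used := make([]bool, len(b))
	for _, x := range a {
		found := false
		for j, y := range b {
			if !used[j] && canonical(x) == canonical(y) {
				used[j] = true
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	a := []string{"http://a.example:2380", "http://b.example:2380"}
	b := []string{"http://b.example:2380", "http://z.example:2380"}
	fmt.Println(sameURLSet(a, b))
}
```

Sorting both lists and comparing position by position would pair `a.example` with `b.example` here and report a mismatch, even though the two sets are equivalent.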
zhangjianweibj 81a34ab6d5 etcdserver: remove dup percentage sign in log 2019-09-05 11:24:39 +08:00
Joe Betz 9b51febaf5 *: Add experimental-compaction-batch-limit flag 2019-08-15 11:47:23 -07:00
Gyuho Lee 06b82c200f etcdserver: add "etcd_server_snapshot_apply_inflights_total"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 13:33:52 -07:00
Gyuho Lee 46bddacacb etcdserver/api: add "etcd_network_snapshot_send_inflights_total", "etcd_network_snapshot_receive_inflights_total"
Useful for deciding when to terminate the unhealthy follower.
If the follower is receiving a leader snapshot, operator may wait.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 13:33:48 -07:00
Zeming YU 3edb569ad3
v3rpc: fix a typo `err`
don't read return value in child goroutine which causes data race.
2019-08-06 14:04:58 -07:00
Gyuho Lee 7fbbb9c8bf *: add 3.5 capability for 3.5 dev tree
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-02 15:27:54 -07:00
Gyuho Lee 3658571e3a etcdserver/api: enable 3.4 capability
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-02 15:24:18 -07:00
Gyuho Lee 6a0811a949 *: use new adt.IntervalTree interface
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-31 22:23:13 -07:00
Gyuho Lee c6e3401255 etcdserver: make raft log configured by top level logger
To make it consistent

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-29 15:43:19 -07:00
Gyuho Lee 2f30e9ad7f etcdserver: document v2 usage in "publish" method
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-28 21:07:39 -05:00
Gyuho Lee 7cbe2f5dd6 etcdserver/api/v3rpc: use new "credentials" package
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-26 08:56:38 -07:00
Gyuho Lee 50babc16e7 etcdserver/api/v2v3: skip tests for CI
To fix in v3.5

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-26 05:54:58 -07:00
Gyuho Lee 388d15f521
Merge pull request #10622 from philips/add-v2v3-tests
etcdserver: api/v2v3: add initial tests
2019-07-25 10:05:29 -07:00
Sahdev P. Zala 1cef112a79 etcdserver: do not allow creating empty role
As with users, we should not allow creating an empty role.

Related #10905
2019-07-24 17:41:24 -04:00
Tobias Schottdorf b9c051e7a7 raftpb: clean up naming in ConfChange 2019-07-23 10:40:03 +02:00
Tobias Schottdorf b67303c6a2 raft: allow use of joint quorums
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).

pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.

ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
2019-07-23 10:40:03 +02:00
Tobias Schottdorf eb4d9b640a etcdserver: fix createConfChangeEnts
It created a sequence of conf changes that could intermittently cause an
empty set of voters, which Raft asserts against as of #10889.

This fixes TestCtlV2BackupSnapshot and TestCtlV2BackupV3Snapshot, see:
https://github.com/etcd-io/etcd/issues/10700#issuecomment-512358126
2019-07-19 17:13:08 +02:00
Tobias Schottdorf c9491d7861 raft: clean up bootstrap
This is the first (maybe not last) step in cleaning up the bootstrap
code around StartNode.

Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved
today is unclean: The app is supposed to pass peers to StartNode(),
we add configuration changes for them to the log, immediately pretend
that they are applied, but actually leave them unapplied (to give the
app a chance to observe them, though if the app did decide to not apply
them things would really go off the rails), and then return control to
the app. The app will then process the initial Readys and as a result
the configuration will be persisted to disk; restarts of the node then
use RestartNode which doesn't take any peers.

The code that did this lived awkwardly in two places fairly deep down
the callstack, though it was really only necessary in StartNode(). This
commit refactors things to make this more obvious: only StartNode does
this dance now. In particular, RawNode does not support this at all any
more; it expects the app to set up its Storage correctly.

Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since
the Storage interface doesn't provide the right accessors for this
purpose. Briefly speaking, we want to make sure that a non-bootstrapped
node can never catch up via the log so that we can implicitly use one
of the "skipped" log entries to represent the configuration change into
the bootstrap configuration. This is an invasive change that affects
all consumers of raft, and it is of lower urgency since the code (post
this commit) already encapsulates the complexity sufficiently.
2019-07-19 10:02:02 +02:00
lzhfromustc d35f6647bc
Use newbe instead of s.be to avoid potential race
`s.cluster.SetBackend(s.be)` is not in critical section. Using `newbe` instead of `s.be` can avoid potential data race.
2019-07-08 14:24:52 -07:00
Tobias Schottdorf f9c2d00fb3 raft: extract 'tracker' package
Mechanically extract `progressTracker`, `Progress`, and `inflights`
to their own package named `tracker`. Add lots of comments in the
progress, and take the opportunity to rename and clarify various
fields.
2019-06-21 22:15:00 +02:00
Nima Yahyazadeh b1812a410f Raft HTTP: fix pause/resume race condition 2019-06-17 11:45:25 -04:00
Jingyi Hu 48d144a3de
Merge pull request #10731 from WIZARD-CXY/learner_metric
etcdserver: add learner metrics
2019-06-08 22:43:03 -07:00
Xiang Li 9a73013004
Merge pull request #10797 from jingyih/lease_checkpoint_enabled_by_experimental_flag
*: enable lease checkpoint via experimental flag
2019-06-05 22:56:54 -07:00
Jingyi Hu e67b9829b6 *: enable lease checkpoint via experimental flag
Primary lessor persists lease remainingTTL only if experimental flag
"--experimental-enable-lease-checkpoint" is set.
2019-06-05 15:30:03 -07:00
Gyuho Lee 1caaa9ed4a test: test update for Go 1.12.5 and related changes
Update to Go 1.12.5 testing. Remove deprecated unused and gosimple
packages, and mask staticcheck 1006. Also, fix unconvert errors related
to unnecessary type conversions and following staticcheck errors:
- remove redundant return statements
- use for range instead of for select
- use time.Since instead of time.Now().Sub
- omit comparison to bool constant
- replace T.Fatal and T.Fatalf in tests with T.Error and T.Errorf respectively, because T.Fatal must be called in the same goroutine as the test
- fix error strings that should not be capitalized
- use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...))
- use the status code of Canceled instead of grpc.ErrClientConnClosing which is deprecated
- use status.Errorf instead of grpc.Errorf which is deprecated

Related #10528 #10438
2019-06-05 17:02:05 -04:00
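A few of the staticcheck-style rewrites listed in the commit above can be sketched with the standard library alone. The helper names `sortedCopy` and `sum` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// sortedCopy returns a sorted copy using sort.Strings rather than
// sort.Sort(sort.StringSlice(...)).
func sortedCopy(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

// sum drains a channel with "for range" instead of a single-case
// "for { select { ... } }" loop; it returns once the channel is closed.
func sum(ch <-chan int) int {
	total := 0
	for v := range ch {
		total += v
	}
	return total
}

func main() {
	start := time.Now()
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch)
	fmt.Println(sortedCopy([]string{"b", "a"}), sum(ch))
	// time.Since(start) replaces time.Now().Sub(start).
	fmt.Println(time.Since(start) >= 0)
}
```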
宇慕 0b8727b3f3 etcdserver: add learner metrics 2019-06-05 10:51:21 +08:00
Gyuho Lee cdca488d8b
Merge pull request #9817 from mitake/no-password
*: support creating a user without password
2019-05-31 22:41:29 -07:00
Xiang Li d8e2e47de5
Merge pull request #10693 from nolouch/fix-lease
lease/lessor: recheck if expired lease is revoked
2019-05-31 12:13:36 -07:00
Hitoshi Mitake 5a67dd788d *: support creating a user without password
This commit adds a feature for creating a user without a password. The
purpose of the feature is to reduce the attack surface that comes from
configuring bad passwords (CN-based auth will be allowed for the user).

The feature can be used with the `--no-password` flag of the
`etcdctl user add` command.

Fix https://github.com/coreos/etcd/issues/9590
2019-05-30 21:59:30 +09:00
nolouch dc8a31eaf0 lease/lessor: recheck if expired lease is revoked
Signed-off-by: nolouch <nolouch@gmail.com>
2019-05-29 14:27:19 +08:00
Jingyi Hu 23511d21ec *: address comments 2019-05-28 18:50:13 -07:00
Jingyi Hu 6bf609b96d integration: update TestMemberPromote test
Update TestMemberPromote to include both learner not-ready and learner
ready test cases.

Removed unit test TestPromoteMember, since it requires the underlying raft
node to be started and running. The member promote is covered by the
integration test.
2019-05-28 18:50:13 -07:00
宇慕 3f94385fc6 etcdserver: update raftStatus 2019-05-28 18:50:13 -07:00
Jingyi Hu e994a4df01 etcdserver: check http StatusCode before unmarshal
Check http StatusCode. Only Unmarshal body if StatusCode is statusOK.
2019-05-28 18:50:13 -07:00
Jingyi Hu f8ad8ae4ad etcdserver: use etcdserver ErrLearnerNotReady
If learner is not ready to be promoted, use etcdserver.ErrLearnerNotReady
instead of using membership.ErrLearnerNotReady.
2019-05-28 18:50:13 -07:00
Jingyi Hu f5eaaaf440 etcdserver: forward member promote to leader 2019-05-28 18:50:10 -07:00
宇慕 dfe296ac3c etcdserver: add mayPromote check 2019-05-28 18:47:03 -07:00
Jingyi Hu aa4cda2f5c etcdserver: allow 1 learner in cluster
Hard-coded the maximum number of learners to 1.
2019-05-28 18:47:03 -07:00
Jingyi Hu c438f6db27 etcdserver: check IsMemberExist before IsLearner
If member does not exist in cluster, IsLearner will panic.
2019-05-28 18:47:03 -07:00
Jingyi Hu d0c1b3fa38 etcdserver: learner return Unavailable for unsupported RPC
Make learner return code.Unavailable when the request is not supported
by learner. Client balancer will retry a different endpoint.
2019-05-28 18:47:03 -07:00
Jingyi Hu 76a63f9f7d etcdserver: adjust StrictReconfigCheck
Adjust StrictReconfigCheck logic to accommodate learner members in the
cluster.
2019-05-28 18:47:03 -07:00
Gyuho Lee 34bd797e67 *: revert module import paths
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-28 15:39:35 -07:00
Jingyi Hu 23f1d02391 *: address comments 2019-05-15 15:58:46 -07:00
Jingyi Hu c836e37a83 etcdserver: remove unnecessary bool comparison
Fixes 'gosimple' test.
2019-05-15 13:48:54 -07:00
WizardCXY a039f2efb8 clientv3, etcdctl: MemberPromote for learner 2019-05-15 13:48:52 -07:00
Jingyi Hu 44d935e90a etcdserver: exclude learner from leader transfer
1. Maintenance API MoveLeader() returns ErrBadLeaderTransferee if
transferee does not exist or is raft learner.

2. etcdserver TransferLeadership() only chooses a voting member as
transferee.
2019-05-15 13:27:42 -07:00
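The first rule in the commit above (reject a transferee that does not exist or is a learner) might look roughly like this. It is a sketch under assumptions: `errBadLeaderTransferee` is a local stand-in for the error named in the message, and the membership map from member ID to "is learner" is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadLeaderTransferee is a local stand-in for the error named in the
// commit message.
var errBadLeaderTransferee = errors.New("bad leader transferee")

// checkTransferee takes a hypothetical map from member ID to "is learner".
// Existence is checked before the learner flag is consulted.
func checkTransferee(isLearner map[uint64]bool, id uint64) error {
	learner, exists := isLearner[id]
	if !exists || learner {
		return errBadLeaderTransferee
	}
	return nil
}

func main() {
	members := map[uint64]bool{1: false, 2: true}
	fmt.Println(checkTransferee(members, 1), checkTransferee(members, 2), checkTransferee(members, 3))
}
```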
WizardCXY 7f9479acc1 clientv3: add member promote 2019-05-15 13:27:42 -07:00
WizardCXY ba9fd620e8 etcdserver: support MemberPromote for learner 2019-05-15 13:27:42 -07:00
Jingyi Hu 43ed94f769 etcdserver: filter rpc request to learner
Hardcoded allowed rpc for learner node. Added filtering in grpc
interceptor to check if rpc is allowed for learner node.
2019-05-15 13:15:20 -07:00
Jingyi Hu 355d0ab2a6 *: add learner field in endpoint status
Added learner field to endpoint status API.
2019-05-15 13:13:59 -07:00
Jingyi Hu e1acf244c1 etcdserver: Add MemberAddAsLearner
Made changes to api/membership:

- Added MemberAddAsLearner
- Reverted changes to MemberAdd - removed input parameter isLearner
2019-05-14 18:18:10 -07:00
Jingyi Hu 2b76200f70 *: add MemberAddAsLearner to clientv3 Cluster API
Made changes to Clientv3 Cluster API:

- Added MemberAddAsLearner.
- Reverted changes to MemberAdd - removed input parameter isLearner.
2019-05-14 16:56:44 -07:00
Jingyi Hu fc14608cb7 clientv3: support MemberAdd for learner
Added IsLearner flag to clientv3 MemberAdd API.
2019-05-14 13:10:22 -07:00
Jingyi Hu 604bc04f70 etcdserver: support MemberAdd for learner
Added IsLearner field to etcdserver internal Member type. Routed
learner MemberAdd request from server API to raft. Apply learner
MemberAdd result to server after the request is passed through Raft.
2019-05-14 13:10:22 -07:00
Jingyi Hu a0d3c4d641 *: fix compilation after API change
Fixed compilation errors after API change for learner.
2019-05-14 13:10:22 -07:00
Jingyi Hu 7dc5451fae *: Change etcdserver API to support raft learner
- Added isLearner flag to MemberAddRequest in Cluster API.
- Added isLearner field to StatusResponse in Maintenance API.
- Added MemberPromote rpc to Cluster API.
2019-05-14 13:09:17 -07:00
Sam Batschelet 1411c585be etcdserver: fix typo in log message
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-05-10 09:54:00 -04:00
shivaramr 9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
Jingyi Hu cca0d5c1be
Merge pull request #10672 from nolouch/fix-probing-log
api/rafthttp: fix the probing status log print
2019-04-24 23:05:23 -07:00
nolouch decc0d5f43 api/rafthttp: fix the probing status print
Signed-off-by: nolouch <nolouch@gmail.com>
2019-04-23 19:51:34 +08:00
Gyuho Lee 877f11bed8 etcdserver: improve heartbeat send failures logging
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-04-19 10:58:17 -07:00
Sam Batschelet 9915d02022 *: Change gRPC proxy to expose etcd server endpoint /metrics
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy were not returning metrics of the etcd member servers but of the proxy itself.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-04-10 16:09:32 -04:00
Brandon Philips f17c038fc4 etcdserver: api/v2v3: add initial tests
Add a bunch of tests on fundamental operation of Create/Set/Delete.

Context: I want to use this package to run the discovery service against
the v3 storage backend. The limited functionality already implemented
should be sufficient to do this.
https://github.com/coreos/discovery.etcd.io/issues/52
2019-04-10 19:17:25 +00:00
James Shubin 368f70a37c etcdserver: Use panic instead of fatal on no space left error
When using the embed package to embed etcd, sometimes the storage prefix
being used might be full. In this case, this code path triggers, causing
an: `etcdserver: create wal error: no space left on device` error, which
causes a fatal. A fatal differs from a panic in that it also calls
os.Exit(1). In this situation, the calling program that embeds the etcd
server will be abruptly killed, which prevents it from cleaning up
safely, and giving a proper error message. Depending on what the calling
program is, this can cause corruption and data loss.

This patch switches the fatal to a panic. Ideally this would be a
regular error which would get propagated upwards to the StartEtcd
command, but in the meantime at least this can be caught with recover().

This fixes the most common fatal that I've experienced, but there are
surely more that need looking into. If possible, the errors should be
threaded down into the code path so that embedding etcd can be more
robust.

Fixes: https://github.com/etcd-io/etcd/issues/10588
2019-03-27 15:24:33 -04:00
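The commit above argues that a panic, unlike a fatal, can be caught by the embedding program. A minimal sketch of that pattern follows. `startEmbedded` is a hypothetical stand-in for starting an embedded server, not the real embed API, and `safeStart` is an illustrative wrapper.

```go
package main

import (
	"errors"
	"fmt"
)

// startEmbedded is a hypothetical stand-in for starting an embedded server
// that panics when the storage prefix is full.
func startEmbedded() {
	panic(errors.New("create wal error: no space left on device"))
}

// safeStart runs start and converts a panic into an ordinary error, which
// gives the caller a chance to clean up. A process exit via os.Exit(1)
// could not be intercepted this way.
func safeStart(start func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("embedded server panicked: %v", r)
		}
	}()
	start()
	return nil
}

func main() {
	if err := safeStart(startEmbedded); err != nil {
		fmt.Println("cleaning up after:", err)
	}
}
```

Note that `recover` only catches panics raised on the same goroutine as the deferred function, so this pattern does not help if the panic happens on a goroutine the caller does not control.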
johncming bd41f74168 etcdserver/api/rafthttp: fix the location of close http body. 2019-03-11 22:20:38 +08:00
zhoulin xie a943ad0ee4 client/keys_bench_test.go: Fix some misspells
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-28 14:36:06 -05:00
Gyuho Lee 8d1a62e7ef *: use default log configuration for server
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-02-21 10:57:26 -08:00
WizardCXY e6c6d8492e *: add flag to let etcd use the new boltdb freelistType feature 2019-02-14 11:07:08 +08:00
Jingyi Hu 7aa6358510 etcdserver: remove auth validation loop
Remove auth validation loop in v3_server.raftRequest(). Re-validation
when error ErrAuthOldRevision occurs should be handled on client side.
2019-02-12 15:40:39 -08:00
Hitoshi Mitake 72dd4a18c5 *: add a new option --enable-grpc-gateway for enabling/disabling grpc gateway 2019-01-23 03:26:34 +09:00
Xiang Li 2063b358c8
Merge pull request #10218 from mailgun/maxim/develop
Remove infinite loop in doSerialize
2019-01-09 10:38:25 -08:00
johncming e8f46ce341 etcdserver: add a test to verify not to send duplicated append responses 2019-01-09 10:37:43 +08:00
Sam Batschelet 577d7c0df2 e2e: update test to reflect (ST1005) update.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-01-08 21:04:20 -05:00
Sam Batschelet a82703b69e *: error strings should not end with punctuation or a newline (ST1005)
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-01-08 21:04:20 -05:00
Xiang Li 6511829d1f
Merge pull request #10374 from johncming/deprecated
api/rafthttp: remove deprecated req.Cancel.
2019-01-08 14:33:25 -08:00
Gyuho Lee 442c863413
Merge pull request #10377 from johncming/cancel-pos
api/v2auth: remove defer in loop.
2019-01-08 09:43:06 -08:00
Xiang Li b04633fd8e
Merge pull request #10375 from johncming/redundant-parentheses
etcdserver: remove redundant parentheses.
2019-01-07 18:38:26 -08:00
caoming e96dbfb973 api/v2auth: remove defer in loop. 2019-01-08 08:56:55 +08:00
caoming 5060560f92 api/v2store: use camel case instead of snake case. 2019-01-07 10:35:23 +08:00
caoming 802e2aaadd etcdserver: remove redundant parentheses. 2019-01-07 10:27:52 +08:00
caoming 4651f49a5c api/rafthttp: remove deprecated req.Cancel. 2019-01-07 10:12:47 +08:00
caoming b2e0e760a0 etcdserver: add missing lg assignment. 2019-01-05 09:24:48 +08:00
lsytj0413 792aad932f refactor(*): fix golint warning 2018-12-24 11:43:10 +08:00
Xiang Li 3faed211e5 *: add flags to setup backend related config 2018-11-26 15:50:26 -08:00
Gyuho Lee 291768af0f etcdserver/*: add "etcd_cluster_version" metric
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-11-13 21:49:12 -08:00
Maxim Vladimirskiy 91e583cba6 etcdserver: Remove infinite loop in doSerialize
Once chk(ai) fails with auth.ErrAuthOldRevision it will always fail,
regardless of how many times you retry. So the error is better returned
to fail the pending request and make the client re-authenticate.
2018-11-12 23:28:24 +03:00
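The change described above can be pictured as running the check once and returning its error, rather than looping. This is a simplified sketch, not the real doSerialize: `errAuthOldRevision` is a local stand-in for auth.ErrAuthOldRevision, and the `check` and `get` callbacks are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errAuthOldRevision is a local stand-in for auth.ErrAuthOldRevision.
var errAuthOldRevision = errors.New("auth: revision in header is old")

// serialize runs the permission check exactly once. If the check fails, the
// error goes back to the caller instead of being retried, because a stale
// revision will not become valid by retrying on the server.
func serialize(check func() error, get func()) error {
	if err := check(); err != nil {
		return err
	}
	get()
	return nil
}

func main() {
	calls := 0
	err := serialize(func() error { calls++; return errAuthOldRevision }, func() {})
	fmt.Println(calls, err)
}
```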
Shin'ya Ueoka aa4313a55a *: fix github links 2018-11-10 11:14:18 +09:00
Gyuho Lee 0f0919c19c
Merge pull request #10159 from gyuho/version-log
etcdserver: clear message in cluster version decision
2018-10-09 18:10:14 -07:00
Gyuho Lee d2a0f17b82
Merge pull request #10155 from gyuho/metrics-messages
rafthttp: probe all raft transports
2018-10-09 11:18:31 -07:00
Gyuho Lee 59dd78dde8 etcdserver: clear message in cluster version decision
Only leader can decide cluster version.
Clarify the logging that this local node is the leader.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-08 16:05:10 -07:00
Gyuho Lee 601d8b4677 etcdserver/api/etcdhttp: remove unused "HandleHealth" function
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:16:18 -07:00
Gyuho Lee 004e04a1d1 etcdserver/api/etcdhttp: add "etcd_server_health_success/failures"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:15:12 -07:00
Gyuho Lee 884a8bd36b etcdserver/api/rafthttp: configure "streamProber" in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee 7b1ef37054 etcdserver/api/rafthttp: probe all Raft messages' RTT
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.

In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but the "etcd_network_peer_round_trip_time_seconds"
metric showed a <1-sec latency distribution, which means the etcd server
was not sampling enough while such latency spikes happened
outside of the snapshot pipeline connection.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee 4a239070c8 etcdserver/api/rafthttp: display roundtripper name in warnings
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee 47cff4dfe5 etcdserver/api/rafthttp: rename to "pipelineProber"
Preliminary work to add prober to "streamRt"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00
nolouch 6ea54195a6 client/integration: try to fix tests 2018-09-18 01:44:57 +08:00
nolouch c15fb607f6 server: broadcast leader changed 2018-09-17 14:15:04 +08:00
nolouch f3f6427586 server: prevent blocking 2018-09-14 16:08:29 +08:00
nolouch 4de27039cb server: drop read request if found leader changed 2018-09-14 15:58:35 +08:00
Gyuho Lee 8560221091 etcdserver: fix gofmt warnings with Go 1.11
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 21:45:12 -07:00