Commit Graph

2020 Commits (bc9e433ca26be0314a37c8c6ead66429a584bbc9)

Author SHA1 Message Date
Joe Betz 9b51febaf5 *: Add experimental-compaction-batch-limit flag 2019-08-15 11:47:23 -07:00
Gyuho Lee 06b82c200f etcdserver: add "etcd_server_snapshot_apply_inflights_total"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 13:33:52 -07:00
Gyuho Lee 46bddacacb etcdserver/api: add "etcd_network_snapshot_send_inflights_total", "etcd_network_snapshot_receive_inflights_total"
Useful for deciding when to terminate the unhealthy follower.
If the follower is receiving a leader snapshot, operator may wait.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-08 13:33:48 -07:00
Zeming YU 3edb569ad3
v3rpc: fix a typo `err`
don't read return value in child goroutine which causes data race.
2019-08-06 14:04:58 -07:00
Gyuho Lee 7fbbb9c8bf *: add 3.5 capability for 3.5 dev tree
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-02 15:27:54 -07:00
Gyuho Lee 3658571e3a etcdserver/api: enable 3.4 capability
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-02 15:24:18 -07:00
Gyuho Lee 6a0811a949 *: use new adt.IntervalTree interface
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-31 22:23:13 -07:00
Gyuho Lee c6e3401255 etcdserver: make raft log configured by top level logger
To make it consistent

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-29 15:43:19 -07:00
Gyuho Lee 2f30e9ad7f etcdserver: document v2 usage in "publish" method
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-28 21:07:39 -05:00
Gyuho Lee 7cbe2f5dd6 etcdserver/api/v3rpc: use new "credentials" package
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-26 08:56:38 -07:00
Gyuho Lee 50babc16e7 etcdserver/api/v2v3: skip tests for CI
To fix in v3.5

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-07-26 05:54:58 -07:00
Gyuho Lee 388d15f521
Merge pull request #10622 from philips/add-v2v3-tests
etcdserver: api/v2v3: add initial tests
2019-07-25 10:05:29 -07:00
Sahdev P. Zala 1cef112a79 etcdserver: do not allow creating empty role
Like user, we should not allow creating empty role.

Related #10905
2019-07-24 17:41:24 -04:00
Tobias Schottdorf b9c051e7a7 raftpb: clean up naming in ConfChange 2019-07-23 10:40:03 +02:00
Tobias Schottdorf b67303c6a2 raft: allow use of joint quorums
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).

pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.

ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
2019-07-23 10:40:03 +02:00
Tobias Schottdorf eb4d9b640a etcdserver: fix createConfChangeEnts
It created a sequence of conf changes that could intermittently cause an
empty set of voters, which Raft asserts against as of #10889.

This fixes TestCtlV2BackupSnapshot and TestCtlV2BackupV3Snapshot, see:
https://github.com/etcd-io/etcd/issues/10700#issuecomment-512358126
2019-07-19 17:13:08 +02:00
Tobias Schottdorf c9491d7861 raft: clean up bootstrap
This is the first (maybe not last) step in cleaning up the bootstrap
code around StartNode.

Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved
today is unclean: The app is supposed to pass peers to StartNode(),
we add configuration changes for them to the log, immediately pretend
that they are applied, but actually leave them unapplied (to give the
app a chance to observe them, though if the app did decide to not apply
them things would really go off the rails), and then return control to
the app. The app will then process the initial Readys and as a result
the configuration will be persisted to disk; restarts of the node then
use RestartNode which doesn't take any peers.

The code that did this lived awkwardly in two places fairly deep down
the callstack, though it was really only necessary in StartNode(). This
commit refactors things to make this more obvious: only StartNode does
this dance now. In particular, RawNode does not support this at all any
more; it expects the app to set up its Storage correctly.

Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since
the Storage interface doesn't provide the right accessors for this
purpose. Briefly speaking, we want to make sure that a non-bootstrapped
node can never catch up via the log so that we can implicitly use one
of the "skipped" log entries to represent the configuration change into
the bootstrap configuration. This is an invasive change that affects
all consumers of raft, and it is of lower urgency since the code (post
this commit) already encapsulates the complexity sufficiently.
2019-07-19 10:02:02 +02:00
lzhfromustc d35f6647bc
Use newbe instead of s.be to avoid potential race
`s.cluster.SetBackend(s.be)` is not in critical section. Using `newbe` instead of `s.be` can avoid potential data race.
2019-07-08 14:24:52 -07:00
Tobias Schottdorf f9c2d00fb3 raft: extract 'tracker' package
Mechanically extract `progressTracker`, `Progress`, and `inflights`
to their own package named `tracker`. Add lots of comments in the
progress, and take the opportunity to rename and clarify various
fields.
2019-06-21 22:15:00 +02:00
Nima Yahyazadeh b1812a410f Raft HTTP: fix pause/resume race condition 2019-06-17 11:45:25 -04:00
Jingyi Hu 48d144a3de
Merge pull request #10731 from WIZARD-CXY/learner_metric
etcdserver: add learner metrics
2019-06-08 22:43:03 -07:00
Xiang Li 9a73013004
Merge pull request #10797 from jingyih/lease_checkpoint_enabled_by_experimental_flag
*: enable lease checkpoint via experimental flag
2019-06-05 22:56:54 -07:00
Jingyi Hu e67b9829b6 *: enable lease checkpoint via experimental flag
Primary lessor persist lease remainingTTL only if experimental flag
"--experimental-enable-lease-checkpoint" is set.
2019-06-05 15:30:03 -07:00
Gyuho Lee 1caaa9ed4a test: test update for Go 1.12.5 and related changes
Update to Go 1.12.5 testing. Remove deprecated unused and gosimple
pacakges, and mask staticcheck 1006. Also, fix unconvert errors related
to unnecessary type conversions and following staticcheck errors:
- remove redundant return statements
- use for range instead of for select
- use time.Since instead of time.Now().Sub
- omit comparison to bool constant
- replace T.Fatal and T.Fatalf in tests with T.Error and T.Fatalf respectively because the goroutine calls T.Fatal must be called in the same goroutine as the test
- fix error strings that should not be capitalized
- use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...))
- use he status code of Canceled instead of grpc.ErrClientConnClosing which is deprecated
- use use status.Errorf instead of grpc.Errorf which is deprecated

Related #10528 #10438
2019-06-05 17:02:05 -04:00
宇慕 0b8727b3f3 etcdserver: add learner metrics 2019-06-05 10:51:21 +08:00
Gyuho Lee cdca488d8b
Merge pull request #9817 from mitake/no-password
*: support creating a user without password
2019-05-31 22:41:29 -07:00
Xiang Li d8e2e47de5
Merge pull request #10693 from nolouch/fix-lease
lease/lessor: recheck if exprired lease is revoked
2019-05-31 12:13:36 -07:00
Hitoshi Mitake 5a67dd788d *: support creating a user without password
This commit adds a feature for creating a user without password. The
purpose of the feature is reducing attack surface by configuring bad
passwords (CN based auth will be allowed for the user).

The feature can be used with `--no-password` of `etcdctl user add`
command.

Fix https://github.com/coreos/etcd/issues/9590
2019-05-30 21:59:30 +09:00
nolouch dc8a31eaf0 lease/lessor: recheck if exprired lease is revoked
Signed-off-by: nolouch <nolouch@gmail.com>
2019-05-29 14:27:19 +08:00
Jingyi Hu 23511d21ec *: address comments 2019-05-28 18:50:13 -07:00
Jingyi Hu 6bf609b96d integration: update TestMemberPromote test
Update TestMemberPromote to include both learner not-ready and learner
ready test cases.

Removed unit test TestPromoteMember, it requires underlying raft node to
be started and running. The member promote is covered by the integration
test.
2019-05-28 18:50:13 -07:00
宇慕 3f94385fc6 etcdserver: update raftStatus 2019-05-28 18:50:13 -07:00
Jingyi Hu e994a4df01 etcdserver: check http StatusCode before unmarshal
Check http StatusCode. Only Unmarshal body if StatusCode is statusOK.
2019-05-28 18:50:13 -07:00
Jingyi Hu f8ad8ae4ad etcdserver: use etcdserver ErrLearnerNotReady
If learner is not ready to be promoted, use etcdserver.ErrLearnerNotReady
instead of using membership.ErrLearnerNotReady.
2019-05-28 18:50:13 -07:00
Jingyi Hu f5eaaaf440 etcdserver: forward member promote to leader 2019-05-28 18:50:10 -07:00
宇慕 dfe296ac3c etcdserver: add mayPromote check 2019-05-28 18:47:03 -07:00
Jingyi Hu aa4cda2f5c etcdserver: allow 1 learner in cluster
Hard-coded the maximum number of learners to 1.
2019-05-28 18:47:03 -07:00
Jingyi Hu c438f6db27 etcdserver: check IsMemberExist before IsLearner
If member does not exist in cluster, IsLearner will panic.
2019-05-28 18:47:03 -07:00
Jingyi Hu d0c1b3fa38 etcdserver: learner return Unavailable for unsupported RPC
Make learner return code.Unavailable when the request is not supported
by learner. Client balancer will retry a different endpoint.
2019-05-28 18:47:03 -07:00
Jingyi Hu 76a63f9f7d etcdserver: adjust StrictReconfigCheck
Adjust StrictReconfigCheck logic to accommodate learner members in the
cluster.
2019-05-28 18:47:03 -07:00
Gyuho Lee 34bd797e67 *: revert module import paths
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-05-28 15:39:35 -07:00
Jingyi Hu 23f1d02391 *: address comments 2019-05-15 15:58:46 -07:00
Jingyi Hu c836e37a83 etcdserver: remove unnecessary bool comparison
Fixes 'gosimple' test.
2019-05-15 13:48:54 -07:00
WizardCXY a039f2efb8 clientv3, etcdctl: MemberPromote for learner 2019-05-15 13:48:52 -07:00
Jingyi Hu 44d935e90a etcdserver: exclude learner from leader transfer
1. Maintenance API MoveLeader() returns ErrBadLeaderTransferee if
transferee does not exist or is raft learner.

2. etcdserver TransferLeadership() only choose voting member as
transferee.
2019-05-15 13:27:42 -07:00
WizardCXY 7f9479acc1 clientv3: add member promote 2019-05-15 13:27:42 -07:00
WizardCXY ba9fd620e8 etcdserver: support MemberPromote for learner 2019-05-15 13:27:42 -07:00
Jingyi Hu 43ed94f769 etcdserver: filter rpc request to learner
Hardcoded allowed rpc for learner node. Added filtering in grpc
interceptor to check if rpc is allowed for learner node.
2019-05-15 13:15:20 -07:00
Jingyi Hu 355d0ab2a6 *: add learner field in endpoint status
Added learner field to endpoint status API.
2019-05-15 13:13:59 -07:00
Jingyi Hu e1acf244c1 etcdserver: Add MemberAddAsLearner
Made changes to api/membership:

- Added MemberAddAsLearner
- Reverted changes to MemberAdd - removed input parameter isLearner
2019-05-14 18:18:10 -07:00
Jingyi Hu 2b76200f70 *: add MemberAddAsLearner to clientv3 Cluster API
Made changes to Clientv3 Cluster API:

- Added MemberAddAsLearner.
- Reverted changes to MemberAdd - removed input parameter isLearner.
2019-05-14 16:56:44 -07:00
Jingyi Hu fc14608cb7 clientv3: support MemberAdd for learner
Added IsLearner flag to clientv3 MemberAdd API.
2019-05-14 13:10:22 -07:00
Jingyi Hu 604bc04f70 etcdserver: support MemberAdd for learner
Added IsLearner field to etcdserver internal Member type. Routed
learner MemberAdd request from server API to raft. Apply learner
MemberAdd result to server after the request is passed through Raft.
2019-05-14 13:10:22 -07:00
Jingyi Hu a0d3c4d641 *: fix compilation after API change
Fixed compilation erros after API change for learner.
2019-05-14 13:10:22 -07:00
Jingyi Hu 7dc5451fae *: Change etcdserver API to support raft learner
- Added isLearner flag to MemberAddRequest in Cluster API.
- Added isLearner field to StatusResponse in Maintenance API.
- Added MemberPromote rpc to Cluster API.
2019-05-14 13:09:17 -07:00
Sam Batschelet 1411c585be etcdserver: fix typo in log message
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-05-10 09:54:00 -04:00
shivaramr 9150bf52d6 go modules: Fix module path version to include version number 2019-04-26 15:29:50 -07:00
Jingyi Hu cca0d5c1be
Merge pull request #10672 from nolouch/fix-probing-log
api/rafthttp: fix the probing status log print
2019-04-24 23:05:23 -07:00
nolouch decc0d5f43 api/rafthttp: fix the probing status print
Signed-off-by: nolouch <nolouch@gmail.com>
2019-04-23 19:51:34 +08:00
Gyuho Lee 877f11bed8 etcdserver: improve heartbeat send failures logging
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-04-19 10:58:17 -07:00
Sam Batschelet 9915d02022 *: Change gRPC proxy to expose etcd server endpoint /metrics
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy were not returning metrics of the etcd members servers but of the proxy itself.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-04-10 16:09:32 -04:00
Brandon Philips f17c038fc4 etcdserver: api/v2v3: add initial tests
Add a bunch of tests on fundamental operation of Create/Set/Delete.

Context: I want to use this package to run the discovery service against
the v3 storage backend. The limited functionality already implemented
should be sufficient to do this.
https://github.com/coreos/discovery.etcd.io/issues/52
2019-04-10 19:17:25 +00:00
James Shubin 368f70a37c etcdserver: Use panic instead of fatal on no space left error
When using the embed package to embed etcd, sometimes the storage prefix
being used might be full. In this case, this code path triggers, causing
an: `etcdserver: create wal error: no space left on device` error, which
causes a fatal. A fatal differs from a panic in that it also calls
os.Exit(1). In this situation, the calling program that embeds the etcd
server will be abruptly killed, which prevents it from cleaning up
safely, and giving a proper error message. Depending on what the calling
program is, this can cause corruption and data loss.

This patch switches the fatal to a panic. Ideally this would be a
regular error which would get propagated upwards to the StartEtcd
command, but in the meantime at least this can be caught with recover().

This fixes the most common fatal that I've experienced, but there are
surely more that need looking into. If possible, the errors should be
threaded down into the code path so that embedding etcd can be more
robust.

Fixes: https://github.com/etcd-io/etcd/issues/10588
2019-03-27 15:24:33 -04:00
johncming bd41f74168 etcdserver/api/rafthttp: fix the location of close http body. 2019-03-11 22:20:38 +08:00
zhoulin xie a943ad0ee4 client/keys_bench_test.go: Fix some misspells
Signed-off-by: zhoulin xie <zhoulin.xie@daocloud.io>
2019-02-28 14:36:06 -05:00
Gyuho Lee 8d1a62e7ef *: use default log configuration for server
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-02-21 10:57:26 -08:00
WizardCXY e6c6d8492e *: add flag to let etcd use the new boltdb freelistType feature 2019-02-14 11:07:08 +08:00
Jingyi Hu 7aa6358510 etcdserver: remove auth validation loop
Remove auth validation loop in v3_server.raftRequest(). Re-validation
when error ErrAuthOldRevision occurs should be handled on client side.
2019-02-12 15:40:39 -08:00
Hitoshi Mitake 72dd4a18c5 *: add a new option --enable-grpc-gateway for enabling/disabling grpc gateway 2019-01-23 03:26:34 +09:00
Xiang Li 2063b358c8
Merge pull request #10218 from mailgun/maxim/develop
Remove infinite loop in doSerialize
2019-01-09 10:38:25 -08:00
johncming e8f46ce341 etcdserver: add a test to verify not to send duplicated append responses 2019-01-09 10:37:43 +08:00
Sam Batschelet 577d7c0df2 e2e: update test to reflect (ST1005) update.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-01-08 21:04:20 -05:00
Sam Batschelet a82703b69e *: error strings should not end with punctuation or a newline (ST1005)
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2019-01-08 21:04:20 -05:00
Xiang Li 6511829d1f
Merge pull request #10374 from johncming/deprecated
api/rafthttp: remove deprecated req.Cancel.
2019-01-08 14:33:25 -08:00
Gyuho Lee 442c863413
Merge pull request #10377 from johncming/cancel-pos
api/v2auth: remove defer in loop.
2019-01-08 09:43:06 -08:00
Xiang Li b04633fd8e
Merge pull request #10375 from johncming/redundant-parentheses
etcdserver: remove redundant parentheses.
2019-01-07 18:38:26 -08:00
caoming e96dbfb973 api/v2auth: remove defer in loop. 2019-01-08 08:56:55 +08:00
caoming 5060560f92 api/v2store: use camel case instead of snake case. 2019-01-07 10:35:23 +08:00
caoming 802e2aaadd etcdserver: remove redundant parentheses. 2019-01-07 10:27:52 +08:00
caoming 4651f49a5c api/rafthttp: remove deprecated req.Cancel. 2019-01-07 10:12:47 +08:00
caoming b2e0e760a0 etcdserver: add missing lg assignment. 2019-01-05 09:24:48 +08:00
lsytj0413 792aad932f refactor(*): fix golint warning 2018-12-24 11:43:10 +08:00
Xiang Li 3faed211e5 *: add flags to setup backend related config 2018-11-26 15:50:26 -08:00
Gyuho Lee 291768af0f etcdserver/*: add "etcd_cluster_version" metric
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-11-13 21:49:12 -08:00
Maxim Vladimirskiy 91e583cba6 etcdserver: Remove infinite loop in doSerialize
Once chk(ai) fails with auth.ErrAuthOldRevision it will always do,
regardless how many times you retry. So the error is better be returned
to fail the pending request and make the client re-authenticate.
2018-11-12 23:28:24 +03:00
Shin'ya Ueoka aa4313a55a *: fix github links 2018-11-10 11:14:18 +09:00
Gyuho Lee 0f0919c19c
Merge pull request #10159 from gyuho/version-log
etcdserver: clear message in cluster version decision
2018-10-09 18:10:14 -07:00
Gyuho Lee d2a0f17b82
Merge pull request #10155 from gyuho/metrics-messages
rafthttp: probe all raft transports
2018-10-09 11:18:31 -07:00
Gyuho Lee 59dd78dde8 etcdserver: clear message in cluster version decision
Only leader can decide cluster version.
Clarify the logging that this local node is the leader.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-08 16:05:10 -07:00
Gyuho Lee 601d8b4677 etcdserver/api/etcdhttp: remove unused "HandleHealth" function
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:16:18 -07:00
Gyuho Lee 004e04a1d1 etcdserver/api/etcdhttp: add "etcd_server_health_success/failures"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 17:15:12 -07:00
Gyuho Lee 884a8bd36b etcdserver/api/rafthttp: configure "streamProber" in tests
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:32:05 -07:00
Gyuho Lee 7b1ef37054 etcdserver/api/rafthttp: probe all Raft messages' RTT
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.

In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but "etcd_network_peer_round_trip_time_seconds"
metrics shows <1-sec latency distribution, which means etcd server
was not sampling enough while such latency spikes happen
outside of snapshot pipeline connection.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:28:54 -07:00
Gyuho Lee 4a239070c8 etcdserver/api/rafthttp: display roundtripper name in warnings
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:14:42 -07:00
Gyuho Lee 47cff4dfe5 etcdserver/api/rafthttp: rename to "pipelineProber"
Preliminary work to add prober to "streamRt"

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-10-07 03:13:10 -07:00
nolouch 6ea54195a6 client/integration: try to fix tests 2018-09-18 01:44:57 +08:00
nolouch c15fb607f6 server: broadcast leader changed 2018-09-17 14:15:04 +08:00
nolouch f3f6427586 server: prevent blocking 2018-09-14 16:08:29 +08:00
nolouch 4de27039cb server: drop read request if found leader changed 2018-09-14 15:58:35 +08:00
Gyuho Lee 8560221091 etcdserver: fix gofmt warnings with Go 1.11
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 21:45:12 -07:00
Gyuho Lee 0ef9ef3c74 *: rerun "gofmt"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 18:25:39 -07:00
Gyuho Lee 1399bc69ce etcdserver: update import paths to "go.etcd.io/etcd"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-28 17:47:55 -07:00
Sam Batschelet af85949b41
Merge pull request #10024 from gyuho/became-inactive
etcdserver/api/rafthttp: clarify "became inactive" warning
2018-08-24 22:12:16 -04:00
Sam Batschelet 24ee22ab48
Merge pull request #10026 from gyuho/read-index
etcdserver: clarify read index wait timeout warnings
2018-08-24 22:11:58 -04:00
Gyuho Lee 38711761a1 etcdserver: clarify read index wait timeout warnings
"read index" doesn't tell much about the root cause.
Most likely, the local follower node is having slow
network, thus timing out waiting to receive read
index response from leader.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-17 17:59:41 -07:00
Gyuho Lee 156ff6461d etcdserver/api/rafthttp: clarify "became inactive" warning
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-17 17:45:53 -07:00
Jingyi Hu 8d85259b56 etcdserver/api/v3rpc/interceptor: add log level checking
Check log level before generating and writing log info.
2018-08-17 16:12:05 -07:00
Gyuho Lee 6f4c509ad8 etcdserver/api/rafthttp: add v3 snapshot send/receive metrics
Distribution would be:
0.1 second or more
...
25.6 seconds or more
51.2 seconds or more

etcd_network_snapshot_send_success
etcd_network_snapshot_send_failures
etcd_network_snapshot_send_total_duration_seconds
etcd_network_snapshot_receive_success
etcd_network_snapshot_receive_failures
etcd_network_snapshot_receive_total_duration_seconds

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-15 12:56:50 -07:00
Gyuho Lee c392cd20cf etcdserver/api/snap: add v3 snapshot fsync metrics
etcd_snap_db_fsync_duration_seconds_count
etcd_snap_db_save_total_duration_seconds_bucket

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-15 12:56:44 -07:00
Gyuho Lee eb6738053b etcdserver: add "etcd_server_id" metric
```
etcd_server_id{server_id="8e9e05c52164694d"} 1
```

Useful for automating membership change operations,
no need to run "endpoint status" or "member list"
command to get member IDs.

Also, useful for "etcd_network" metrics with "To/From" labels.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2018-08-13 00:39:18 -07:00
Jingyi Hu 368010d8a3 etcdserver: code clean up
Code clean up in interceptor.go
2018-08-10 16:29:42 -07:00
Jingyi Hu 30662940f4 vendor: add go-grpc-middleware
Rebased to master PR #9994.  Fixed a Go format issue in
v3rpc/interceptor.go.  Updated vendor to include go-grpc-middleware.
2018-08-10 13:15:35 -07:00
Jingyi Hu dc01734c6b etcdserver: add grpc interceptor to log info on incoming requests to
etcd server

To improve debuggability of etcd v3. Added a grpc interceptor to log
info on incoming requests to etcd server. The log output includes
remote client info, request content (with value field redacted), request
handling latency, response size, etc. Uses zap logger if available,
otherwise uses capnslog.

Also did some clean up on the chaining of grpc interceptors on server
side.
2018-08-10 11:01:07 -07:00
Iskander Sharipov d0f800c930 etcdserver/api/v2discovery: simplify !(x == y) to x != y
Found using https://go-critic.github.io/overview#boolExprSimplify-ref
2018-07-28 23:47:17 +03:00
Joe Betz 750b87d622
Merge pull request #9924 from jpbetz/persist-lease-deadline
lease: Persist remainingTTL to prevent indefinite auto-renewal of long lived leases
2018-07-24 09:39:57 -07:00
Joe Betz 2edb954bce
lease: Checkpoint lease TTLs to prevent indefinite auto-renewal of long lived leases 2018-07-23 16:12:34 -07:00
Gyuho Lee 643d791a11 etcdserver: add "etcd_server_go_version" metric
Currently, one has to look at server logs manually,
to see what Go version was used to build etcd server.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-23 09:15:22 -07:00
Gyuho Lee 51af6a062f etcdserver: clean up code format
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-21 16:03:16 -07:00
Gyuho Lee 57ec2226cc etcdserver: support zap.Logger in FD monitoring routine
Keep replacing capnslog

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-20 14:59:03 -07:00
Joe Betz bbe2d777b1
lease: Add LessorConfig and wire zap logger into Lessor 2018-07-17 13:10:34 -07:00
Joe Betz 75ac18cd2d
lease: Add and lease checkpoint protobuf types 2018-07-17 11:54:51 -07:00
Gyuho Lee ddf45cb958 etcdserver: remove configuration print methods
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-09 11:26:23 -07:00
Gyuho Lee 9934034bb1 etcdserver: remove unnecessary if-statement
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-05 10:25:00 -07:00
Gyuho Lee e714dd01b3 etcdserver: remove unnecessary type conversion
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-05 10:08:56 -07:00
Gyuho Lee 37000cc4b8 etcdserver: add "etcd_server_slow_read_indexes_total"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-02 12:58:35 -07:00
Gyuho Lee 4733a1db5c etcdserver: clarify read index warnings
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-07-02 12:53:35 -07:00
Joe Betz 4b51b6de49
*: Add progress notify request watch request 2018-06-27 16:46:13 -07:00
Gyuho Lee 339cda03b3 etcdserver/api/v3rpc: remove duplicate gRPC logger set
Fix

=== RUN   TestEmbedEtcd
==================
WARNING: DATA RACE
Write at 0x000001df86d0 by goroutine 711:
  github.com/coreos/etcd/embed.(*Config).setupLogging.func1()
      /go/src/github.com/coreos/etcd/vendor/google.golang.org/grpc/grpclog/loggerv2.go:68 +0x16c
  sync.(*Once).Do()
      /usr/local/go/src/sync/once.go:44 +0xe1
  github.com/coreos/etcd/embed.(*Config).setupLogging()

in gRPC proxy tests.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-18 09:41:37 -07:00
Gyuho Lee b241e383fd
Merge pull request #9858 from gyuho/lll
etcdserver: clean up election tick timeout log output
2018-06-15 13:40:44 -07:00
Gyuho Lee 52ffe9f79a etcdserver: clean up election tick timeout log output
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:35:25 -07:00
Gyuho Lee 929d390520 etcdserver: log quota only once
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:13:44 -07:00
Gyuho Lee 8990126c17 rafthttp: add "RaftDropHeartbeat" failpoint
To simulate network partition locally.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-15 13:10:58 -07:00
Gyuho Lee a133e9fc8c etcdserver: remove TODO from "warnOfExpensiveGenericRequest"
Metric is already added.

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-12 09:37:00 -07:00
Joe Betz a6fad51603
etcdserver: Fix txn request 'took too long' warnings to use loggable request stringer 2018-06-11 16:58:48 -07:00
Gyuho Lee 849d3f2ac9
Merge pull request #9826 from jpbetz/response_sizes
etcdserver: Add response byte size and range response count to took too long warning
2018-06-11 11:13:24 -07:00
Joe Betz b47e148d5d
etcdserver: Add response byte size and range response count to took too long warning 2018-06-11 10:02:30 -07:00
Gyuho Lee e22ab78ccb
Merge pull request #9822 from jpbetz/value_size_stringer
etcdserver: Replace value contents with value_size in request took too long warning
2018-06-08 09:39:56 -07:00
Joe Betz 225b0bf80a
etcdserver: Replace value contents with value_size in request took too long warning 2018-06-07 16:41:48 -07:00
Gyuho Lee 7dd7018835 etcdserver: add "etcd_server_quota_backend_bytes"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-06-07 10:44:51 -07:00
Gyuho Lee 3821f3364d etcdserver/api/rafthttp: add "etcd_network_active_peers/disconnected_peers_total"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:23:45 -07:00
Gyuho Lee afe78fbe69 etcdserver/api/v2http: document histogram
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:03:28 -07:00
Gyuho Lee 973fe43b83 etcdserver/api/snap: document histograms, add "etcd_snap_fsync_duration_seconds"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:03:28 -07:00
Gyuho Lee 640f5e64a9 etcdserver/api/rafthttp: document round-trip metrics, clean up
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:03:28 -07:00
Gyuho Lee 5a9e48be30 etcdserver/api/rafthttp: increase bucket upperbound up-to 3-sec
From 0.8 sec to 3.2 sec for more detailed latency analysis

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 14:03:28 -07:00
Gyuho Lee a1aade8c1b etcdserver: rename to "heartbeat_send_failures_total"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 13:11:08 -07:00
Gyuho Lee dd1baf6e96 etcdserver: add "etcd_server_slow_apply_total"
{"level":"warn","ts":1527101858.6985068,"caller":"etcdserver/util.go:115","msg":"apply request took too long","took":0.114101529,"expected-duration":0.1,"prefix":"","request":"header:<ID:1029181977902852337> put:<key:\"\\000\\000...

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 13:09:42 -07:00
Gyuho Lee 896a5e4a2b etcdserver: add "etcd_server_heartbeat_failures_total"
{"level":"warn","ts":1527101858.4149103,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.025771662}
{"level":"warn","ts":1527101858.4149644,"caller":"etcdserver/raft.go:370","msg":"failed to send out heartbeat; took too long, server is overloaded likely from slow disk","heartbeat-interval":0.1,"expected-duration":0.2,"exceeded-duration":0.034015766}

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 13:09:42 -07:00
Gyuho Lee a5b32ba941 etcdserver/api/v3rpc: add "etcd_network_server_stream_failures_total"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-23 08:05:43 -07:00
Sam Batschelet 05554119c9 vendor: use latest "grpc-ecosystem/grpc-gateway" 2018-05-22 17:35:15 -04:00
Gyuho Lee 7940113906 *: move internal "etcdserver/api/rafthttp"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 10:31:16 -07:00
Gyuho Lee 2dd555c983 *: move "etcdserver/api/v3compactor"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 10:31:16 -07:00
Gyuho Lee 871c218894 *: move "etcdserver/api/v3alarm"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 10:31:16 -07:00
Gyuho Lee 9149565cb3 *: move to "etcdserver/api/membership"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 10:31:16 -07:00
Gyuho Lee 671f1c41a8 etcdserver/api/v2discovery: move internal "discovery" package
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 09:50:38 -07:00
Gyuho Lee 1e4f56114e etcdserver: use new "snap" import paths
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 07:45:34 -07:00
Gyuho Lee 2bd689acea etcdserver/api/snap: rename internal package
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-21 07:45:34 -07:00
Gyuho Lee 955fd99bc9
Merge pull request #9746 from gyuho/raft-logger
etcdserver: set default Raft logger with zap.Logger
2018-05-18 16:32:48 -07:00
Gyuho Lee 58ae15bd29 etcdserver: set default Raft logger with zap.Logger
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-18 15:38:39 -07:00
Gyuho Lee 49d672ff9b etcdserver: rename "SnapshotCount", add "SnapshotCatchUpEntries"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-18 14:37:50 -07:00
Gyuho Lee 366db18662 etcdserver: define test transporter
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-17 14:16:52 -07:00
Gyuho Lee 4a4be9275c
Merge pull request #9681 from gyuho/config-log
doc: deprecate '/config/local/log' in v3.5
2018-05-15 11:07:07 -07:00
Gyuho Lee 7a92bbfed2 etcdserver/*: move internal v2 packages
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-14 12:49:49 -07:00
Gyuho Lee 294b5745d6 etcdserver/api/v3rpc: support watch fragmentation with max request bytes
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-14 12:35:50 -07:00
Gyuho Lee 7088d2bf52 etcdserver/api/v3rpc: clean up, read lock on "prevKV"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-14 12:35:50 -07:00
Gyuho Lee e1df2156c7 etcdserver/api/v3rpc: clean up godoc
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-14 12:35:50 -07:00
Gyuho Lee bd43d174ae *: regenerate proto
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-14 12:35:50 -07:00
Gyuho Lee 5be21c74e4 etcdserver/etcdserverpb: add "fragment" field to "WatchRequest/Response"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-14 12:35:50 -07:00
Gyuho Lee ba7cc04bac etcdserver/api/v2v3: fix "getDir" with "sorted"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-09 15:42:45 -07:00
Gyuho Lee 26e46702a2 etcdserver/v2store: remove unused testing.T parameter
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-09 15:42:42 -07:00
Gyuho Lee 7c5cf7013f etcdserver/v2auth: remove unused parameters
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-09 15:39:44 -07:00
Gyuho Lee 8235982f6e etcdserver/api/v2http: remove unused parameters
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-09 15:39:44 -07:00
Gyuho Lee 807770740a etcdserver/api: add TODO to deprecate '/config/local/log'
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-04 21:30:23 -07:00
Gyuho Lee a32db53765 etcdserver: add details to structured logger
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-04 12:31:35 -07:00
Jiang Xuan bf432648ae *: make bcrypt-cost configurable 2018-05-03 11:43:32 -07:00
Gyuho Lee 47ab4e22d2 etcdserver: clarify errors from "openSnapshotBackend"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-02 14:17:22 -07:00
Gyuho Lee 5828efda38 *: use "zap.Field"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-02 13:26:39 -07:00
Gyuho Lee c60054abea etcdserver/api/etcdhttp: use structured logging in peer handler
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-02 12:53:28 -07:00
Gyuho Lee cccf77db9e etcdserver/api/v2http: support structured logging
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-02 11:57:23 -07:00
Gyuho Lee 3b38cb305f etcdserver/api/etcdhttp: support structured logging
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-02 11:57:23 -07:00
Gyuho Lee 98fcd67e9f etcdserver/v2auth: support structured logging
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-02 11:57:20 -07:00
Gyuho Lee 6cf3dae93e etcdserver/api/v3rpc: fix race in stream error logging
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-01 16:19:55 -07:00
Gyuho Lee 38326d002e etcdserver/v2store: use "zap" logger in v2v3 tests
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-01 16:15:09 -07:00
Gyuho Lee b3bd5887cd etcdserver: clarify server membership change logging
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-05-01 15:06:29 -07:00
Gyuho Lee 200401248a
Merge pull request #9665 from gyuho/unconvert
test: integrate github.com/mdempsky/unconvert
2018-05-01 09:52:44 -07:00
Gyuho Lee eae30a6c9b etcdserver: fix "unconvert" warnings
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-30 15:32:16 -07:00
Hitoshi Mitake a9225c164a etcdserver: not print password in the warning message of expensive request
Fix https://github.com/coreos/etcd/issues/9635
2018-04-30 13:44:27 -07:00
Gyuho Lee 30dd8a7dde etcdserver: support structured logging for auth
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-27 14:19:51 -07:00
Gyuho Lee 0e565c8960 etcdserver: use structured logging in "advertiseMatchesCluster"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-27 11:40:30 -07:00
Gyuho Lee f7f6fdeb52 etcdserver: support structured logger for discovery, compactor
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-26 14:36:22 -07:00
Gyuho Lee dca3f5e3ad
Merge pull request #9626 from yudai/fix_set_inexist_dir
etcdserver: Fix v2v3api set to create parent directly if not exists
2018-04-26 11:50:27 -07:00
Iwasaki Yudai 3c25465855 etcdserver: Fix v2v3api set to create parent directly if not exists
When a new file is created under an non existent directly,
the v2 API automatically create the parent directly.
This commit aligns the behaviour of v2v3 emulation to comply with the v2
API.
2018-04-25 17:36:59 -07:00
Gyuho Lee 3ea7a5d0bd etcdserver: add "LoggerCore" field for Raft logger
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-25 10:16:54 -07:00
Gyuho Lee 4f45f5d9dd
Merge pull request #9591 from gyuho/election
*: add --initial-election-tick-advance to configure election fast-forward on bootstrap
2018-04-23 10:17:49 -07:00
Gyuho Lee 83f7f174da etcdserver: print server configuration duration fields in string
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-20 13:11:11 -07:00
Gyuho Lee 5f8abdc227 etcdserver: log auto compaction on server start
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-20 11:58:36 -07:00
Gyuho Lee f205b22434 etcdserver: fix snapshot panic message
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-20 11:20:17 -07:00
Gyuho Lee 21d2e2ab6e etcdserver: add more tick fast-forward logs
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-19 17:45:23 -07:00
Gyuho Lee 4bec0d7d67 etcdserver: add "InitialElectionTickAdvance"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-19 17:45:23 -07:00
Gyuho Lee 8296ce0930 rafthttp: add/fix debugging lines
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-04-19 13:18:57 -07:00
Maciej Borsz 46bc966aa7 etcdserver: add is_leader prometheus metric that is 1 on the leader.
Before this change, we had now way to find a leader using /metrics
endpoint. This commit adds a metric to do that.
2018-04-19 11:47:40 +02:00