Compare commits

...

7 Commits

Author SHA1 Message Date
Gyu-Ho Lee
e211fb6de3 version: bump up to 3.2.8
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-09-26 02:41:18 +09:00
Gyu-Ho Lee
fb7e274309 Documentation/op-guide: remove grafana demo link
The dashboard was removed during Tectonic migration
in AWS, while the Grafana still runs in GCP.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-09-26 02:40:59 +09:00
beth wright
4a61fcf42d docs: remove link-breaking space 2017-09-20 08:11:02 +09:00
Anthony Romano
4c8fa30dda e2e: test no value is returned in TestCtlV3GetKeysOnly
Test was checking key name is returned, but was not correctly checking
no value is returned.
2017-09-14 04:42:06 +09:00
Anthony Romano
01c4f35b30 grpcproxy: respect KeysOnly flag
Fixes #8478
2017-09-14 04:41:58 +09:00
Anthony Romano
15e9510d2c client: fail over to next endpoint on oneshot failure
Fixes #8515
2017-09-08 13:28:55 -07:00
Gyu-Ho Lee
09b7fd4975 version: bump up to 3.2.7+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-09-01 14:03:26 -07:00
7 changed files with 95 additions and 36 deletions

View File

@@ -1,6 +1,50 @@
# Monitoring etcd
Each etcd server exports metrics under the `/metrics` path on its client port.
Each etcd server provides local monitoring information on its client port through http endpoints. The monitoring data is useful for both system health checking and cluster debugging.
## Debug endpoint
If `--debug` is set, the etcd server exports debugging information on its client port under the `/debug` path. Take care when setting `--debug`, since there will be degraded performance and verbose logging.
The `/debug/pprof` endpoint is the standard go runtime profiling endpoint. This can be used to profile CPU, heap, mutex, and goroutine utilization. For example, here `go tool pprof` gets the top 10 functions where etcd spends its time:
```sh
$ go tool pprof http://localhost:2379/debug/pprof/profile
Fetching profile from http://localhost:2379/debug/pprof/profile
Please wait... (30s)
Saved profile in /home/etcd/pprof/pprof.etcd.localhost:2379.samples.cpu.001.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
310ms of 480ms total (64.58%)
Showing top 10 nodes out of 157 (cum >= 10ms)
flat flat% sum% cum cum%
130ms 27.08% 27.08% 130ms 27.08% runtime.futex
70ms 14.58% 41.67% 70ms 14.58% syscall.Syscall
20ms 4.17% 45.83% 20ms 4.17% github.com/coreos/etcd/cmd/vendor/golang.org/x/net/http2/hpack.huffmanDecode
20ms 4.17% 50.00% 30ms 6.25% runtime.pcvalue
20ms 4.17% 54.17% 50ms 10.42% runtime.schedule
10ms 2.08% 56.25% 10ms 2.08% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).AuthInfoFromCtx
10ms 2.08% 58.33% 10ms 2.08% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).Lead
10ms 2.08% 60.42% 10ms 2.08% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/wait.(*timeList).Trigger
10ms 2.08% 62.50% 10ms 2.08% github.com/coreos/etcd/cmd/vendor/github.com/prometheus/client_golang/prometheus.(*MetricVec).hashLabelValues
10ms 2.08% 64.58% 10ms 2.08% github.com/coreos/etcd/cmd/vendor/golang.org/x/net/http2.(*Framer).WriteHeaders
```
The `/debug/requests` endpoint gives gRPC traces and performance statistics through a web browser. For example, here is a `Range` request for the key `abc`:
```
When Elapsed (s)
2017/08/18 17:34:51.999317 0.000244 /etcdserverpb.KV/Range
17:34:51.999382 . 65 ... RPC: from 127.0.0.1:47204 deadline:4.999377747s
17:34:51.999395 . 13 ... recv: key:"abc"
17:34:51.999499 . 104 ... OK
17:34:51.999535 . 36 ... sent: header:<cluster_id:14841639068965178418 member_id:10276657743932975437 revision:15 raft_term:17 > kvs:<key:"abc" create_revision:6 mod_revision:14 version:9 value:"asda" > count:1
```
## Metrics endpoint
Each etcd server exports metrics under the `/metrics` path on its client port and optionally on interfaces given by `--listen-metrics-urls`.
>>>>>>> 607d0762e... Documentation/op-guide: remove grafana demo link
The metrics can be fetched with `curl`:
@@ -75,8 +119,6 @@ Access: proxy
Then import the default [etcd dashboard template][template] and customize. For instance, if Prometheus data source name is `my-etcd`, the `datasource` field values in JSON also need to be `my-etcd`.
See the [demo][demo].
Sample dashboard:
![](./etcd-sample-grafana.png)
@@ -85,4 +127,3 @@ Sample dashboard:
[prometheus]: https://prometheus.io/
[grafana]: http://grafana.org/
[template]: ./grafana.json
[demo]: http://dash.etcd.io/dashboard/db/test-etcd

View File

@@ -6,7 +6,7 @@ This guide assumes operational knowledge of Amazon Web Services (AWS), specifica
As a critical building block for distributed systems it is crucial to perform adequate capacity planning in order to support the intended cluster workload. As a highly available and strongly consistent data store increasing the number of nodes in an etcd cluster will generally affect performance adversely. This makes sense intuitively, as more nodes means more members for the leader to coordinate state across. The most direct way to increase throughput and decrease latency of an etcd cluster is allocate more disk I/O, network I/O, CPU, and memory to cluster members. In the event it is impossible to temporarily divert incoming requests to the cluster, scaling the EC2 instances which comprise the etcd cluster members one at a time may improve performance. It is, however, best to avoid bottlenecks through capacity planning.
The etcd team has produced a [hardware recommendation guide]( ../op-guide/hardware.md) which is very useful for “ballparking” how many nodes and what instance type are necessary for a cluster.
The etcd team has produced a [hardware recommendation guide](../op-guide/hardware.md) which is very useful for “ballparking” how many nodes and what instance type are necessary for a cluster.
AWS provides a service for creating groups of EC2 instances which are dynamically sized to match load on the instances. Using an Auto Scaling Group ([ASG](http://docs.aws.amazon.com/autoscaling/latest/userguide/AutoScalingGroup.html)) to dynamically scale an etcd cluster is not recommended for several reasons including:

View File

@@ -372,12 +372,7 @@ func (c *httpClusterClient) Do(ctx context.Context, act httpAction) (*http.Respo
if err == context.Canceled || err == context.DeadlineExceeded {
return nil, nil, err
}
if isOneShot {
return nil, nil, err
}
continue
}
if resp.StatusCode/100 == 5 {
} else if resp.StatusCode/100 == 5 {
switch resp.StatusCode {
case http.StatusInternalServerError, http.StatusServiceUnavailable:
// TODO: make sure this is a no leader response
@@ -385,10 +380,16 @@ func (c *httpClusterClient) Do(ctx context.Context, act httpAction) (*http.Respo
default:
cerr.Errors = append(cerr.Errors, fmt.Errorf("client: etcd member %s returns server error [%s]", eps[k].String(), http.StatusText(resp.StatusCode)))
}
if isOneShot {
return nil, nil, cerr.Errors[0]
err = cerr.Errors[0]
}
if err != nil {
if !isOneShot {
continue
}
continue
c.Lock()
c.pinned = (k + 1) % leps
c.Unlock()
return nil, nil, err
}
if k != pinned {
c.Lock()

View File

@@ -16,6 +16,7 @@ package client
import (
"errors"
"fmt"
"io"
"io/ioutil"
"math/rand"
@@ -304,7 +305,9 @@ func TestHTTPClusterClientDo(t *testing.T) {
fakeErr := errors.New("fake!")
fakeURL := url.URL{}
tests := []struct {
client *httpClusterClient
client *httpClusterClient
ctx context.Context
wantCode int
wantErr error
wantPinned int
@@ -395,10 +398,30 @@ func TestHTTPClusterClientDo(t *testing.T) {
wantCode: http.StatusTeapot,
wantPinned: 1,
},
// 500-level errors cause one shot Do to fallthrough to next endpoint
{
client: &httpClusterClient{
endpoints: []url.URL{fakeURL, fakeURL},
clientFactory: newStaticHTTPClientFactory(
[]staticHTTPResponse{
{resp: http.Response{StatusCode: http.StatusBadGateway}},
{resp: http.Response{StatusCode: http.StatusTeapot}},
},
),
rand: rand.New(rand.NewSource(0)),
},
ctx: context.WithValue(context.Background(), &oneShotCtxValue, &oneShotCtxValue),
wantErr: fmt.Errorf("client: etcd member returns server error [Bad Gateway]"),
wantPinned: 1,
},
}
for i, tt := range tests {
resp, _, err := tt.client.Do(context.Background(), nil)
if tt.ctx == nil {
tt.ctx = context.Background()
}
resp, _, err := tt.client.Do(tt.ctx, nil)
if !reflect.DeepEqual(tt.wantErr, err) {
t.Errorf("#%d: got err=%v, want=%v", i, err, tt.wantErr)
continue
@@ -407,11 +430,9 @@ func TestHTTPClusterClientDo(t *testing.T) {
if resp == nil {
if tt.wantCode != 0 {
t.Errorf("#%d: resp is nil, want=%d", i, tt.wantCode)
continue
}
continue
}
if resp.StatusCode != tt.wantCode {
} else if resp.StatusCode != tt.wantCode {
t.Errorf("#%d: resp code=%d, want=%d", i, resp.StatusCode, tt.wantCode)
continue
}

View File

@@ -198,21 +198,15 @@ func getRevTest(cx ctlCtx) {
}
func getKeysOnlyTest(cx ctlCtx) {
var (
kvs = []kv{{"key1", "val1"}}
)
for i := range kvs {
if err := ctlV3Put(cx, kvs[i].key, kvs[i].val, ""); err != nil {
cx.t.Fatalf("getKeysOnlyTest #%d: ctlV3Put error (%v)", i, err)
}
if err := ctlV3Put(cx, "key", "val", ""); err != nil {
cx.t.Fatal(err)
}
cmdArgs := append(cx.PrefixArgs(), "get")
cmdArgs = append(cmdArgs, []string{"--prefix", "--keys-only", "key"}...)
err := spawnWithExpects(cmdArgs, []string{"key1", ""}...)
if err != nil {
cx.t.Fatalf("getKeysOnlyTest : error (%v)", err)
cmdArgs := append(cx.PrefixArgs(), []string{"get", "--keys-only", "key"}...)
if err := spawnWithExpect(cmdArgs, "key"); err != nil {
cx.t.Fatal(err)
}
if err := spawnWithExpects(cmdArgs, "val"); err == nil {
cx.t.Fatalf("got value but passed --keys-only")
}
}

View File

@@ -189,7 +189,9 @@ func RangeRequestToOp(r *pb.RangeRequest) clientv3.Op {
if r.CountOnly {
opts = append(opts, clientv3.WithCountOnly())
}
if r.KeysOnly {
opts = append(opts, clientv3.WithKeysOnly())
}
if r.Serializable {
opts = append(opts, clientv3.WithSerializable())
}

View File

@@ -26,7 +26,7 @@ import (
var (
// MinClusterVersion is the min cluster version this etcd binary is compatible with.
MinClusterVersion = "3.0.0"
Version = "3.2.7"
Version = "3.2.8"
APIVersion = "unknown"
// Git SHA Value will be set during build