Merge pull request #6843 from gyuho/docs

Documentation/op-guide: add 'monitoring' guide
2016-11-11 16:08:32 -08:00 · 70fd684843
parent 3c97e7a475 6d83590434
commit 70fd684843
3 changed files with 1089 additions and 1 deletions
--- a/Documentation/docs.md
+++ b/Documentation/docs.md
@@ -28,7 +28,7 @@ Administrators who need to create reliable and scalable key-value stores for the
 - [Run etcd clusters inside containers][container]
 - [Configuration][conf]
 - [Security][security]
- - Monitoring
+ - [Monitoring][monitoring]
 - [Maintenance][maintenance]
 - [Understand failures][failures]
 - [Disaster recovery][recovery]
@@ -72,6 +72,7 @@ To learn more about the concepts and internals behind etcd, read the following p
 [recovery]: op-guide/recovery.md
 [maintenance]: op-guide/maintenance.md
 [security]: op-guide/security.md
+[monitoring]: op-guide/monitoring.md
 [v2_migration]: op-guide/v2-migration.md
 [container]: op-guide/container.md
 [understand_apis]: learning/api.md
--- a/Documentation/op-guide/grafana.json
+++ b/Documentation/op-guide/grafana.json
--- a/Documentation/op-guide/monitoring.md
+++ b/Documentation/op-guide/monitoring.md
@@ -0,0 +1,75 @@
+# Monitoring etcd
+
+Each etcd server exports metrics under the `/metrics` path on its client port.
+
+The metrics can be fetched with `curl`:
+
+```sh
+$ curl -L http://localhost:2379/metrics
+
+# HELP etcd_debugging_mvcc_keys_total Total number of keys.
+# TYPE etcd_debugging_mvcc_keys_total gauge
+etcd_debugging_mvcc_keys_total 0
+# HELP etcd_debugging_mvcc_pending_events_total Total number of pending events to be sent.
+# TYPE etcd_debugging_mvcc_pending_events_total gauge
+etcd_debugging_mvcc_pending_events_total 0
+...
+```
+
+
+## Prometheus
+
+Running a [Prometheus][prometheus] monitoring service is the easiest way to ingest and record etcd's metrics.
+
+First, install Prometheus:
+
+```sh
+PROMETHEUS_VERSION="1.3.1"
+wget https://github.com/prometheus/prometheus/releases/download/v$PROMETHEUS_VERSION/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz -O /tmp/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz
+tar -xvzf /tmp/prometheus-$PROMETHEUS_VERSION.linux-amd64.tar.gz --directory /tmp/ --strip-components=1
+/tmp/prometheus -version
+```
+
+Set Prometheus's scraper to target the etcd cluster endpoints:
+
+```sh
+cat > /tmp/test-etcd.yaml <<EOF
+global:
+  scrape_interval: 10s
+scrape_configs:
+  - job_name: test-etcd
+    static_configs:
+    - targets: ['10.240.0.32:2379','10.240.0.33:2379','10.240.0.34:2379']
+EOF
+cat /tmp/test-etcd.yaml
+```
+
+Start Prometheus in the background with this configuration, logging to `/tmp/test-etcd.log`:
+
+```sh
+nohup /tmp/prometheus \
+    -config.file /tmp/test-etcd.yaml \
+    -web.listen-address ":9090" \
+    -storage.local.path "test-etcd.data" >> /tmp/test-etcd.log  2>&1 &
+```
+
+Now Prometheus will scrape etcd metrics every 10 seconds.
+
+
+## Grafana
+
+[Grafana][grafana] has built-in Prometheus support; just add a Prometheus data source:
+
+```
+Name:   test-etcd
+Type:   Prometheus
+Url:    http://localhost:9090
+Access: proxy
+```
+
+Then import the default [etcd dashboard template][template] and customize it as needed. See the [demo][demo] for an example.
+
+[prometheus]: https://prometheus.io/
+[grafana]: http://grafana.org/
+[template]: ./grafana.json
+[demo]: http://dash.etcd.io/dashboard/db/test-etcd
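
The raw `/metrics` output shown in the guide mixes `# HELP` and `# TYPE` comment lines with the actual samples. As a rough illustration of working with it directly, the sketch below keeps only the sample lines whose metric name starts with a given prefix. The function name `filter_metrics` and the `sample` variable are hypothetical and not part of this change; the sample text is copied from the guide's `curl` output.

```shell
#!/bin/sh
# filter_metrics is a hypothetical helper, not part of etcd or Prometheus.
# It reads Prometheus text-format metrics on stdin and prints only the
# sample lines whose metric name starts with the prefix given as $1.
filter_metrics() {
    awk -v p="$1" '
        /^#/ || NF == 0 { next }   # skip HELP/TYPE comments and blank lines
        index($0, p) == 1 { print } # keep lines that begin with the prefix
    '
}

# Sample input copied from the curl output in the monitoring guide.
sample='# HELP etcd_debugging_mvcc_keys_total Total number of keys.
# TYPE etcd_debugging_mvcc_keys_total gauge
etcd_debugging_mvcc_keys_total 0
# HELP etcd_debugging_mvcc_pending_events_total Total number of pending events to be sent.
# TYPE etcd_debugging_mvcc_pending_events_total gauge
etcd_debugging_mvcc_pending_events_total 0'

printf '%s\n' "$sample" | filter_metrics etcd_debugging_mvcc
```

Against a running server, the same function could be fed from the endpoint in the guide, for example `curl -sL http://localhost:2379/metrics | filter_metrics etcd_debugging_mvcc`. This is only a convenience for quick checks from a terminal; Prometheus remains the intended way to collect and store these metrics.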