etcd/Documentation/etcd-mixin
Commit 465592a718 by Clayton Coleman (2019-07-30): Documentation/etcd-mixin: Add an alert for down etcd members
An etcd member being down is an important failure state - while
normal admin operations may cause transient outages as members are rotated,
when any member is down the cluster is operating in a degraded
fashion. Add an alert that records when any members are down
so that administrators know whether the next failure is fatal.

The rule is more complicated than `up{...} == 0` because not all
failure modes for etcd may have an `up{...}` entry for each member.
For instance, a Kubernetes service in front of an etcd cluster
might only have 2 endpoints recorded in `up` because the third
pod is evicted by the kubelet - the cluster is degraded but
`count(up{...})` would not return the full quorum size. Instead,
use network peer send failures as a failure detector and attempt
to return the max of down services or failing peers. We may
undercount the number of total failures, but we will at least
alert that a member is down.
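
As a rough illustration of that approach, an alert combining the two signals could be written along the following lines in Prometheus rule YAML. The alert name, selectors, rate window, and threshold here are placeholders, and the expression actually shipped in mixin.libsonnet may differ:

- alert: etcdMembersDown
  # Fire when any scraped member reports down, or when some member keeps
  # failing to send to a peer (covers members with no `up` sample at all).
  expr: |
    sum by (job) (up{job=~".*etcd.*"} == bool 0) > 0
    or
    count by (job) (
      sum by (job, To) (
        rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[3m])
      ) > 0.01
    ) > 0
  for: 10m
  labels:
    severity: critical

The rule described above goes a step further and tries to report the larger of the two counts, so the alert also conveys how many members appear to be down.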

Files in this directory: README.md, mixin.libsonnet, test.yaml

README.md

Prometheus Monitoring Mixin for etcd

NOTE: This project is alpha stage. Flags, configuration, behaviour and design may change significantly in following releases.

A set of customisable Prometheus alerts for etcd.

Instructions for use are the same as the kubernetes-mixin.

Background

  • For more information about monitoring mixins, see the monitoring mixins design doc.

Testing alerts

Make sure to have jsonnet and gojsontoyaml installed.

First compile the mixin to a YAML file, which promtool will read:

jsonnet -e '(import "mixin.libsonnet").prometheusAlerts' | gojsontoyaml > mixin.yaml
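
The output is a plain Prometheus rule file. Its shape is roughly as follows; the alert shown is only a sketch, and the real names, expressions, durations, and labels are defined in mixin.libsonnet:

groups:
  - name: etcd
    rules:
      - alert: etcdInsufficientMembers
        # Illustrative quorum check: fewer healthy members than a majority.
        expr: sum by (job) (up{job=~".*etcd.*"} == bool 1) < ((count by (job) (up{job=~".*etcd.*"}) + 1) / 2)
        for: 3m
        labels:
          severity: critical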

Then run the unit test:

promtool test rules test.yaml
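
For reference, promtool rule tests use the standard Prometheus unit-test file format. A minimal sketch of such a file is shown below; the series, values, timings, and expected alert are illustrative rather than the contents of the actual test.yaml:

rule_files:
  - mixin.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Two members stay up, the third reports down for the whole window.
      - series: 'up{job="etcd",instance="10.0.0.1:2379"}'
        values: '1+0x20'
      - series: 'up{job="etcd",instance="10.0.0.2:2379"}'
        values: '1+0x20'
      - series: 'up{job="etcd",instance="10.0.0.3:2379"}'
        values: '0+0x20'
    alert_rule_test:
      - eval_time: 15m
        alertname: etcdMembersDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: etcd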