etcd/Documentation
Dan Mace cd3df73944 Documentation: Further improve etcdMembersDown alert
The default window for the etcdMembersDown network failure rate function was
recently changed to 1 minute. While this helps detect an etcd recovery more
quickly, it depends on scrape intervals of <= 15s to collect sufficient data
points for the rate function. In practice, an interval of >= 30s is more
typical, which makes the rate function less accurate.

This patch increases the window to 2m, which is a compromise between the
original value of 3m and the 1m change introduced with 2aa5684, and should
accommodate more typical scrape intervals.
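
The trade-off between window size and scrape interval can be checked with a little arithmetic. The sketch below is only an illustration, not part of the patch: the helper name `min_samples_in_window` is hypothetical, and it assumes perfectly regular scrapes with no jitter or failed scrapes. It counts the samples a rate window is guaranteed to contain; a rate needs at least two samples to produce a value at all, and more samples give a steadier estimate.

```python
def min_samples_in_window(window_seconds, scrape_interval_seconds):
    """Guaranteed number of scrape samples inside a window of the given length.

    Assumes perfectly regular scrapes (an idealization): a window of length W
    then always contains at least floor(W / interval) samples.
    """
    return window_seconds // scrape_interval_seconds


# Compare the three windows discussed above at two scrape intervals.
for window in (60, 120, 180):
    for interval in (15, 30):
        count = min_samples_in_window(window, interval)
        print(f"window={window}s interval={interval}s -> at least {count} samples")
```

Under these assumptions, a 2m window scraped every 30s is guaranteed as many samples as a 1m window scraped every 15s, while a 1m window at 30s is left with only the bare minimum of two.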

To offset the window change and to further improve the chance that the alert
will only fire when etcd is truly dead, this patch changes the `for` clause from
3m to 10m. The rationale is as follows:

1. There can be significant variance in durations following a reboot before etcd
is scraped and detected as available.

2. A conservative trigger like 10m seems less likely to produce a false alarm in
the face of such variance.

3. In this alerting situation, if the outage is real, it seems unlikely that an
additional 7 minutes of delay before (for example) paging somebody will make a
significant impact on the overall response.
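
The effect of the longer `for` clause can be sketched with a small simulation. This is a simplified model, not the alerting engine itself: the function name `first_firing_time` and both timelines are hypothetical, and the alert expression is assumed to be evaluated once per minute. The alert fires only after its condition has held continuously for the `for` duration.

```python
def first_firing_time(condition_by_minute, for_minutes):
    """Return the minute at which the alert would fire, or None if it never does.

    condition_by_minute[i] is True when the alert expression holds at minute i.
    The alert fires once the condition has held continuously for `for_minutes`.
    """
    pending_since = None
    for minute, active in enumerate(condition_by_minute):
        if not active:
            # Condition cleared: the pending timer resets.
            pending_since = None
            continue
        if pending_since is None:
            pending_since = minute
        if minute - pending_since >= for_minutes:
            return minute
    return None


# Hypothetical timeline: etcd looks down for 5 minutes after a reboot, then recovers.
reboot_blip = [True] * 5 + [False] * 10
# Hypothetical timeline: a real outage that does not recover.
real_outage = [True] * 15

for name, timeline in (("reboot blip", reboot_blip), ("real outage", real_outage)):
    for for_minutes in (3, 10):
        fired = first_firing_time(timeline, for_minutes)
        print(f"{name}, for={for_minutes}m -> fires at minute {fired}")
```

In this model the 5-minute blip triggers the alert with `for: 3m` but not with `for: 10m`, while the real outage is still reported, 7 minutes later than before.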
2020-07-31 09:26:46 -04:00
benchmarks Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
dev-guide auth, etcdserver: hash password in the API layer 2020-07-14 00:15:19 +09:00
dev-internal Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
etcd-mixin Documentation: Further improve etcdMembersDown alert 2020-07-31 09:26:46 -04:00
learning Documentation/learning/lock/client: Add defer Unlock (#11802) 2020-07-26 11:22:19 -07:00
metrics Documentation: added v3.4.x metrics docs 2019-12-15 14:13:36 +02:00
op-guide Documentation/op-guide: Drop old alert_rules 2020-07-08 09:37:34 -07:00
platforms Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
rfc doc: remove out-date introduction video link. (#11601) 2020-02-07 20:49:05 -08:00
triage Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
upgrades Documentation: describe the change of WAL entries related to auth 2020-07-14 00:15:19 +09:00
README.md Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
_index.md Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
branch-management.md Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
demo.md Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
dl-build.md Documentation, CHANGELOG: use new go.etcd.io/etcd/v3 pkg 2020-04-28 22:02:19 +00:00
faq.md Documentation: fix broken links 2020-06-12 09:51:33 +08:00
integrations.md Documentation: add section headings to integrations doc (#11573) 2020-01-31 17:02:08 -08:00
metrics.md Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
reporting-bugs.md Documentation: Restructure directory to accommodate new site generation system 2020-01-21 14:29:54 -08:00
tuning.md documentation: initial metadata additions for website generation (#10596) 2019-04-01 13:57:24 -07:00

README.md

The etcd documentation

etcd is a distributed key-value store designed to reliably and quickly preserve and provide access to critical data. It enables reliable distributed coordination through distributed locking, leader elections, and write barriers. An etcd cluster is intended for high availability and permanent data storage and retrieval.

Please note that the files in this directory are source files for the built and rendered documentation that can be viewed at etcd.io/docs.