Before this change, the default window for the etcdMembersDown network failure
rate function had recently been changed to 1 minute. While this helps detect an
etcd recovery more quickly, it depends on scrape intervals of <= 15s to collect
sufficient data points for the rate function (rate() needs at least two samples
inside the window, and a 1m window holds barely that many at a 30s interval). In
practice, an interval of >= 30s is more typical, which makes the rate function
less accurate.
This patch increases the window to 2m, a compromise between the original value
of 3m and the 1m window introduced with 2aa5684, which should accommodate more
typical scrape intervals.
To offset the window change and to further improve the chance that the alert
will only fire when etcd is truly dead, this patch changes the `for` clause from
3m to 10m (see the sketch after the following list). The rationale is as follows:
1. There can be significant variance in durations following a reboot before etcd
is scraped and detected as available.
2. A conservative trigger like 10m seems less likely to produce a false alarm in
the face of such variance.
3. In this alerting situation, if the outage is real, an additional 7 minutes of
delay before (for example) paging somebody is unlikely to have a significant
impact on the overall response.
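For illustration, the failure-rate arm of the rule would then look roughly like
this (a sketch following the mixin's expression shape; the label matchers and
surrounding aggregation are illustrative):

    - alert: etcdMembersDown
      expr: |
        count without (To) (
          sum without (instance) (
            rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[2m])
          ) > 0.01
        ) > 0
      for: 10m
      labels:
        severity: critical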
* etcd-mixin: Reformulate alerting rules to use `without` rather than `by`
With aggregations using `by`, all additional target labels that a user
might have configured are aggregated away. However, those target
labels are useful, e.g. for alert routing. With this commit, nothing
should change for vanilla job/instance target labels, but users who have
configured more target labels can now still make use of them.
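A sketch of the difference, assuming a user has configured an extra target
label such as `cluster` (hypothetical here):

    # `by` keeps only the listed labels, so a user-configured `cluster`
    # label is aggregated away:
    sum by (job) (up{job=~".*etcd.*"} == bool 0)

    # `without` drops only the listed labels; `cluster` (and any other
    # extra target label) survives and can be used for alert routing:
    sum without (instance) (up{job=~".*etcd.*"} == bool 0)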
Signed-off-by: beorn7 <beorn@grafana.com>
* etcd-mixin: Parametrize instance labels to aggregate away
Signed-off-by: beorn7 <beorn@grafana.com>
Before this change, during a reboot in which etcd recovers quickly (e.g. 1 min),
the etcdMembersDown alert tends to fire even when etcd is fully healthy, because
the averaging function can take more than 3 minutes to fall back below the 0.01
threshold.
This change reduces the possibility of such a false positive by using a shorter
(1 min) failure rate window, which averages back down below the threshold far
more quickly (within 1 min). The `for` clause of the alert still ensures that
the alert fires if the poor conditions are sustained for an unreasonable overall
time (3 min).
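A sketch of the idea, using the peer failure-rate term from the alert (label
matchers illustrative):

    # Before: a 3m window keeps a short outage above the threshold for up
    # to 3 minutes after recovery, so the alert fires on a healthy reboot:
    rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[3m]) > 0.01

    # After: a 1m window falls back below 0.01 within about a minute,
    # while `for: 3m` still catches sustained failures:
    rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[1m]) > 0.01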
A cluster with three members could see three leader changes during a
healthy rolling reboot, and we don't want to alert on that. Raising the
threshold to 4 reduces false alarms for clusters with three or fewer
members, and that's probably most clusters. It will also slightly
increase the risk of false negatives, but if the cluster is struggling
with high latency, it seems likely that it would quickly pass the new
threshold too.
The hard-coded threshold means that we are still likely to get
false positives during rolling reboots of clusters with four or more
members. Ideally we'd scale this with the cluster size, but I'm not
sure how to do that. Three members is the minimum size for high
availability, so reducing false positives for that case seems worth
addressing even if we leave larger clusters largely unchanged.
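A sketch of the thresholded expression (label matchers illustrative):

    # Three changes can occur during a healthy rolling reboot of a
    # 3-member cluster, so only alert on 4 or more within the window:
    increase(etcd_server_leader_changes_seen_total{job=~".*etcd.*"}[15m]) >= 4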
Also manually bring etcd3_alert.rules up to speed, since it seems to
have been passed over by 16fc8a2b4b (Documentation/op-guide:
Re-generate alert rules and dashboard from mixin, 2020-04-07, #11768).
This change makes the etcd package compatible with the existing Go
ecosystem for module versioning.
Used this tool to update package imports:
https://github.com/KSubedi/gomove
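For example, gomove rewrites import paths to include the major-version
suffix required by Go modules (a sketch; the clientv3 path is illustrative):

    // before: import path without a major-version suffix
    import "go.etcd.io/etcd/clientv3"

    // after: the /v3 suffix matches the versioned module path
    import "go.etcd.io/etcd/v3/clientv3"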
* Documentation: enhance description of lock and lease
* Documentation: an executable implementation of fencing
* docs: api guarantees
clean up lease grammar slightly
* docs: learning/lock/README.md improve grammar
Co-Authored-By: Steven E. Harris <seh@panix.com>
* docs: learning: improve locks and leases grammar
Co-authored-by: Brandon Philips <brandon@ifup.org>
Co-authored-by: Steven E. Harris <seh@panix.com>
These changes start at etcdctl's auth.go and stub out everything down into the internal raft. The .proto files were changed and regenerated so that the local version builds successfully.
The `etcdHighNumberOfLeaderChanges` alert had a copy-and-paste
error when it was converted from docs to mixin in #10244 - we moved
from "increase over 15m > 3" to "rate over 15m > 3", which is not
the same (rate is measured per second, so it should have been
"rate over 15m > (3 / 60 / 15)"). As part of fixing that, we need to
capture the case where Prometheus starts up, or where a new etcd
cluster is first scraped while already showing a high leader-change
count - i.e. if you start a new etcd cluster and at the moment
Prometheus first scrapes it you are already at 5 leader changes, we
should fire on that transition.
This alert is also now more responsive, so if you get a quick
burst of 3 leader changes we'll alert within 5m rather than 15m.
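The fixed rule is roughly along these lines (a sketch of the approach; the
exact expression in the mixin may differ):

    # `increase` over the subquery counts leader changes per 15m window;
    # the `or 0*absent(...)` arm makes a newly appearing series start
    # from 0, so a cluster first scraped with a high count still fires.
    - alert: etcdHighNumberOfLeaderChanges
      expr: |
        increase((max without (instance) (etcd_server_leader_changes_seen_total{job=~".*etcd.*"})
          or 0*absent(etcd_server_leader_changes_seen_total{job=~".*etcd.*"}))[15m:1m]) >= 4
      for: 5m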
Etcd-backup-restore is a collection of components to back up and restore etcd. It features periodic full and incremental backups, automated restore, and validation of the etcd data directory, with support for multiple cloud providers.