Commit Graph

101 Commits (71bba3c761b0078c81c2b39781ec74853c458303)

Author SHA1 Message Date
Marek Siarkowicz 7c35dadc25 server: Extract corruption detection to dedicated struct
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-06-13 18:19:24 +02:00
ahrtr 25deb436af fix the race condition between goroutine and channel on the same leases to be revoked 2022-05-25 16:44:41 +08:00
Piotr Tabor 85b18c9b3e Rename WrapApply to Apply. 2022-05-20 14:32:04 +02:00
Piotr Tabor 0da0cf4795 expose UberApplier as interface (not as implementation struct). 2022-05-20 14:32:04 +02:00
Piotr Tabor 5097b33ab9 Rename etcdserver/etcderrors package to etcdserver/errors. 2022-05-20 14:32:04 +02:00
Piotr Tabor 63b2f63cc1 Rename package aliasing "apply2" -> apply. 2022-05-20 14:32:04 +02:00
Piotr Tabor 47a771871b Move apply to its own package (no dependency on etcdserver). 2022-05-20 14:32:04 +02:00
Piotr Tabor fc6a6c3c27 Move etcdserver/errors.go to a separate package to avoid cyclic dependencies. 2022-05-20 14:32:04 +02:00
Piotr Tabor b073129d03 Applier no longer depends on EtcdServer.
All the dependencies are explicitly passed to the UberApplier factory method.
2022-05-20 14:32:04 +02:00
Piotr Tabor 651de5a057 Rename EtcdServer.Id to EtcdServer.MemberId.
The old name was misleading and error-prone, as it was easily confused with ClusterId.
2022-05-20 14:32:04 +02:00
Piotr Tabor b7ad746bfe Encapsulating applier logic: UberApplier coordinates all appliers for server
This PR:
 - moves wrapping of appliers (due to Alarms) out of server.go into uber_applier.go
 - clearly divides the application logic into a chain of:
     a) 'WrapApply' (generic logic shared by all the methods)
     b) dispatcher (translation of Apply into a specific method like 'Put')
     c) chain of 'wrappers' of the specific methods (like Put).
 - when we do recovery (restore from snapshot), we create a new instance of the appliers.

The purpose is to make sure we control all the dependencies of the apply process, i.e.
we can supply, e.g., a special instance of 'backend' to the application logic.
2022-05-20 14:32:04 +02:00
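The three-stage chain described in commit b7ad746bfe can be sketched roughly as follows. This is a minimal illustration under stated assumptions and is not etcd's code. The names `Request`, `Applier`, `backendApplier`, `quotaApplier`, `uberApplier` and `newUberApplier` are all hypothetical simplifications. `Apply` stands for the generic stage (a), `dispatch` for the dispatcher (b), and the wrapper stack for the per-method chain (c).

```go
package main

import (
	"errors"
	"fmt"
)

// Request is a hypothetical, simplified raft request.
type Request struct {
	Op    string // e.g. "put"
	Key   string
	Value string
}

// Applier is a hypothetical per-method interface (stage c).
type Applier interface {
	Put(key, value string) error
}

// backendApplier is the innermost applier; it writes to storage.
type backendApplier struct{ store map[string]string }

func (a *backendApplier) Put(key, value string) error {
	a.store[key] = value
	return nil
}

// quotaApplier wraps another Applier and rejects oversized values.
type quotaApplier struct {
	Applier
	maxValueLen int
}

func (q *quotaApplier) Put(key, value string) error {
	if len(value) > q.maxValueLen {
		return errors.New("quota exceeded")
	}
	return q.Applier.Put(key, value)
}

// uberApplier coordinates the chain: Apply (a) -> dispatch (b) -> wrappers (c).
type uberApplier struct{ applier Applier }

// Apply holds logic shared by every method, then dispatches.
func (u *uberApplier) Apply(r Request) error {
	if r.Key == "" {
		return errors.New("empty key")
	}
	return u.dispatch(r)
}

// dispatch translates a generic request into a specific method call.
func (u *uberApplier) dispatch(r Request) error {
	switch r.Op {
	case "put":
		return u.applier.Put(r.Key, r.Value)
	default:
		return fmt.Errorf("unknown op %q", r.Op)
	}
}

// newUberApplier builds a fresh chain with all dependencies passed in explicitly;
// a snapshot restore would simply call it again with the new backend.
func newUberApplier(store map[string]string, maxValueLen int) *uberApplier {
	return &uberApplier{applier: &quotaApplier{Applier: &backendApplier{store: store}, maxValueLen: maxValueLen}}
}

func main() {
	store := map[string]string{}
	u := newUberApplier(store, 8)
	fmt.Println(u.Apply(Request{Op: "put", Key: "a", Value: "1"}))
	fmt.Println(u.Apply(Request{Op: "put", Key: "b", Value: "too-long-value"}))
	fmt.Println(store)
}
```

Because the factory receives every dependency, recovery can rebuild the chain against a different backend without touching the server struct.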
Piotr Tabor cdf9869d70 Encapsulation of applier logic: Move Txn-related code out of applier.go.
The PR removes calls to applierV3base logic from server.go that are NOT part of 'application'.
The original idea was that read-only transactions and Range calls share logic with Apply,
so they could call the appliers directly (bypassing all the 'corrupt', 'quota' and 'auth' wrappers).

This PR moves all that logic to a separate file (which can later become a package of its own).
2022-05-20 14:32:04 +02:00
ahrtr e7f8bf7c44 enhance the /version endpoint to add storageVersion 2022-05-06 20:29:42 +08:00
Marek Siarkowicz f09da32f9d Merge pull request #13655 from serathius/health
Cleanup healthcheck code after V2 removal
2022-05-06 12:08:36 +02:00
Marek Siarkowicz 600ee13ac0 server: Cover V3 health with tests 2022-05-05 09:52:14 +02:00
Marek Siarkowicz 0096d2ecdb server: Remove unused NewClientHandler 2022-05-05 09:52:13 +02:00
ahrtr fb2eeb9027 verify that consistent_index in the snapshot equals the snapshot index
Usually the consistent_index should be greater than the index of the
latest snapshot with suffix .snap. But for the snapshot coming from the
leader, the consistent_index should be equal to the snapshot index.
2022-05-03 20:02:47 +08:00
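The check described in commit fb2eeb9027, for a snapshot received from the leader, can be sketched as follows. This is a minimal sketch and not etcd's implementation. The function name `verifySnapshotIndex` and its error text are hypothetical. The sketch covers only the leader-snapshot case stated in the message, where the two indexes must be equal.

```go
package main

import "fmt"

// verifySnapshotIndex is a hypothetical check: for a snapshot received from
// the leader, the consistent_index stored in the backend must equal the
// snapshot's raft index.
func verifySnapshotIndex(consistentIndex, snapshotIndex uint64) error {
	if consistentIndex != snapshotIndex {
		return fmt.Errorf("consistent_index (%d) must be equal to snapshot index (%d)",
			consistentIndex, snapshotIndex)
	}
	return nil
}

func main() {
	fmt.Println(verifySnapshotIndex(100, 100))
	fmt.Println(verifySnapshotIndex(99, 100))
}
```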
ahrtr 6eef7ede40 Update consistent_index when applying fails
When clients lack permission to perform an operation, the apply
may fail. We should still move consistent_index forward in this
case; otherwise consistent_index may be smaller than the
snapshot index.
2022-04-20 21:44:48 +08:00
ahrtr 484d2f01f3 set backend to cindex before recovering the lessor in applySnapshot 2022-04-12 10:36:29 +08:00
ahrtr 4033f5c2b9 move the consistentIdx and consistentTerm from EtcdServer to cindex package
Removed the fields consistentIdx and consistentTerm from struct EtcdServer,
and added applyingIndex and applyingTerm to struct consistentIndex in
package cindex. We may remove the two fields completely if we decide to
remove OnPreCommitUnsafe; that decision will depend on the performance
test results.
2022-04-07 15:16:49 +08:00
ahrtr e155e50886 rename LockWithoutHook to LockOutsideApply and add LockInsideApply 2022-04-07 05:35:13 +08:00
ahrtr 47038593e9 set the consistent_index directly when applyV3 isn't performed 2022-04-07 05:35:13 +08:00
ahrtr 7ac995cdde enhanced authBackend to support authReadTx 2022-04-07 05:35:13 +08:00
ahrtr bfd5170f66 add a txPostLockHook into the backend
Previously, SetConsistentIndex() was called during the apply workflow,
but outside the db transaction. If a commit happens between SetConsistentIndex
and the rest of the apply workflow, and etcd crashes for whatever reason right
after that commit, then etcd has committed an incomplete transaction to db.
Eventually etcd runs into a data inconsistency issue.

In this commit, we move SetConsistentIndex into a txPostLockHook, so
it is executed while the transaction lock is held.
2022-04-07 05:35:13 +08:00
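The idea behind the txPostLockHook in commit bfd5170f66 can be sketched as a hook that runs right after the transaction lock is acquired. Its write then lands in the same transaction as the apply writes that follow. This is a hypothetical sketch, not etcd's backend. `batchTx`, `postLockHook`, `UnsafePut` and the in-memory map are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// batchTx is a hypothetical write transaction guarded by a mutex.
type batchTx struct {
	mu           sync.Mutex
	postLockHook func() // runs while the lock is held
	data         map[string]uint64
}

// Lock acquires the transaction lock and then runs the hook, so the hook's
// write cannot be committed separately from the apply that follows.
func (t *batchTx) Lock() {
	t.mu.Lock()
	if t.postLockHook != nil {
		t.postLockHook()
	}
}

func (t *batchTx) Unlock() { t.mu.Unlock() }

// UnsafePut must only be called while the lock is held.
func (t *batchTx) UnsafePut(key string, v uint64) { t.data[key] = v }

func main() {
	tx := &batchTx{data: map[string]uint64{}}
	applyingIndex := uint64(42) // hypothetical index of the entry being applied
	tx.postLockHook = func() { tx.UnsafePut("consistent_index", applyingIndex) }

	tx.Lock()
	tx.UnsafePut("foo", 1) // the apply workflow's own writes
	tx.Unlock()

	fmt.Println(tx.data["consistent_index"], tx.data["foo"])
}
```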
ahrtr 836bd6bc3a fix WARNING: DATA RACE issue when multiple goroutines access the backend concurrently 2022-04-03 06:13:09 +08:00
ahrtr edce939f6e add one more field storageVersion into StatusResponse
When performing a downgrade, users can confirm whether each member
is ready to be downgraded using the field 'storageVersion'. If it equals the
'target version' in the downgrade command, the member is ready to be downgraded;
otherwise, the etcd member is still processing the db file.
2022-03-18 07:04:44 +08:00
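The readiness check that commit edce939f6e enables can be sketched on the client side by comparing each member's reported storageVersion with the downgrade target. This is a hypothetical sketch. `statusResponse` stands in for the real StatusResponse and keeps only two assumed fields. `readyForDowngrade` is illustrative and not a real client API.

```go
package main

import "fmt"

// statusResponse is a hypothetical stand-in for a member's status reply.
type statusResponse struct {
	Member         string
	StorageVersion string // e.g. "3.5"
}

// readyForDowngrade splits members into those whose storageVersion already
// equals the target version and those still processing the db file.
func readyForDowngrade(statuses []statusResponse, target string) (ready, pending []string) {
	for _, s := range statuses {
		if s.StorageVersion == target {
			ready = append(ready, s.Member)
		} else {
			pending = append(pending, s.Member)
		}
	}
	return ready, pending
}

func main() {
	statuses := []statusResponse{
		{Member: "m1", StorageVersion: "3.5"},
		{Member: "m2", StorageVersion: "3.6"},
		{Member: "m3", StorageVersion: "3.5"},
	}
	ready, pending := readyForDowngrade(statuses, "3.5")
	fmt.Println("ready:", ready, "pending:", pending)
}
```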
Marek Siarkowicz a0f26ff4ea server: Snapshot after cluster version downgrade 2022-02-21 15:48:00 +01:00
Marek Siarkowicz 8c91d60a6f server: Switch to publishV3 2022-02-14 23:06:45 +01:00
AdamKorcz 0df768d2b1 server/etcdserver: fix oss-fuzz issue 42181 2022-02-14 10:59:41 +00:00
Marek Siarkowicz 692b3c4cd7 server: Remove most of V2 API 2022-01-25 15:24:13 +01:00
Marek Siarkowicz ee5ef42c5c server: --enable-v2 and --enable-v2v3 are decommissioned 2022-01-14 13:19:30 +01:00
Marek Siarkowicz 7d10899d7f server: Require either cluster version v3.6 or --experimental-enable-lease-checkpoint-persist to persist lease remainingTTL
To avoid inconsistent behavior during cluster upgrade, we are feature
gating persistence behind cluster version. This should ensure that
all cluster members are upgraded to v3.6 before the behavior changes.

To allow backporting this fix to v3.5, we are also introducing the flag
--experimental-enable-lease-checkpoint-persist, which allows for a
smooth upgrade of v3.5 clusters that have this feature enabled.
2021-12-02 12:26:47 +01:00
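The gating condition from commit 7d10899d7f can be expressed as a small predicate: persist lease remainingTTL if the cluster version is at least v3.6 or the experimental flag is set. This is a hypothetical sketch. The `version` type and `shouldPersistRemainingTTL` are illustrative, and the boolean parameter merely stands in for `--experimental-enable-lease-checkpoint-persist`.

```go
package main

import "fmt"

// version is a hypothetical major.minor cluster version.
type version struct{ Major, Minor int }

// AtLeast reports whether v >= o.
func (v version) AtLeast(o version) bool {
	return v.Major > o.Major || (v.Major == o.Major && v.Minor >= o.Minor)
}

// shouldPersistRemainingTTL persists lease remainingTTL only once every member
// runs v3.6 (cluster version >= 3.6) or the experimental flag is enabled.
func shouldPersistRemainingTTL(clusterVersion version, experimentalFlag bool) bool {
	return experimentalFlag || clusterVersion.AtLeast(version{3, 6})
}

func main() {
	fmt.Println(shouldPersistRemainingTTL(version{3, 5}, false))
	fmt.Println(shouldPersistRemainingTTL(version{3, 5}, true))
	fmt.Println(shouldPersistRemainingTTL(version{3, 6}, false))
}
```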
Marek Siarkowicz e47c3c22d2 server: Move downgrade API logic into version package 2021-10-08 12:01:51 +02:00
Marek Siarkowicz 1e5e57f268 server: Move downgrade detection code to version package 2021-10-08 10:41:37 +02:00
Marek Siarkowicz 39f92a32ca server: Move member dir creation up and introduce Close method to bootstrap structs 2021-09-20 12:21:36 +02:00
Marek Siarkowicz 4884e7d8cf server: Move wal bootstrap from cluster to storage 2021-09-20 12:21:35 +02:00
Marek Siarkowicz 049e2d6ec0 server: Move raft up the bootstrap hierarchy 2021-09-20 12:20:19 +02:00
Marek Siarkowicz 8b0d8ea2af server: Move cluster up the bootstrap hierarchy 2021-09-20 12:19:09 +02:00
Marek Siarkowicz 7c8f7166e7 server: Move bootstrapping backend from snapshot to bootstrapBackend 2021-09-20 12:17:33 +02:00
Marek Siarkowicz c97ab8f5e0 server: Move cluster up the bootstrap hierarchy 2021-09-20 12:07:41 +02:00
Marek Siarkowicz 6a4ea70aef server: Move clusterID and nodeID up the bootstrap hierarchy 2021-09-20 12:06:18 +02:00
Marek Siarkowicz db06a4ab28 server: Move wal bootstrap up the hierarchy 2021-09-20 12:04:44 +02:00
Marek Siarkowicz aa0c050003 etcdserver: Add more hierarchy to bootstrap by introducing a separate storage bootstrap step 2021-09-20 12:01:45 +02:00
Marek Siarkowicz 66d05e5496 Try updating storage version immediately after cluster version is set 2021-09-10 10:16:48 +02:00
Marek Siarkowicz ff3729c4d5 server: Implement storage schema migration to follow cluster version change and panic if unknown storage version is found
Storage version should follow cluster version. During upgrades this
should be immediate, as storage version can always be upgraded because storage
is backward compatible. During downgrades it will be delayed and will
require time for incompatible changes to be snapshotted.

As a storage version change can happen long after the cluster starts running, we
need to add a step during bootstrap to validate whether the loaded data can be
understood by the migrator.
2021-09-10 10:16:48 +02:00
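The policy in commit ff3729c4d5 can be sketched as follows. Storage version follows cluster version immediately on upgrade and only after a snapshot on downgrade, and bootstrap panics on an unknown storage version. This is a hypothetical sketch, not etcd's migrator. The list `knownVersions`, the functions `versionIndex`, `validateStorageVersion` and `nextStorageVersion`, and the boolean `snapshotted` are assumptions. `snapshotted` simplifies "time for incompatible changes to be snapshotted" into a single flag.

```go
package main

import "fmt"

// knownVersions lists, in order, the storage versions this hypothetical migrator understands.
var knownVersions = []string{"3.5", "3.6"}

// versionIndex returns the position of v in knownVersions, or -1 if unknown.
func versionIndex(v string) int {
	for i, k := range knownVersions {
		if k == v {
			return i
		}
	}
	return -1
}

// validateStorageVersion runs at bootstrap and panics on an unknown storage version.
func validateStorageVersion(v string) {
	if versionIndex(v) < 0 {
		panic(fmt.Sprintf("unknown storage version %q", v))
	}
}

// nextStorageVersion upgrades immediately but downgrades only once the
// incompatible changes have been snapshotted.
func nextStorageVersion(storage, cluster string, snapshotted bool) string {
	s, c := versionIndex(storage), versionIndex(cluster)
	switch {
	case c < 0:
		return storage // unknown cluster version: keep the current storage version
	case c > s:
		return cluster // upgrade: storage is backward compatible
	case c < s && snapshotted:
		return cluster // downgrade: allowed after the snapshot
	default:
		return storage
	}
}

func main() {
	validateStorageVersion("3.5")
	fmt.Println(nextStorageVersion("3.5", "3.6", false))
	fmt.Println(nextStorageVersion("3.6", "3.5", false))
	fmt.Println(nextStorageVersion("3.6", "3.5", true))
}
```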
Marek Siarkowicz 9d81dde082 server: Extract notifier struct 2021-09-10 10:16:48 +02:00
Eduardo Patrocinio 87f1dc7e40 Fix a few typos 2021-09-03 16:09:09 -04:00
Marek Siarkowicz 83a325ac46 server: Move all functions needed for storage bootstrap to storage package
This is a prerequisite to moving storage bootstrap; it was split into a separate PR
to make it easier to review.
2021-08-03 13:09:15 +02:00
Marek Siarkowicz a0554a6bd3 etcdserver: Create AuthBackend interface 2021-07-20 18:09:53 +02:00
Marek Siarkowicz 6cd3633543 etcdserver: Rename membershipStore to membershipBackend 2021-07-20 17:56:52 +02:00