This adds map operation tests for unsafeAddWatching, which
could have been failed without https://github.com/coreos/etcd/pull/3939.
It tests if unsafeAddWatching is correctly updating synced map.
One watcher includes multiple watchings, and their events are
sent out through one channel. For the received event, user would like to
know which watching it belongs to.
Introduce a watch ID. When watching on some key, user will get a watch
ID. The watch ID is attached to all events that is observed by this
watch.
This is for coreos#3859 switching slice to map for synced watchings.
For a large amount of synced watchings, map implementation performs better.
When putting 1 million watchers on the same key and canceling them one by
one: original implementation takes 9m7.268221091s, while the one with map
takes only 430.531637ms.
The point is to decouple the key-value storage layer and the
event notification layer clearly. It gives the watchableKV the
flexibility to define whatever event structure it wants without
breaking the ondisk format at key-value storage layer.
Changes:
1. change the format of key and value stored in backend
Store KeyValue struct instead of Event struct in backend value for
better abstraction as xiang suggests. And record the corresponded
action in the backend key.
2. Remove word 'event' from functions
This adds build constraints in order to pass memory-map flags to bolt.Option.
If backend package passes syscall.MAP_POPULATE flag, the boltdb does read-ahead
which speeds up entire-database read, which then leads to faster storage
Restore. Benchmark result shows for 4GB, it opens and loads 6x faster in SSD,
and 12x faster in HDD. For 2GB, 1.6x faster with MAP_POPULATE in SSD.
When getting the watched events, it iterate all keys in putm and delm
to generate the events. If we don't delete the key from putm/delm,
it would range on the key that is not actually put or deleted. This is
incorrect.
Fix the panic that happens when single put/delete is watched.
consistentWatchableStore maintains an index that is always consistent
with the latest txn. The index could be used to indicate the progress
of the store so far when recovery.
We need to access the underlying store to use its RangeEvents function.
It is not good to use unnecessary type conversion.
The underlying store is also needed for further store upon
watchableStore.
One txn is treated as atomic, and might contain multiple Put/Delete/Range
operations. For now, between these operations, we might call forecCommit
to sync the change to disk, or backend may commit it in background.
Thus the snapshot state might contains an unfinished multiple objects
transaction, which is dangerous if database is restored from the snapshot.
This PR makes KV txn hold batchTx lock during the process and avoids
commit to happen.
When using Snapshot function, it is expected:
1. know the size of snapshot before writing data
2. split snapshot-ready phase and write-data phase. so we could cut
snapshot first and write data later.
Update its interface to fit the requirement of etcdserver.
New PR from https://github.com/coreos/etcd/pull/3575.
This add `defer` to `s.mu`. Current code does not `Unlock`
in the correct scope, I think.
(Sorry, I accidentally deleted my fork so the changes
might not sound continuous from my previous pull requests.)
Change to the function:
1. specify the meaning of startRev and endRev parameters
2. specify the meaning of returned nextRev
Moreover, it adds unit tests for the function.
It should continue to skip following operations.
The test from rev14 to rev0 fails if it doesn't call continue and append
all revisions of the same main rev to the list.
It should skip last empty generation when the key is just tombstoned.
The rev15 and rev16 in the test fails if it doesn't skip last empty generation
and find previous generations.