Was able to get 2s wait times with 500 concurrent requests on a fast machine;
a slower machine could possibly see similar delays with a single connection.
Fixes#6220
This commit lets etcdserver skip needless log entry applying. If the
result of log applying isn't required by the node (client that issued
the request isn't talking with the node) and the operation has no side
effects, applying can be skipped.
It would contribute to reduce disk I/O on followers and be useful for
a cluster that processes much serializable get.
These legacy flags support are here only because we do not want
CoreOS updates to break people.
Now people will be aware of that they switch to etcd3. Do not need
to support 0.x flags any more.
Fix https://github.com/coreos/etcd/issues/5231.
The issue shows that slow CI can take more than 200ms
for purging. This increase the loop iteration to wait
up to 300ms in case the disk is being slow.
A call file.Sync on OSX doesn't guarantee actual persistence on
physical drive media as the data can be cached in physical drive's
buffers. Hence calls to file.Sync need to be replaced with
fcntl(F_FULLFSYNC).
File lock interface was more verbose than it needed to be while
simultaneously making it difficult to support systems (e.g., Windows)
that only permit locked writes on a single fd holding the lock.
Perviously, we only use 8bits from member identification
in id generation. The conflict rate is A(256,3)/256^3, which
is around 1%. Now we use 16bites to reduce the rate to 0.005%.
We can attach the full member id into id generation if needed...
go HTTP library uses type assertion to determine if a connection
is a TLS connection. If we wrapper TLS Listener with any customized
Listener that can create customized Conn, HTTPs will be broken.
This commit fixes the issue.
Given unix://<socketname>, NewListener will listen on unix socket <socketname>.
This is useful when binding to tcp ports is undesirable (e.g., testing).
Provides two implementations of Recorder-- one that is non-blocking
like the original version and one that provides a blocking channel
to avoid busy waiting or racing in tests when no other synchronization
is available.
etcd might generate incomplete proxy config file after a power failure.
It is because we use ioutil.WriteFile. And iotuile.WriteFile does
not call Sync before closing the file.
CI is slow sometime. To make the test less flaky, we increases the
timeout. This does not affect the correctness, but the test might
take longer to finish or to fail.
rafthttp logs repeated messages when amounts of message-drop logs
happen, and it becomes log spamming.
Use MergeLogger to merge log lines in this case.
This helps the test to pass safely in semaphore CI.
Based on my manual testing, it may take at most 500ms to return
error in semaphore CI, so I set 1s as a safe value.
We use url.ParseQuery to parse names-to-urls string, but it has side
effect that unescape the string. If the initial-cluster string has ipv6
which contains `%25`, it will unescape it to `%` and make further url
parse failed.
Fix it by modifiying the parse process.
Go1.4 doesn't support literal IPv6 address w/ zone in
URI(https://github.com/golang/go/issues/6530), so we only enable tests
in Go1.5+.
After enabling v3 demo, it may change the underlying data organization
for v3 store. So we forbid to unset --experimental-v3demo once it has
been used.
"X-Content-Type-Options" was being autoadded, but none of the
test maps took it into account. I saw that "Content-Type" was
also being deleted, so I figured that was the best solution
for this as well.
It is good to print it in debug output:
```
21:56:12 etcd1 | 2015-08-25 21:56:12.162406 I | etcdmain: peerTLS: cert
= certs/etcd1.pem, key = certs/etcd1-key.pem, ca = , trusted-ca =
certs/ca.pem, client-cert-auth = true
```
This problem is totally fixed at 1.5.
go1.5 adds a Request.Cancel channel, which allows for "race free"
cancellation
(8b4278ffb7).
Our implementation relies on it to always cancel in-flight request.
This solves the problem that etcd may fatal because its critical path
cannot get file descriptor resource when the number of clients is too
big. The PR lets the client listener close client connections
immediately after they are accepted when
the file descriptor usage in the process reaches some pre-set limit, so
it ensures that the internal critical path could always get file
descriptor when it needs.
When there are tons to clients connecting to the server, the original
behavior is like this:
```
2015/08/4 16:42:08 etcdserver: cannot monitor file descriptor usage
(open /proc/self/fd: too many open files)
2015/08/4 16:42:33 etcdserver: failed to purge snap file open
default2.etcd/member/snap: too many open files
[halted]
```
Current behavior is like this:
```
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:26 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:27 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 etcdserver: 80% of the file descriptor limit is
used [used = 873, limit = 1024]
```
It is available at linux system today because pkg/runtime only has linux
support.
If TLS config is empty, etcd downgrades keepalive listener from HTTPS to
HTTP without warning. This results in HTTPS downgrade bug for client urls.
The commit returns error if it cannot listen on TLS.
Waiting 3ms is not long enough for schedule to work well. The test suite
may fail once per 200 times in travis due to this. Extend this to 10ms
to ensure schedule could work. Now it could run 1000 times successfully
in travis.
It is possible to trigger the time.After case if the timer went off
between time.After setting the timer for its channel and the time that
select looked at the channel. So it needs to be longer.
refer: https://groups.google.com/forum/#!topic/golang-nuts/1tjcV80ccq8
multiple cpu running may be slower than single cpu running, so it may
take longer time to remove files.
Increase from 5ms to 20ms to give it enough time.
ForceGosched() performs bad when GOMAXPROCS>1. When GOMAXPROCS=1, it
could promise that other goroutines run long enough
because it always yield the processor to other goroutines. But it cannot
yield processor to goroutine running on other processors. So when
GOMAXPROCS>1, the yield may finish when goroutine on the other
processor just runs for little time.
Here is a test to confirm the case:
```
package main
import (
"fmt"
"runtime"
"testing"
)
func ForceGosched() {
// possibility enough to sched up to 10 go routines.
for i := 0; i < 10000; i++ {
runtime.Gosched()
}
}
var d int
func loop(c chan struct{}) {
for {
select {
case <-c:
for i := 0; i < 1000; i++ {
fmt.Sprintf("come to time %d", i)
}
d++
}
}
}
func TestLoop(t *testing.T) {
c := make(chan struct{}, 1)
go loop(c)
c <- struct{}{}
ForceGosched()
if d != 1 {
t.Fatal("d is not incremented")
}
}
```
`go test -v -race` runs well, but `GOMAXPROCS=2 go test -v -race` fails.
Change the functionality to waiting for schedule to happen.
This reverts commit f8ce5996b0.
etcd no longer resolves TCP addresses passed in through flags,
so there is no need to compare hostname and IP slices anymore.
(for more details: a3892221ee)
Conflicts:
etcdserver/cluster.go
etcdserver/config.go
pkg/netutil/netutil.go
pkg/netutil/netutil_test.go
The PR extracts types.Cluster from etcdserver.Cluster. types.Cluster
is used for flag parsing and etcdserver config.
There is no need to expose etcdserver.Cluster public, which contains
lots of etcdserver internal details and methods. This is the first step
for it.
It exposes the metrics of file descriptor limit and file descriptor used.
Moreover, it prints out warning when more than 80% of fd limit has been used.
```
2015/04/08 01:26:19 etcdserver: 80% of the file descriptor limit is open
[open = 969, limit = 1024]
```
etcd does not provide enough flexibility to configure server SSL and
client authentication separately. When configuring server SSL the
`--ca-file` flag is required to trust self-signed SSL certificates
used to service client requests.
The `--ca-file` has the side effect of enabling client cert
authentication. This can be surprising for those looking to simply
secure communication between an etcd server and client.
Resolve this issue by introducing four new flags:
--client-cert-auth
--peer-client-cert-auth
--trusted-ca-file
--peer-trusted-ca-file
These new flags will allow etcd to support a more explicit SSL
configuration for both etcd clients and peers.
Example usage:
Start etcd with server SSL and no client cert authentication:
etcd -name etcd0 \
--advertise-client-urls https://etcd0.example.com:2379 \
--cert-file etcd0.example.com.crt \
--key-file etcd0.example.com.key \
--trusted-ca-file ca.crt
Start etcd with server SSL and enable client cert authentication:
etcd -name etcd0 \
--advertise-client-urls https://etcd0.example.com:2379 \
--cert-file etcd0.example.com.crt \
--key-file etcd0.example.com.key \
--trusted-ca-file ca.crt \
--client-cert-auth
Start etcd with server SSL and client cert authentication for both
peer and client endpoints:
etcd -name etcd0 \
--advertise-client-urls https://etcd0.example.com:2379 \
--cert-file etcd0.example.com.crt \
--key-file etcd0.example.com.key \
--trusted-ca-file ca.crt \
--client-cert-auth \
--peer-cert-file etcd0.example.com.crt \
--peer-key-file etcd0.example.com.key \
--peer-trusted-ca-file ca.crt \
--peer-client-cert-auth
This change is backwards compatible with etcd versions 2.0.0+. The
current behavior of the `--ca-file` flag is preserved.
Fixes#2499.
Support IPv6 address for ETCD_ADDR and ETCD_PEER_ADDR
pkg/flags: Support IPv6 address for ETCD_ADDR and ETCD_PEER_ADDR
pkg/flags: tests for IPv6 addr and bind-addr flags
pkg/flags: IPAddressPort.Host: do not enclose IPv6 address in square brackets
pkg/flags: set default bind address to [::] instead of 0.0.0.0
pkg/flags: we don't need fmt any more
also, one minor fix: net.JoinHostPort takes string as a port value
pkg/flags: fix ipv6 tests
pkg/flags: test both IPv4 and IPv6 addresses in TestIPAddressPortString
etcdmain: test: use [::] instead of 0.0.0.0
If the TLS config is empty, etcd downgrades https to http without a warning.
This commit avoid the downgrade and stoping etcd from bootstrap if it cannot
listen on TLS.
for transport that are using timeout connections, we set the
maxIdleConnsPerHost to -1. The default transport does not clear
the timeout for the connections it sets to be idle. So the connections
with timeout cannot be reused.
Dial timeout is set shorter because
1. etcd is supposed to work in good environment, and the new value is long
enough
2. shorter dial timeout makes dial fail faster, which is good for
performance
The functionality in pkg/osutil ensures that all interrupt handlers finish
and the process kills itself with the proper signal.
Test for interrupt handling added.
The server shutsdown gracefully by stopping on interrupt (Issue #2277.)