`etcd.service`: Support Graceful Reboot for AIO Node

Currently our sample systemd service file `contrib/systemd/etcd.service`
have startup/shutdown dependency as below:

    [Unit]
    After=network.target

For some rare condition, e.g. bare matel deployment with slow network
startup, IP could not be assigned e arly enough before etcd default
`ETCD_HEARTBEAT_INTERVAL="100"` and `ETCD_ELECTION_TIMEOUT="1000"` get
timeouted, after graceful system reboot.

This cause etcd false negative classify itself use unhealthy, therefore
stop rejoining the remaining online cluster members.

This PR introduce:

  - `etcd.service`: Ensure startup after `network-online.target` and
    `time-sync.target`, so effective network connectivity and synced
    time is available.

The logic is concept proof by
<https://github.com/alvistack/ansible-role-etcd/tree/develop>; also
works as expected with Ceph + Kubernetes deployment by
<https://github.com/alvistack/ansible-collection-kubernetes/tree/develop>.
No more deadlock happened during graceful system reboot, both AIO
single/multiple node with loopback mount.

Also see:

  - <https://github.com/ceph/ceph/pull/36776>
  - <https://github.com/etcd-io/etcd/pull/12259>
  - <https://github.com/cri-o/cri-o/pull/4128>
  - <https://github.com/kubernetes/release/pull/1504>

Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
release-3.5
Wong Hoi Sing Edison 2020-08-26 11:38:04 +08:00
parent 3e1a64913a
commit 17ceed9b47
No known key found for this signature in database
GPG Key ID: 1FB1DE0629A57FD1
1 changed files with 2 additions and 1 deletions

View File

@ -1,7 +1,8 @@
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
After=network-online.target local-fs.target remote-fs.target time-sync.target
Wants=network-online.target local-fs.target remote-fs.target time-sync.target
[Service]
User=etcd