Own etcd cluster for Kubernetes

Question

I want to build my own Kubernetes cluster across two locations (300 km distance) and integrate it into GitLab.

Below I list my ideas. Please tell me if there is a mistake in my thinking and how to fix it.

Since I can only set up VMs and have no rights on the hosts themselves, I want to install an etcd cluster on 5 VMs (3+2). I would install etcd with apt on Ubuntu 18.04. I don't need Kubernetes for this at first.
Does the odd-number rule apply only to etcd and not to the control planes?
Does it make sense to set up separate VMs for the control planes, or can I reuse the 3+2 VMs of the etcd cluster? Otherwise I would already have 10 VMs.

c4f4t0r · Accepted Answer · 2020-02-04T22:08:53.470

Unless you have a big Kubernetes cluster with thousands of services and many nodes, you can set up a separate etcd cluster. If you want to spread the etcd cluster across two locations, check the CoreOS documentation first, because etcd is very sensitive to latency.

If you use an external etcd cluster, you don't need an odd number of control plane nodes. Only etcd needs an odd number of members, because the etcd members form a cluster that requires a majority (quorum) to operate.

Control plane nodes don't communicate with each other, only with etcd.
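
To make the odd-number rule concrete, here is a small sketch of the majority arithmetic that etcd's consensus relies on. The function names quorum and fault_tolerance are made up for this illustration; only the n/2 + 1 rule itself comes from etcd.

```shell
#!/bin/sh
# Majority arithmetic for an etcd cluster of N members (illustration only).

# quorum N: members that must be reachable for the cluster to accept writes
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that may fail while a majority is still reachable
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Four members tolerate one failure, exactly like three, and six tolerate two, exactly like five. An even member count therefore adds a machine without adding fault tolerance.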

uav · Answer 2 · 2020-02-13T14:41:17.840

I have learned a few things and would like to share them with you.

etcd is pronounced like "@ cee dee".

I have now decided not to install via apt (which gives etcd 3.2 on Ubuntu 18.04) but to download the latest version (3.3.18) with wget. The first hit on Google worked.

Installation

cd /opt/
sudo wget https://github.com/etcd-io/etcd/releases/download/v3.3.18/etcd-v3.3.18-linux-amd64.tar.gz
sudo tar xvf etcd-v3.3.18-linux-amd64.tar.gz
cd etcd-v3.3.18-linux-amd64/
sudo mv etcd etcdctl /usr/local/bin/
sudo mkdir -p /var/lib/etcd/
sudo mkdir /etc/etcd/
sudo groupadd --system etcd
sudo useradd -s /usr/sbin/nologin --system -g etcd etcd
sudo chown -R etcd:etcd /var/lib/etcd/

Reset

Remove all data in the member folders:

sudo rm -rf /etc/etcd/*.etcd/member/ /opt/etcd-v*-linux-amd64/default.etcd/member/ /var/lib/etcd/member/

Or change the --initial-cluster-token argument to a new value XYZ (the same value on all five members) and start etcd with the --force-new-cluster parameter.

When you restart an existing cluster or add more members, change --initial-cluster-state new to --initial-cluster-state existing.

Delete all data (keys and values) from the etcd cluster:

sudo ETCDCTL_API=3 etcdctl del "" --prefix

Configuration

sudo -u etcd etcd \
--name aaa \
--data-dir /var/lib/etcd/ \
--listen-peer-urls http://localhost:2380,http://localhost:7001,http://192.168.4.101:2380,http://192.168.4.101:7001 \
--listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.4.101:2379,http://192.168.4.101:4001 \
--initial-advertise-peer-urls http://192.168.4.101:2380 \
--initial-cluster aaa=http://192.168.4.101:2380,bbb=http://192.168.4.102:2380,ccc=http://192.168.4.103:2380,eee=http://192.168.4.105:2380,ddd=http://192.168.4.104:2380 \
--initial-cluster-state new \
--initial-cluster-token 2020-02-07T14:53 \
--advertise-client-urls http://192.168.4.101:2379

sudo -u etcd etcd \
--name bbb \
--data-dir /var/lib/etcd/ \
--listen-peer-urls http://localhost:2380,http://localhost:7001,http://192.168.4.102:2380,http://192.168.4.102:7001 \
--listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.4.102:2379,http://192.168.4.102:4001 \
--initial-advertise-peer-urls http://192.168.4.102:2380 \
--initial-cluster aaa=http://192.168.4.101:2380,bbb=http://192.168.4.102:2380,ccc=http://192.168.4.103:2380,eee=http://192.168.4.105:2380,ddd=http://192.168.4.104:2380 \
--initial-cluster-state new \
--initial-cluster-token 2020-02-07T14:53 \
--advertise-client-urls http://192.168.4.102:2379

sudo -u etcd etcd \
--name ccc \
--data-dir /var/lib/etcd/ \
--listen-peer-urls http://localhost:2380,http://localhost:7001,http://192.168.4.103:2380,http://192.168.4.103:7001 \
--listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.4.103:2379,http://192.168.4.103:4001 \
--initial-advertise-peer-urls http://192.168.4.103:2380 \
--initial-cluster aaa=http://192.168.4.101:2380,bbb=http://192.168.4.102:2380,ccc=http://192.168.4.103:2380,eee=http://192.168.4.105:2380,ddd=http://192.168.4.104:2380 \
--initial-cluster-state new \
--initial-cluster-token 2020-02-07T14:53 \
--advertise-client-urls http://192.168.4.103:2379

sudo -u etcd etcd \
--name ddd \
--data-dir /var/lib/etcd/ \
--listen-peer-urls http://localhost:2380,http://localhost:7001,http://192.168.4.104:2380,http://192.168.4.104:7001 \
--listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.4.104:2379,http://192.168.4.104:4001 \
--initial-advertise-peer-urls http://192.168.4.104:2380 \
--initial-cluster aaa=http://192.168.4.101:2380,bbb=http://192.168.4.102:2380,ccc=http://192.168.4.103:2380,eee=http://192.168.4.105:2380,ddd=http://192.168.4.104:2380 \
--initial-cluster-state new \
--initial-cluster-token 2020-02-07T14:53 \
--advertise-client-urls http://192.168.4.104:2379

sudo -u etcd etcd \
--name eee \
--data-dir /var/lib/etcd/ \
--listen-peer-urls http://localhost:2380,http://localhost:7001,http://192.168.4.105:2380,http://192.168.4.105:7001 \
--listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.4.105:2379,http://192.168.4.105:4001 \
--initial-advertise-peer-urls http://192.168.4.105:2380 \
--initial-cluster aaa=http://192.168.4.101:2380,bbb=http://192.168.4.102:2380,ccc=http://192.168.4.103:2380,eee=http://192.168.4.105:2380,ddd=http://192.168.4.104:2380 \
--initial-cluster-state new \
--initial-cluster-token 2020-02-07T14:53 \
--advertise-client-urls http://192.168.4.105:2379
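
The five commands above differ only in the member name and IP address, so they can be generated from one list. The following is a minimal sketch, not a tested deployment script. The helper names initial_cluster and etcd_args are hypothetical, and for brevity the sketch leaves out the legacy 7001/4001 listeners used above.

```shell
#!/bin/sh
# Hypothetical helper: print the etcd flags for one member of the cluster.
# MEMBERS holds the "name=ip" pairs taken from the commands above.
MEMBERS="aaa=192.168.4.101 bbb=192.168.4.102 ccc=192.168.4.103 ddd=192.168.4.104 eee=192.168.4.105"

# initial_cluster: build the value of --initial-cluster from MEMBERS
initial_cluster() {
  out=""
  for m in $MEMBERS; do
    name=${m%%=*}   # text before the first "="
    ip=${m#*=}      # text after the first "="
    out="${out:+$out,}${name}=http://${ip}:2380"
  done
  echo "$out"
}

# etcd_args NAME IP: print the flags for one member, one flag per line
etcd_args() {
  cat <<EOF
--name $1
--data-dir /var/lib/etcd/
--listen-peer-urls http://$2:2380
--listen-client-urls http://localhost:2379,http://$2:2379
--initial-advertise-peer-urls http://$2:2380
--initial-cluster $(initial_cluster)
--initial-cluster-state new
--initial-cluster-token 2020-02-07T14:53
--advertise-client-urls http://$2:2379
EOF
}

# Example: show the flags for member bbb
etcd_args bbb 192.168.4.102
```

Because none of the values contain spaces, the output could be passed on directly, for example as sudo -u etcd etcd $(etcd_args aaa 192.168.4.101).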

You can run this as a systemd service:

/etc/systemd/system/etcd.service (create it if it does not exist; example for the first member aaa)

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
User=etcd
Type=notify
# --name and --data-dir are passed as flags below, so the ETCD_NAME and ETCD_DATA_DIR environment variables are not needed
ExecStart=/usr/local/bin/etcd \
--name aaa \
--data-dir /var/lib/etcd/ \
--listen-peer-urls http://localhost:2380,http://localhost:7001,http://192.168.4.101:2380,http://192.168.4.101:7001 \
--listen-client-urls http://localhost:2379,http://localhost:4001,http://192.168.4.101:2379,http://192.168.4.101:4001 \
--initial-advertise-peer-urls http://192.168.4.101:2380 \
--initial-cluster aaa=http://192.168.4.101:2380,bbb=http://192.168.4.102:2380,ccc=http://192.168.4.103:2380,eee=http://192.168.4.105:2380,ddd=http://192.168.4.104:2380 \
--initial-cluster-state new \
--initial-cluster-token 2020-02-07T14:53 \
--advertise-client-urls http://192.168.4.101:2379
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
# sudo systemctl enable etcd  # for auto start after reboot
sudo systemctl restart etcd

If someone can provide an example with encryption, i.e. with client certificates, I would be grateful.
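
As a starting point for the TLS question, here is a hedged sketch of the flags etcd and etcdctl (v3 API) accept for certificate-based encryption and client authentication. It is not from the original setup. The directory /etc/etcd/pki and all file names in it are hypothetical, the certificates must already have been created with a CA of your choice, and the helper names etcd_tls_args and etcdctl_tls_args are made up for this example.

```shell
#!/bin/sh
# Sketch only: PKI_DIR and the file names below are hypothetical placeholders.
PKI_DIR=/etc/etcd/pki

# etcd_tls_args IP: print the TLS-related etcd flags for the member on IP
etcd_tls_args() {
  cat <<EOF
--listen-client-urls https://$1:2379
--advertise-client-urls https://$1:2379
--listen-peer-urls https://$1:2380
--initial-advertise-peer-urls https://$1:2380
--cert-file $PKI_DIR/server.crt
--key-file $PKI_DIR/server.key
--trusted-ca-file $PKI_DIR/ca.crt
--client-cert-auth
--peer-cert-file $PKI_DIR/peer.crt
--peer-key-file $PKI_DIR/peer.key
--peer-trusted-ca-file $PKI_DIR/ca.crt
--peer-client-cert-auth
EOF
}

# etcdctl_tls_args IP: print the matching etcdctl client flags (ETCDCTL_API=3)
etcdctl_tls_args() {
  echo "--endpoints https://$1:2379 --cacert $PKI_DIR/ca.crt --cert $PKI_DIR/client.crt --key $PKI_DIR/client.key"
}

# Example: show the TLS flags for the first member
etcd_tls_args 192.168.4.101
```

With these flags, the peer URLs in --initial-cluster have to use https as well, and the etcd user needs read access to the key files.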

Chrony

It is also important that all five machines have the same time. Otherwise you will see a lot of errors in your logs. For this I used chrony.

sudo timedatectl set-timezone Europe/Berlin
sudo timedatectl set-local-rtc 1 --adjust-system-clock
sudo timedatectl set-local-rtc 0
sudo systemctl stop systemd-timesyncd.service && sudo systemctl disable systemd-timesyncd.service
sudo apt update && sudo apt --yes install chrony

/etc/chrony/chrony.conf

# I am behind an HTTP CONNECT proxy and cannot reach external NTP servers:
local
bindcmdaddress 0.0.0.0
allow 192.168.0.0/16
cmdallow 192.168.0.0/16
# server 192.168.4.101 prefer iburst  # this machine itself
server 192.168.4.102 prefer iburst
server 192.168.4.103 prefer iburst
server 192.168.4.104 prefer iburst
server 192.168.4.105 prefer iburst
# ...
makestep 1 -1

# Show time etc.:
sudo timedatectl
# Show ntp network members:
sudo chronyc sources

Please remember to supply the worker nodes with the same time as well.

Checks

sudo etcdctl cluster-health

member eee6e5e8935fd1c9 is healthy: got healthy result from http://192.168.4.105:2379
member bbb7b0aca4c13cdc is healthy: got healthy result from http://192.168.4.102:2379
member aaac5ad73f7d224f is healthy: got healthy result from http://192.168.4.101:2379
member ccc20379b7c3a64e is healthy: got healthy result from http://192.168.4.103:2379
member ddd76f34bf32390e is healthy: got healthy result from http://192.168.4.104:2379
cluster is healthy

sudo etcdctl member list

eee6e5e8935fd1c9: name=eee peerURLs=http://192.168.4.105:2380 clientURLs=http://192.168.4.105:2379 isLeader=false
bbb7b0aca4c13cdc: name=bbb peerURLs=http://192.168.4.102:2380 clientURLs=http://192.168.4.102:2379 isLeader=false
aaac5ad73f7d224f: name=aaa peerURLs=http://192.168.4.101:2380 clientURLs=http://192.168.4.101:2379 isLeader=true
ccc20379b7c3a64e: name=ccc peerURLs=http://192.168.4.103:2380 clientURLs=http://192.168.4.103:2379 isLeader=false
ddd76f34bf32390e: name=ddd peerURLs=http://192.168.4.104:2380 clientURLs=http://192.168.4.104:2379 isLeader=false
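
For scripted checks, the cluster-health output shown above can be counted instead of read by eye. This is a small sketch; the function name healthy_members is made up, and it relies only on the "member ... is healthy" line format shown above. Note that cluster-health belongs to the v2 API of etcdctl; with ETCDCTL_API=3 the corresponding command is etcdctl endpoint health.

```shell
#!/bin/sh
# healthy_members: read `etcdctl cluster-health` output on stdin and
# print the number of members that reported healthy.
healthy_members() {
  grep -c '^member .* is healthy'
}

# Example with two lines copied from the output above plus the summary line
sample='member eee6e5e8935fd1c9 is healthy: got healthy result from http://192.168.4.105:2379
member bbb7b0aca4c13cdc is healthy: got healthy result from http://192.168.4.102:2379
cluster is healthy'
printf '%s\n' "$sample" | healthy_members   # prints 2
```

On a live member this would be used as sudo etcdctl cluster-health | healthy_members.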

Use Rancher. It is much simpler than doing everything by hand. — uav, Sep 10 '20 at 16:44
Also: don't stretch one Kubernetes cluster across two locations: when the larger location (the one holding the N/2 + 1 majority of etcd members) fails, the whole cluster is broken. Either use one Kubernetes cluster per location or one Kubernetes cluster across three locations. — uav, Apr 21 '21 at 09:02
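
The last comment can be checked with the same majority arithmetic. The sketch below uses a made-up helper survives_loss; the 2+2+1 layout across three locations is an assumed example, not something from the thread.

```shell
#!/bin/sh
# survives_loss TOTAL LOST: succeed if the members remaining after LOST
# members disappear still form a majority of TOTAL.
survives_loss() {
  total=$1
  lost=$2
  [ $(( total - lost )) -ge $(( total / 2 + 1 )) ]
}

# 3+2 split across two locations
if survives_loss 5 2; then
  echo "3+2: losing the 2-member location keeps quorum"
fi
if ! survives_loss 5 3; then
  echo "3+2: losing the 3-member location loses quorum"
fi

# 2+2+1 split across three locations (assumed layout)
if survives_loss 5 2 && survives_loss 5 1; then
  echo "2+2+1: any single location may fail"
fi
```

With only two locations, one of them always holds the majority, so there is no split of five members that survives the loss of either location.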