I set up a k8s cluster more or less following this guide, so I have three nodes that all act as control-plane (master) nodes. I use haproxy as the load balancer with the following config:
#/etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log local0
    log /dev/log local1 info
    daemon
#---------------------------------------------------------------------
# common defaults that all the listen and backend sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 1
    timeout http-request 10s
    timeout queue 20s
    timeout connect 5s
    timeout client 20s
    timeout server 20s
    timeout http-keep-alive 10s
    timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxies to the control plane nodes
#---------------------------------------------------------------------
frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    option tcp-check
    balance roundrobin
    server k8s1 x.x.x.15:6443 check
    server k8s2 x.x.x.16:6443 check
    server k8s3 x.x.x.17:6443 check
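After editing, the config can be syntax-checked and haproxy reloaded with something like this (assuming haproxy runs as a systemd unit):

haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl reload haproxy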
I also run keepalived to manage a VIP:
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    timeout 5
    fall 10
    rise 2
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens18
    virtual_router_id 53
    priority 101
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        x.x.x.18
    }
    track_script {
        check_apiserver
    }
}
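Whether a node currently holds the VIP and how the VRRP transitions happen can be followed on each node like this (assuming keepalived runs as a systemd unit; ens18 and x.x.x.18 are from the config above):

sudo journalctl -u keepalived -f
ip -brief addr show ens18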
The check_apiserver script looks like this:
#!/usr/bin/env bash

# VIP managed by keepalived (matches virtual_ipaddress above)
VIP=x.x.x.18

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q ${VIP}; then
    curl --silent --max-time 2 --insecure https://x.x.x.18:8443/ -o /dev/null || errorExit "Error GET https://x.x.x.18:8443/"
fi
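The health check itself can be ruled out by running the script by hand on a control-plane node, e.g.:

bash /etc/keepalived/check_apiserver.sh && echo "check passed"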
kubelet, kubeadm and kubectl are all version 1.22.2
I create the cluster with
sudo kubeadm init --control-plane-endpoint "x.x.x.18:8443" --upload-certs --v=5 --pod-network-cidr=172.31.0.0/16
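For reference, the same settings can also be expressed as a kubeadm config file instead of flags (a sketch; v1beta3 is the config version used by kubeadm 1.22, and kubeadm-config.yaml is just a placeholder name):

# kubeadm-config.yaml (placeholder name)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.2
controlPlaneEndpoint: "x.x.x.18:8443"
networking:
  podSubnet: "172.31.0.0/16"

sudo kubeadm init --config kubeadm-config.yaml --upload-certs --v=5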
Then I add Weave Net with
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=172.31.0.0/16"
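To check that the CNI pods come up on every node, something like this can be used (the name=weave-net label and the weave container name are what the stock Weave manifest uses, as far as I know):

kubectl -n kube-system get pods -l name=weave-net -o wide
kubectl -n kube-system logs ds/weave-net -c weave --tail=50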
With this configuration I am able to create, for example, an EMQX cluster. The problem appears whenever I stop one node: every StatefulSet that had a Pod running on the stopped node becomes unresponsive for almost exactly 15 minutes.
Checking keepalived with ip a s ens18, I see the VIP move to a running node almost instantly. On the haproxy stats dashboard the stopped node is marked as "active UP, going DOWN" after 2 seconds and as "active or backup DOWN" after another 4 seconds. So this part seems to work as well.
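To separate load-balancer failover from the workload issue, the apiserver can be probed through the VIP while the node is being stopped (a rough sketch; on a default kubeadm cluster /healthz answers anonymous requests, otherwise only the HTTP status code differs):

while true; do
    printf '%s ' "$(date +%T)"
    curl -ks -o /dev/null -w '%{http_code}\n' --max-time 2 https://x.x.x.18:8443/healthz
    sleep 1
done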
Modifying Kubernetes timeouts (e.g. the pod eviction time) does have an effect: the Pods are marked as Terminating earlier, but the StatefulSet stays unresponsive for 15 minutes regardless of the eviction time.
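One per-workload way to shorten the eviction time is the standard NoExecute tolerations in the StatefulSet's pod template, roughly like this (example values, not what EMQX ships with):

# excerpt from the StatefulSet's spec.template.spec (example values)
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30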
Setting up a three-node kind cluster in which every node is a control-plane (master) node does not show this behaviour (the kind config is sketched below), which is why I suspect a k8s configuration problem. But what am I missing?
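Roughly, the kind setup looked like this (a sketch; the file name is a placeholder):

# kind-config.yaml (placeholder name)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane

kind create cluster --config kind-config.yaml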
Edit1: The cluster stays accessible during that time, so I can watch kubectl get all --all-namespaces -o wide
to check the cluster status. All I see is that the Pods from the stopped node remain in the Terminating state.
Edit2: The only suspicious behaviour was Weave detecting a new MAC address after 15 minutes. To speed up debugging I started kind without its default CNI and installed Weave instead; that reproduced identical logs and the exact same problem as on the "real" Kubernetes cluster. Since I had no luck with Weave's debug logs, I switched to the Calico CNI and changed the podSubnet to 192.168.0.0/16. This solved the problem in kind, but applying the exact same change to my Kubernetes cluster leaves me with the same problem again...
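For those CNI experiments, kind's own CNI can be switched off and the pod subnet set in the same kind config as above, roughly (again a sketch; 192.168.0.0/16 is the subnet used for the Calico run):

# added to the kind config for the Weave/Calico tests
networking:
  disableDefaultCNI: true
  podSubnet: "192.168.0.0/16"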