
System Structure:

  • 10.10.1.86: Kubernetes master node
  • 10.10.1.87: Kubernetes worker 1 node; keepalived MASTER node
  • 10.10.1.88: Kubernetes worker 2 node; keepalived BACKUP node
  • 10.10.1.90: VIP that load balances to .87 & .88; implemented by keepalived.

This Kubernetes cluster is a dev environment for testing NetFlow log collection.

What I want to achieve is:

  1. All router / switch NetFlow logs are first sent to .90.
  2. keepalived then load balances them (lb_kind NAT) to .87 & .88, the two Kubernetes workers.
  3. A NodePort Service catches this traffic into the Kubernetes cluster and handles the rest of the data parsing.
  • Something like:
        |                {OS Network}                   |   {Kubernetes Network}

                                K8s Worker -> filebeat -> logstash (deployments)
                              /
<data> -> [VIP] load balance
                              \ 
                                K8s Worker -> filebeat -> logstash (deployments)
  • filebeat.yml (I have verified that traffic is fine once it reaches filebeat, so I use the file output to narrow down the root cause.)
# cat filebeat.yml
filebeat.inputs:

- type: tcp
  max_message_size: 10MiB
  host: "0.0.0.0:5100"

- type: udp
  max_message_size: 10KiB
  host: "0.0.0.0:5150"




#output.logstash:
#  hosts: ["10.10.1.87:30044", "10.10.1.88:30044"]
output.file:
  path: "/tmp/"
  filename: tmp-filebeat.out
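
  • A minimal way to re-verify the filebeat inputs directly on a worker, bypassing the VIP (a sketch; the test string is arbitrary): send a line to the TCP input and confirm the file output writes it.
# echo "direct-test" | nc 10.10.1.87 5100
# tail /tmp/tmp-filebeat.out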

Kubernetes

  • The master and workers are 3 VMs in my private environment; no cloud provider (GCP, AWS, etc.) is involved.
  • Version:
# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
  • Services
# cat logstash.service.yaml
apiVersion: v1
kind: Service
metadata:
  name: logstash-service
spec:
  type: NodePort
  selector:
    app: logstash
  ports:
    - port: 9514
      name: tcp-port
      targetPort: 9514
      nodePort: 30044
  • Once data gets into Kubernetes, everything works fine.
  • The problem is that the VIP load balancing does not forward.
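  • The Service assumes logstash is listening on 9514 inside the pod (the logstash deployment itself is not shown here). A quick sketch to confirm the NodePort answers on a worker node, bypassing keepalived:
# kubectl get svc logstash-service
# echo "nodeport-test" | nc 10.10.1.87 30044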

Keepalived conf

!Configuration File for keepalived
global_defs {
  router_id proxy1   # `proxy 2` at the other node
}


vrrp_instance VI_1 {
  state MASTER       # `BACKUP` at the other node
  interface ens160
  virtual_router_id 41
  priority 100       # `50` at the other node
  advert_int 1
  virtual_ipaddress {
    10.10.1.90/23
  }
}

virtual_server 10.10.1.90 5100 {
  delay_loop 30
  lb_algo rr
  lb_kind NAT
  protocol TCP
  persistence_timeout 0

  real_server 10.10.1.87 5100 {
    weight 1
  }
  real_server 10.10.1.88 5100 {
    weight 1
  }
}
virtual_server 10.10.1.90 5150 {
  delay_loop 30
  lb_algo rr
  lb_kind NAT
  protocol UDP
  persistence_timeout 0

  real_server 10.10.1.87 5150 {
    weight 1
  }
  real_server 10.10.1.88 5150 {
    weight 1
  }
}
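
  • A sketch to check the VRRP side (interface name ens160 taken from the config above): confirm which node currently holds the VIP and that keepalived is running.
# ip addr show dev ens160 | grep 10.10.1.90
# systemctl status keepalived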

It worked before the Kubernetes cluster setup

  • Both .87 & .88 have installed keepalived, and rr (RoundRobin) load balance works fine (TCP and UDP).
  • Stop keepalived service (systemctl stop keepalived) when going to setup kubernetes cluster, just in case.

Problem occurred after the Kubernetes cluster setup

  • Only the MASTER node .87 gets traffic forwarded; the VIP cannot forward to the BACKUP node .88.
  • The data forwarded from the MASTER is successfully caught by the Kubernetes NodePort and the deployments.

Problem tested with nc:

  • nc: only the node holding the VIP (the MASTER) receives forwarded traffic; when rr schedules the BACKUP node, the connection just times out.
  • Also tested with nc -l 5100 on both servers; only the MASTER node received anything.
# echo "test" | nc 10.10.1.90 5100
# echo "test" | nc 10.10.1.90 5100
Ncat: Connection timed out.
# echo "test" | nc 10.10.1.90 5100
# echo "test" | nc 10.10.1.90 5100
Ncat: Connection timed out.
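
  • A further check I could run (a sketch; the interface name is taken from the keepalived config): tcpdump on the BACKUP node while repeating the nc test, to see whether the NAT-forwarded packets arrive there at all.
# tcpdump -ni ens160 'tcp port 5100 or udp port 5150'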

Some Info

  • Package versions
# rpm -qa |grep keepalived
keepalived-1.3.5-19.el7.x86_64
  • Kubernetes CNI: Calico
# kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-b656ddcfc-wnkcj   1/1     Running   2          78d
calico-node-vnf4d                         1/1     Running   8          78d
calico-node-xgzd5                         1/1     Running   1          78d
calico-node-zt25t                         1/1     Running   8          78d
coredns-558bd4d5db-n6hnn                  1/1     Running   2          78d
coredns-558bd4d5db-zz2rb                  1/1     Running   2          78d
etcd-a86.axv.bz                           1/1     Running   2          78d
kube-apiserver-a86.axv.bz                 1/1     Running   2          78d
kube-controller-manager-a86.axv.bz        1/1     Running   2          78d
kube-proxy-ddwsr                          1/1     Running   2          78d
kube-proxy-hs4dx                          1/1     Running   3          78d
kube-proxy-qg2nq                          1/1     Running   1          78d
kube-scheduler-a86.axv.bz                 1/1     Running   2          78d
  • ipvsadm (same result on .87, .88)
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.10.1.90:5100 rr
  -> 10.10.1.87:5100              Masq    1      0          0
  -> 10.10.1.88:5100              Masq    1      0          0
UDP  10.10.1.90:5150 rr
  -> 10.10.1.87:5150              Masq    1      0          0
  -> 10.10.1.88:5150              Masq    1      0          0
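  • Another check I could run (a sketch): watch the IPVS counters and connection entries while sending test traffic, to see whether the director schedules anything toward .88 at all.
# ipvsadm -ln --stats
# ipvsadm -lnc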
  • SELinux is always Permissive.
  • Stopping firewalld does not help either.
  • sysctl differences (before vs. after the Kubernetes setup):
# before:
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.forwarding = 0
net.ipv4.conf.all.route_localnet = 0
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.lo.forwarding = 0
net.ipv4.ip_forward = 0

# after
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.route_localnet = 1
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.lo.forwarding = 1
net.ipv4.ip_forward = 1
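
  • One more check I could do (a sketch; the exact chains and rules depend on the kube-proxy mode and Calico version): dump the iptables FORWARD and nat rules added by kube-proxy / Calico, since IPVS NAT-forwarded traffic goes through that path and could be dropped or rewritten there.
# iptables -L FORWARD -n -v | head -30
# iptables -t nat -L -n -v | head -50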

I'm not sure what further checks I can do at this point. Please advise, thank you!

Kenting
  • How did you set up your cluster? Which version of Kubernetes did you use? Did you use a cloud provider, or bare metal? How did you configure networking inside your cluster? Please attach YAML files. How did you test everything on Kubernetes? – Mikołaj Głodziak Jul 29 '21 at 13:31
  • Sorry about that (after the `nc` test I was sure something goes wrong in the VIP load balancing to the BACKUP node, so I did not attach the Kubernetes info at first); the `NodePort` Service is now added. Thank you! – Kenting Jul 30 '21 at 13:25
  • Is your problem resolved now? – Mikołaj Głodziak Jul 30 '21 at 13:29
  • No, I just updated more info about Kubernetes. The problem is still unsolved; the workaround I could try is to use only the VIP, without the load balancing (virtual server - real server). – Kenting Aug 02 '21 at 03:02
