We are using IPVS for L4 load balancing; it forwards packets to the L7 backends in IPIP tunnel mode.
There are three IPVS systems, all configured with source hashing for persistence. Sometimes IPVS forwards packets to the wrong backend.
For example, IPVS 1 receives a packet from client 1.1.1.1 and sends it to real server 1; the same client's next packet is received by IPVS 2, which sends it to real server 2. Real server 2 knows nothing about this packet because the connection was initiated with real server 1, so it ends the connection with an RST packet.
This is not limited to one particular client; all clients show the same behavior.
To my understanding, all the L4 IPVS systems should pick the same real server for a given client because of the source hashing algorithm.
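To make that expectation concrete, here is a minimal Python sketch of a source-hashing scheduler. It is a toy model, not the kernel's actual implementation: the table size, the multiplicative hash, and the names `build_table` and `pick_backend` are assumptions made for illustration. In this model, two directors agree on a client's backend only when they build their bucket tables from the same backend list in the same order.

```python
import ipaddress

TABLE_SIZE = 256  # assumed bucket count for this toy model


def build_table(backends, size=TABLE_SIZE):
    """Fill the buckets by cycling through the backend list in order."""
    return [backends[i % len(backends)] for i in range(size)]


def pick_backend(table, client_ip):
    """Map a client address to a bucket with a simple multiplicative hash."""
    addr = int(ipaddress.ip_address(client_ip))
    bucket = ((addr * 2654435761) & 0xFFFFFFFF) % len(table)
    return table[bucket]


# Two directors with identical backend lists always agree.
director_1 = build_table(["rs1", "rs2", "rs3"])
director_2 = build_table(["rs1", "rs2", "rs3"])
same = pick_backend(director_1, "1.1.1.1") == pick_backend(director_2, "1.1.1.1")

# A director whose list is in a different order disagrees in this model.
director_3 = build_table(["rs2", "rs3", "rs1"])
differs = pick_backend(director_1, "1.1.1.1") != pick_backend(director_3, "1.1.1.1")
print(same, differs)
```

If the real scheduler behaves similarly, comparing the order and weights of the real servers on each director would be a low-impact first check.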
I built the same setup in a lab but couldn't reproduce the issue. The affected setup is in production, so I cannot make any major changes to it for debugging purposes.
Keepalived is used to manage IPVS.
Any directions on how to debug this issue with minimal impact would be really helpful.
PS - I know source hashing isn't perfectly consistent, but the number of packets being sent to the wrong real servers is too high. We have other clusters where we have never seen this issue.
IPVS 1
Prot LocalAddress:Port    Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
TCP  x.x.x.26:443        401541  3620595        0  458422K        0
  -> x.x.x.1:443         401234  3618511        0  458239K        0
  -> x.x.x.2:443             15      126        0    12341        0
  -> x.x.x.3:443             35      213        0    20832        0
  -> x.x.x.4:443             16      113        0    10980        0
  -> x.x.x.5:443             19      132        0    12113        0
  -> x.x.x.6:443             18      140        0    13616        0
  -> x.x.x.7:443             12       97        0     9262        0
  -> x.x.x.8:443             19      120        0    10448        0
  -> x.x.x.9:443            164     1083        0    88618        0
  -> x.x.x.15:443             9       60        0     5498        0
IPVS 2
Prot LocalAddress:Port    Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
TCP  x.x.x.26:443        402903  3626029        0  459621K        0
  -> x.x.x.1:443             12       56        0     4150        0
  -> x.x.x.2:443             21      132        0    12967        0
  -> x.x.x.3:443            168     1084        0    89908        0
  -> x.x.x.4:443             14      122        0    11005        0
  -> x.x.x.5:443             12       79        0     7045        0
  -> x.x.x.6:443         402584  3623968        0  459444K        0
  -> x.x.x.7:443             29      146        0    12899        0
  -> x.x.x.8:443             22      190        0    17336        0
  -> x.x.x.9:443             10       66        0     6049        0
  -> x.x.x.15:443            31      186        0    15724        0
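One way to compare the directors without touching production is to parse the statistics above offline. The following Python sketch assumes the layout shown in this post (real-server rows begin with `->`, followed by the address and the connection count); the helper names `parse_conns` and `dominant_backend` are hypothetical, and treating the `K`/`M`/`G` suffixes as decimal multipliers is an assumption.

```python
def to_int(value):
    """Convert a counter such as '12341' or '458239K' to an integer."""
    suffixes = {"K": 10**3, "M": 10**6, "G": 10**9}  # assumed decimal units
    if value[-1] in suffixes:
        return int(value[:-1]) * suffixes[value[-1]]
    return int(value)


def parse_conns(text):
    """Return {real_server: connection_count} from the '->' rows."""
    conns = {}
    for line in text.splitlines():
        fields = line.split()
        # The '-> RemoteAddress:Port' header row has only two fields,
        # so requiring a third field skips it.
        if len(fields) >= 3 and fields[0] == "->":
            conns[fields[1]] = to_int(fields[2])
    return conns


def dominant_backend(conns):
    """Return the busiest real server and its share of all connections."""
    server = max(conns, key=conns.get)
    return server, conns[server] / sum(conns.values())


# Excerpt of the IPVS 1 output from this post.
sample = """\
  -> RemoteAddress:Port
  -> x.x.x.1:443 401234 3618511 0 458239K 0
  -> x.x.x.2:443 15 126 0 12341 0
  -> x.x.x.9:443 164 1083 0 88618 0
"""
server, share = dominant_backend(parse_conns(sample))
print(f"{server} handles {share:.1%} of connections")
```

Applied to the full tables above, this kind of comparison shows that IPVS 1 sends about 99.9% of its connections to x.x.x.1 while IPVS 2 sends about 99.9% to x.x.x.6, so the two directors disagree on the backend for what is presumably the same dominant source address.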
Thanks!