I'm trying to set up HA bastion servers with keepalived. I only need failover; load balancing is not needed. There are two servers running Debian: bastion01 (192.168.0.10) and bastion02 (192.168.0.11). The floating IP is 192.168.0.12.
I started out with these configs:
bastion01:
global_defs {
    notification_email {
        dev@null.com
    }
    notification_email_from lb1@mydomain.com
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
bastion02:
global_defs {
    notification_email {
        dev@null.com
    }
    notification_email_from lb2@mydomain.com
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
This works great. I confirmed that the floating IP fails over when either server is shut down.
However, it doesn't handle the case where ssh is stopped but the server itself is still running.
For that, I'll need to add a TCP check.
It appears that keepalived's docs provide an example:
http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html
However, their example involves load balancing, which adds a layer of complexity I am not interested in.
It looks like the block in question is:
TCP_CHECK {
    connect_timeout 3
    connect_port 22
}
I tried to use my best guess as to how to configure this:
bastion01:
global_defs {
    notification_email {
        dev@null.com
    }
    notification_email_from lb1@mydomain.com
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
bastion02:
global_defs {
    notification_email {
        dev@null.com
    }
    notification_email_from lb2@mydomain.com
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}
But this didn't work: keepalived didn't understand the top-level real_server blocks. Maybe I can't get away with failover only. Maybe the TCP check is part of the load-balancing component of keepalived, so I must use load balancing here. That's fine, it couldn't hurt. So the configs now become (virtual_server block taken directly from http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html ):
bastion01:
global_defs {
    notification_email {
        dev@null.com
    }
    notification_email_from lb1@mydomain.com
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    nat_mask 255.255.255.0
    protocol TCP
    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
}
bastion02:
global_defs {
    notification_email {
        dev@null.com
    }
    notification_email_from lb2@mydomain.com
    smtp_server localhost
    smtp_connect_timeout 30
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
}
virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    nat_mask 255.255.255.0
    protocol TCP
    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }
}
This does not work either.
When I stop ssh on bastion01 and then try to ssh to the floating IP, I get connection refused; the IP doesn't fail over to bastion02.
In the logs on bastion01:
bastion01 Keepalived_healthcheckers[11613]: Check on service [192.168.0.10]:22 failed after 1 retry.
bastion01 Keepalived_healthcheckers[11613]: Removing service [192.168.0.10]:22 from VS [192.168.1.11]:22
How do I convince keepalived to actually fail over the floating IP when the TCP health check fails?
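For reference, here is a minimal, untested sketch of a direction that may be relevant. The log lines above come from the health checker, which only removes a real_server from a virtual_server; it does not touch the VRRP instance that owns the floating IP. keepalived also has vrrp_script and track_script blocks on the VRRP side, which can tie an instance to an external check. In the sketch below, the script name chk_sshd, the use of nc, its path /bin/nc, and the interval/fall/rise values are all assumptions for illustration, not taken from my setup. Shown for bastion01; bastion02 would be the same apart from its priority.

```
# Assumption: netcat is installed at /bin/nc and supports -z (connect-only scan).
# The check exits 0 when something accepts connections on local port 22.
vrrp_script chk_sshd {
    script "/bin/nc -z 127.0.0.1 22"
    # Run the check every 2 seconds.
    interval 2
    # Consider the check failed after 2 consecutive failures.
    fall 2
    # Consider the check healthy again after 2 consecutive successes.
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
    # Tie this instance to the check defined above.
    track_script {
        chk_sshd
    }
}
```

With no weight set on the script, my understanding is that a failing check puts the instance into FAULT state, which should release the floating IP so the peer can take it over. No virtual_server or real_server blocks would be needed for this.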