
I'm trying to set up highly available (HA) bastion servers. I only need failover; load balancing is not needed. There are two servers running Debian: bastion01 (192.168.0.10) and bastion02 (192.168.0.11). The floating IP is 192.168.0.12.

I started out with these configs:

bastion01:

global_defs {
   notification_email {
    dev@null.com
   }   
   notification_email_from lb1@mydomain.com
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 101 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

bastion02:

global_defs {
   notification_email {
     dev@null.com 
   }   
   notification_email_from lb2@mydomain.com
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 100 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

This works absolutely great. I confirmed that the floating IP fails over when either server is shut down.

However, it doesn't handle the case where sshd is stopped but the server itself is still running.

For that, I'll need to add a TCP check.

It appears that keepalived's docs provide an example:

http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html

However, their example involves load balancing, which adds a layer of complexity I'm not interested in.

It looks like the block in question is:

TCP_CHECK {
    connect_timeout 3
    connect_port 22
}

Here is my best guess at how to configure it:

bastion01:

global_defs {
   notification_email {
     dev@null.com 
   }   
   notification_email_from lb1@mydomain.com
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 101 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }   
} 

real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}

bastion02:

global_defs {
   notification_email {
     dev@null.com 
   }   
   notification_email_from lb2@mydomain.com
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 100 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }   
} 

real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}

But this didn't work: keepalived didn't understand the real_server blocks. OK, fine. Maybe I can't get away with failover only; maybe the TCP check is part of keepalived's load-balancing component, so I have to use load balancing here. That's fine, it couldn't hurt. So the configs now become (taken directly from http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html):

bastion01:

global_defs {
   notification_email {
    dev@null.com
   }   
   notification_email_from lb1@mydomain.com
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 101 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   

}

virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT 
    nat_mask 255.255.255.0

    protocol TCP 

    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   

    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   
} 

bastion02:

global_defs {
   notification_email {
    dev@null.com
   }   
   notification_email_from lb2@mydomain.com
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 100 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   

}

virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT 
    nat_mask 255.255.255.0

    protocol TCP 

    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   

    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   
} 

This just straight up does not work.

When I stop sshd on bastion01 and try to ssh to the floating IP, I get "connection refused"; the IP doesn't fail over to bastion02.

In the logs on bastion01:

bastion01 Keepalived_healthcheckers[11613]: Check on service [192.168.0.10]:22 failed after 1 retry.
bastion01 Keepalived_healthcheckers[11613]: Removing service [192.168.0.10]:22 from VS [192.168.1.11]:22

How do I convince keepalived to actually failover the floating ip when the TCP health check fails?

cat pants


If you do not need load balancing, track scripts offer failover based on checks run against your service.

First, add a vrrp_script block before your vrrp_instance:

global_defs {
    enable_script_security
}

vrrp_script chk_sshd {
    script "/usr/bin/pgrep sshd" # or "nc -zv localhost 22"
    interval 5                   # default: 1s
}
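`pgrep sshd` only checks that some process named sshd exists. That is not quite the same as sshd accepting connections on port 22, which is what the question's TCP_CHECK tested. On Debian, stopping the ssh service can leave per-session sshd processes running, so a pgrep-based check may keep passing after the listener is gone.

A sketch that tests the port itself is shown below. It assumes netcat-openbsd is installed and that `/bin/nc` is its path; verify the path with `command -v nc`. The timing values are illustrative, not recommendations.

```
vrrp_script chk_sshd {
    # Assumption: /bin/nc is provided by netcat-openbsd; adjust the path
    # to whatever `command -v nc` reports on your system.
    # -z: only test that a TCP connection to port 22 can be opened.
    script "/bin/nc -z 127.0.0.1 22"
    # Run the check every 2 seconds.
    interval 2
    # Count the check as failed if it runs longer than 2 seconds.
    timeout 2
    # Require 2 consecutive failures before marking the check as failed...
    fall 2
    # ...and 2 consecutive successes before marking it as OK again.
    rise 2
}
```

`fall` and `rise` trade detection speed for resistance to a single flaky probe. Setting both to 1 reacts fastest but may flap.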

Next, add a track_script to your vrrp_instance referencing the vrrp_script:

vrrp_instance VI_1 {
    ... other stuff ...

    track_script {
        chk_sshd
    }
}
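Put together with the question's original VRRP settings, bastion01's config might look like the sketch below. It drops the virtual_server section entirely. The LVS health checker only adds real servers to, or removes them from, the IPVS pool and does not change VRRP state. That is consistent with the question's logs, which show the service being removed from the VS while the floating IP stays put.

Because this vrrp_script sets no `weight`, a failing check should move VI_1 into the FAULT state. That releases 192.168.0.12 so that bastion02 takes it over.

```
global_defs {
    notification_email {
        dev@null.com
    }
    notification_email_from lb1@mydomain.com
    smtp_server localhost
    smtp_connect_timeout 30
    enable_script_security
}

# Health check for the local sshd; with no "weight" set, a failure
# moves VI_1 into the FAULT state instead of adjusting its priority.
vrrp_script chk_sshd {
    script "/usr/bin/pgrep sshd"
    interval 5
}

vrrp_instance VI_1 {
    # On bastion02: state BACKUP and priority 100.
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
    # Give up the floating IP while chk_sshd is failing.
    track_script {
        chk_sshd
    }
}
```

bastion02 would use the same file with `notification_email_from lb2@mydomain.com` and `priority 100`. `state BACKUP` is the conventional choice there, though priority decides which node holds the IP.

If you would rather lower bastion01's priority than enter FAULT, you can add a negative `weight` to the vrrp_script. For example, `weight -2` drops 101 to 99 on failure, which is below bastion02's 100.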

While not strictly required, enable_script_security and the full path to the executable provide some protection against malicious activity and will silence warnings in the logs. See the keepalived.conf man page for more info.
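With enable_script_security set, it also matters which user runs the check. The sketch below runs checks as an unprivileged account instead of root. `keepalived_script` is, as far as I know, the account keepalived falls back to when it exists. Treat the account itself as an assumption: it has to exist (for example, created as a system user) before this works.

```
global_defs {
    enable_script_security
    # Run vrrp_script checks as this user instead of root.
    # Assumption: the keepalived_script account already exists,
    # e.g. created as a system user with no login shell.
    script_user keepalived_script
}
```

Neither `pgrep` nor `nc -z` needs root privileges, so an unprivileged user should be enough for either check.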

talarczykco