
I cannot get keepalived to run correctly on Xen domUs.

I am following this link for the configuration, and it works great on local VMs (running under KVM). If I set up the exact same configuration on Xen domUs, it does not work: the two servers do not see each other and both decide to be master (10.10.0.200 being the virtual IP):

$ sudo ip addr sh eth0 # host1
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:16:3e:73:b0:78 brd ff:ff:ff:ff:ff:ff
inet 10.10.0.100/24 brd 10.10.0.255 scope global eth0
inet 10.10.0.200/32 scope global eth0
inet6 fe80::216:3eff:fe73:b078/64 scope link 
   valid_lft forever preferred_lft forever

$ sudo ip addr sh eth0 # host2
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:16:3e:ee:5e:fd brd ff:ff:ff:ff:ff:ff
inet 10.10.0.101/24 brd 10.10.0.255 scope global eth0
inet 10.10.0.200/32 scope global eth0
inet6 fe80::216:3eff:feee:5efd/64 scope link 
   valid_lft forever preferred_lft forever

Is there a way I could debug this? According to some mailing lists, some people are able to use keepalived on Xen, but they give little information about their configuration.

The dom0 has two "real" Ethernet cards, eth0 and eth1, with eth0 connected to the network:

  • eth0 is listening to 192.168.3.9
  • eth1 is listening to 10.10.0.1

My xend config is:

(xend-relocation-server no)
(network-script 'network-nat netdev=eth1')
(vif-script     vif-nat)
(dom0-min-mem 1024)
(enable-dom0-ballooning no)
(total_available_memory 0) 
(dom0-cpus 0)
(vncpasswd '')

And the relevant section in /etc/hosts in xend is:

10.10.0.100    test1 test1
10.10.0.101    test2 test2

The domUs (test1 and test2) are configured with 10.10.0.100 and 10.10.0.101 respectively. Each can ping the other by name (configured manually through /etc/hosts for now). The virtual IP is 10.10.0.200.

Note that for now I don't care much about the networking configuration in the dom0 (bridge vs ...); as a first step, I would just like to make keepalived work between the domUs at all.

The current iptables rules on the dom0:

# Generated by iptables-save v1.4.8 on Tue Apr 19 12:52:04 2011
*filter
:INPUT ACCEPT [37536:5302365]
:FORWARD ACCEPT [5367:1221790]
:OUTPUT ACCEPT [30601:3514407]
-A FORWARD -m state --state RELATED,ESTABLISHED -m physdev --physdev-out vif8.0 -j ACCEPT 
-A FORWARD -p udp -m physdev --physdev-in vif8.0 -m udp --sport 68 --dport 67 -j ACCEPT 
-A FORWARD -m state --state RELATED,ESTABLISHED -m physdev --physdev-out vif8.0 -j ACCEPT 
-A FORWARD -s 10.10.0.101/32 -m physdev --physdev-in vif8.0 -j ACCEPT 
COMMIT
# Completed on Tue Apr 19 12:52:04 2011
# Generated by iptables-save v1.4.8 on Tue Apr 19 12:52:04 2011
*nat
:PREROUTING ACCEPT [1441667:468129452]
:POSTROUTING ACCEPT [608454:36641119]
:OUTPUT ACCEPT [608448:36640127]
-A POSTROUTING -o eth1 -j MASQUERADE 
-A POSTROUTING -o eth1 -j MASQUERADE 
-A POSTROUTING -o eth1 -j MASQUERADE 
-A POSTROUTING -s 10.10.0.0/24 -o eth0 -j SNAT --to-source 192.168.3.9 
COMMIT
# Completed on Tue Apr 19 12:52:04 2011

As for the keepalived config:

# test1 config
vrrp_script chk_haproxy {               # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101                    # 101 on master, 100 on backup
    virtual_ipaddress {
        10.10.0.200
    }
    track_script {
        chk_haproxy
    }
}

and for test2:

vrrp_script chk_haproxy {               # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 100                    # 101 on master, 100 on backup
    virtual_ipaddress {
        10.10.0.200
    }
    track_script {
        chk_haproxy
    }
}

Each host can "arping" the other:

# on test1
sudo arping test2
ARPING 10.10.0.101 from 10.10.0.100 eth0
Unicast reply from 10.10.0.101 [FE:FF:FF:FF:FF:FF]  751.879ms
Unicast reply from 10.10.0.101 [FE:FF:FF:FF:FF:FF]  0.626ms
...

# on test2
sudo arping test1
ARPING 10.10.0.100 from 10.10.0.101 eth0
Unicast reply from 10.10.0.100 [FE:FF:FF:FF:FF:FF]  105.399ms
Unicast reply from 10.10.0.100 [FE:FF:FF:FF:FF:FF]  0.655ms

[EDIT] If I remove the track_script line from keepalived config, and restart, I get the following log:

Apr 19 13:35:06 test1 Keepalived: Terminating on signal
Apr 19 13:35:06 test1 Keepalived: Stopping Keepalived v1.1.20 (08/18,2010)
Apr 19 13:35:06 test1 Keepalived_vrrp: Terminating VRRP child process on signal
Apr 19 13:35:06 test1 Keepalived_healthcheckers: Terminating Healthchecker child process on signal
Apr 19 13:35:07 test1 Keepalived: Starting Keepalived v1.1.20 (08/18,2010)
Apr 19 13:35:07 test1 Keepalived: Starting Healthcheck child process, pid=4848
Apr 19 13:35:07 test1 Keepalived: Starting VRRP child process, pid=4849
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Initializing ipvs 2.6
Apr 19 13:35:07 test1 Keepalived_vrrp: Registering Kernel netlink reflector
Apr 19 13:35:07 test1 Keepalived_vrrp: Registering Kernel netlink command channel
Apr 19 13:35:07 test1 Keepalived_vrrp: Registering gratutious ARP shared channel
Apr 19 13:35:07 test1 Keepalived_vrrp: Initializing ipvs 2.6
Apr 19 13:35:07 test1 Keepalived_healthcheckers: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Registering Kernel netlink reflector
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Registering Kernel netlink command channel
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:35:07 test1 Keepalived_vrrp: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:35:07 test1 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Configuration is using : 3103 Bytes
Apr 19 13:35:07 test1 Keepalived_healthcheckers: Using LinkWatch kernel netlink reflector...
Apr 19 13:35:07 test1 Keepalived_vrrp: Configuration is using : 31958 Bytes
Apr 19 13:35:07 test1 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Apr 19 13:35:08 test1 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Apr 19 13:35:09 test1 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE

and:

Apr 19 13:34:43 test2 Keepalived: Terminating on signal
Apr 19 13:34:43 test2 Keepalived: Stopping Keepalived v1.1.20 (08/18,2010)
Apr 19 13:34:43 test2 Keepalived_vrrp: Terminating VRRP child process on signal
Apr 19 13:34:43 test2 Keepalived_healthcheckers: Terminating Healthchecker child process on signal
Apr 19 13:34:44 test2 Keepalived: Starting Keepalived v1.1.20 (08/18,2010)
Apr 19 13:34:44 test2 Keepalived: Starting Healthcheck child process, pid=3811
Apr 19 13:34:44 test2 Keepalived: Starting VRRP child process, pid=3812
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Initializing ipvs 2.6
Apr 19 13:34:44 test2 Keepalived_vrrp: Registering Kernel netlink reflector
Apr 19 13:34:44 test2 Keepalived_vrrp: Registering Kernel netlink command channel
Apr 19 13:34:44 test2 Keepalived_vrrp: Registering gratutious ARP shared channel
Apr 19 13:34:44 test2 Keepalived_vrrp: Initializing ipvs 2.6
Apr 19 13:34:44 test2 Keepalived_healthcheckers: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Registering Kernel netlink reflector
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Registering Kernel netlink command channel
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:34:44 test2 Keepalived_vrrp: IPVS: Can't initialize ipvs: Protocol not available
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Configuration is using : 3103 Bytes
Apr 19 13:34:44 test2 Keepalived_healthcheckers: Using LinkWatch kernel netlink reflector...
Apr 19 13:34:44 test2 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Apr 19 13:34:44 test2 Keepalived_vrrp: Configuration is using : 31958 Bytes
Apr 19 13:34:44 test2 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Apr 19 13:34:45 test2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Apr 19 13:34:46 test2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
David Cournapeau
  • How are your Xen domUs configured for networking? Do you use a bridge? – silviud Jan 07 '11 at 14:23
  • I believe they are configured for bridging (I can access each domU from another machine, at least). I am not super familiar with xen, where can I find this information for sure on dom0 and domU ? – David Cournapeau Jan 08 '11 at 04:59
  • Can you please post the content of the hosts' configuration files (on the dom0)? Can you reach one domU from the other and vice-versa? – Eduardo Ivanec Apr 19 '11 at 01:49
  • @eduardo: I have added the relevant xend + hosts you asked. Let me know if you need more info. – David Cournapeau Apr 19 '11 at 02:34
  • I see you're using NAT on your Xen configuration instead of the more common bridging setup. Do you have firewall rules running in the dom0 or either domU? Can you also post your `/etc/keepalived/keepalived.conf` and keepalived's logfiles? – Eduardo Ivanec Apr 19 '11 at 03:58
  • I added the iptables configured on dom0. I am using NAT to mimic what we have on our production server, but if bridging is easier (if only as a first step), I can use that first. I don't know exactly why we use NAT on our production server, but if that prevents getting virtual IP, reconsidering that choice may be an option. – David Cournapeau Apr 19 '11 at 04:27
  • When you ping 10.120.100.105 or 10.120.100.104 instead of the IPs on 10.10.0.0/24 does it work? If not: which IPs did you use for keepalived, the 10.10.0.0/24 ones or the 10.120.100.105/24 ones? – Eduardo Ivanec Apr 19 '11 at 04:28
  • @Eduardo: I made a mistake when updating my post with the bounty: I mixed old and new configuration. I think everything is consistent now (and corresponds to what I have now). Each host can ping the other one by both IP and name, as configured in /etc/hosts. Please forget about 10.120.100.104/5, this was a previous configuration I used. – David Cournapeau Apr 19 '11 at 04:38
  • OK, forgotten : ) What happens if you remove the `track_script { chk_haproxy }` block? Can you please post the content of `/var/log/messages` when restarting keepalived? – Eduardo Ivanec Apr 19 '11 at 05:03
  • I added the log and removed the haproxy parts, but now that I relook at the logs, it may be that my domU kernels don't support the necessary bits for the keepalive protocol ("IPVS: Can't initialize ipvs: Protocol not available") – David Cournapeau Apr 19 '11 at 05:30
  • Enabling ip_vs removes the error in the log but does not change anything as far as keepalived behavior is concerned – David Cournapeau Apr 19 '11 at 06:02
  • I am a little confused why the arping MAC is FE:FF... which doesn't correspond to any of the interfaces shown. Can you actually ssh between the nodes? – polynomial Aug 24 '11 at 06:18

2 Answers


Setting 'state MASTER' on both hosts will confuse matters, as they will both initially transition to MASTER and assume the IP (as per your logs). You only want MASTER on one of them and BACKUP on the other (so one starts off passive).

However, since they both presumably stay MASTER, it suggests they can't see each other's VRRP announcements (if they could, one would step down after seeing a higher priority announced).

Check that you can see multicast traffic from both hosts (`tcpdump multicast`).
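A narrower capture may be easier to read than all multicast traffic. The sketch below assumes the interface is eth0, as in the question's keepalived.conf; the function name watch_vrrp is made up for illustration. VRRP advertisements are sent as IP protocol 112 to the multicast group 224.0.0.18, so filtering on the protocol number shows only those packets.

```shell
# Interface keepalived is bound to (assumption: eth0, as in the question).
IFACE="${IFACE:-eth0}"

# VRRP advertisements use IP protocol number 112.
VRRP_FILTER='ip proto 112'

# Capture up to 10 VRRP packets on the interface and print them
# without name resolution. Needs root.
watch_vrrp() {
    tcpdump -n -i "$IFACE" -c 10 "$VRRP_FILTER"
}
```

Run `watch_vrrp` as root on test1 and on test2 while keepalived is running on both. If each host only ever shows packets with its own source address, the advertisements are probably not being passed between the domUs.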

Edit: crap, just realised this is quite old - might be useful for anyone else using keepalived though.

Andy Coates

You have them both set as "state MASTER"; this can cause VRRP announcement confusion even with different priorities. Try setting test2 to "state BACKUP". This has fixed it for me in the past.
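As a sketch of that change, test2's vrrp_instance from the question would look like this. Only the state line differs; everything else is copied from the question, and the vrrp_script block stays as it is. Whether this alone is enough is not certain, since both hosts still need to receive each other's advertisements.

```
# test2 config: same as in the question, except for "state"
vrrp_instance VI_1 {
    interface eth0
    state BACKUP                    # was MASTER; test1 keeps state MASTER
    virtual_router_id 51
    priority 100                    # lower than test1's 101
    virtual_ipaddress {
        10.10.0.200
    }
    track_script {
        chk_haproxy
    }
}
```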

This log line also makes me think something else is going on:

    Apr 19 13:34:44 test2 Keepalived_healthcheckers: IPVS: Can't initialize ipvs: Protocol not available

I would check lsmod | grep ip and ensure that you have the kernel modules loaded for ipvs.
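One way to run that check, as a sketch: it reads /proc/modules (the same data lsmod prints) and reports whether ip_vs is loaded. The function name check_ipvs is made up for illustration, and a kernel with IPVS built in rather than as a module would not show up here. Note that a comment under the question reports that loading ip_vs removed the log error without changing keepalived's behaviour.

```shell
# Report whether the ip_vs kernel module is currently loaded.
check_ipvs() {
    if grep -q '^ip_vs ' /proc/modules 2>/dev/null; then
        echo "ip_vs loaded"
    else
        echo "ip_vs not loaded"
    fi
}

check_ipvs
# If it is not loaded, try (as root):  modprobe ip_vs
```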

Hope this helps.