vrrp routing ping, but not other traffic

Question

I've got two VMs running BusyBox (under ESX). These machines function only as load balancers.

I'm using pen to do the load balancing on each machine, which is working fine. But when I fire up vrrpd, ping works but nothing else does.

Each load balancer has 3 interfaces. The management IPs are on eth0; eth1 is for a second load balancer setup.

LBCO102A
10.3.16.96 - (eth0) Management IP
10.3.16.84 - (eth2) IP that pen uses

LBCO102B
10.3.16.94 - (eth0) Management IP
10.3.16.85 - (eth2) IP that pen uses

vrrpd uses 10.3.16.58

On LBCO102A I'm using the following to start vrrpd:

vrrpd -i eth2 -v 58 -p 100 10.3.16.58

On LBCO102B I'm using the following to start vrrpd:

vrrpd -i eth2 -v 58 -p 50 10.3.16.58

I can connect to the IPs 10.3.16.84 and 10.3.16.85 without issue on port 80. I can connect to the management IPs 10.3.16.94 and 10.3.16.96 without issue. When I connect to 10.3.16.58 it times out. Nothing is displayed in the /var/run/messages file except that one is master and the other not.

Does anyone have any ideas as to why vrrpd isn't pushing traffic other than ping? I've got three of these setups. One on pop3, one on smtp and one on http. None of them work for anything but ping.

Here's the netstat -an from LBCO102A

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.3.16.107:110         0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:9999            0.0.0.0:*               LISTEN
tcp        0      0 10.3.16.84:80           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:8889            0.0.0.0:*               LISTEN
tcp        0      0 10.3.16.107:110         10.3.17.30:53960        TIME_WAIT
tcp        0      0 10.3.16.96:22           10.3.30.154:1224        ESTABLISHED
tcp        0      0 10.3.16.107:110         10.3.17.30:54000        TIME_WAIT
tcp        0      0 10.3.16.107:110         10.3.17.30:54102        TIME_WAIT
tcp        0      0 10.3.16.107:110         10.3.17.30:54038        TIME_WAIT
tcp        0      0 10.3.16.107:110         10.3.17.30:53959        TIME_WAIT
tcp        0      0 10.3.16.107:110         10.3.17.30:54001        TIME_WAIT
tcp        0      0 10.3.16.107:110         10.3.17.30:54101        TIME_WAIT
tcp        0      0 10.3.16.96:22           10.3.30.154:1097        ESTABLISHED
tcp        0      0 10.3.16.107:110         10.3.17.30:54037        TIME_WAIT
raw        0      0 0.0.0.0:112             0.0.0.0:*               0
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  7      [ ]         DGRAM                    983    /tmp/log
unix  2      [ ]         DGRAM                    1156708
unix  2      [ ]         DGRAM                    1156657
unix  2      [ ]         DGRAM                    1156524
unix  2      [ ]         DGRAM                    192729
unix  2      [ ]         DGRAM                    994

Here's the netstat -an from LBCO102B

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:9999            0.0.0.0:*               LISTEN
tcp        0      0 10.3.16.85:80           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN
tcp        0      0 10.3.16.94:22           10.3.30.154:1118        ESTABLISHED
raw        0      0 0.0.0.0:112             0.0.0.0:*               0
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  6      [ ]         DGRAM                    981    /tmp/log
unix  2      [ ]         DGRAM                    188253
unix  2      [ ]         DGRAM                    179316
unix  2      [ ]         DGRAM                    179306
unix  2      [ ]         DGRAM                    988

Here's what I've got in my startup scripts (there's more in the scripts to check for start/stop/restart).

LBCO102A

echo -n "Starting eth2: "
ifconfig eth2 10.3.16.84 netmask 255.255.255.0 up
echo "OK"

echo -n "Starting vrrp-ascossrs101: "
vrrpd -i eth2 -v 58 -p 100 10.3.16.58
echo "OK"

echo -n "Starting pen-ascossrs101: "
/bin/pen -C 8888 -X -l /var/log/ascossrs101.log -p /var/log/ascossrs101.pid 10.3.16.84:80 10.3.16.56:80 10.3.16.57:80
echo "OK"

LBCO102B

echo -n "Starting eth2: "
ifconfig eth2 10.3.16.85 netmask 255.255.255.0 up
echo "OK"

echo -n "Starting vrrp-ascossrs101: "
vrrpd -i eth2 -v 58 -p 100 10.3.16.58
echo "OK"

echo -n "Starting pen-ascossrs101: "
/bin/pen -C 8888 -X -l /var/log/ascossrs101.log -p /var/log/ascossrs101.pid 10.3.16.85:80 10.3.16.56:80 10.3.16.57:80
echo "OK"

Just wanted to leave a note so that you get the alert that I updated my answer in relation to your most recent information. — Kevin Nisbet, Oct 22 '09 at 16:56

Kevin Nisbet · Accepted Answer · 2009-10-22T12:44:59.507

I haven't used pen, so I'm not quite sure how it works, but might be able to help.

Can you run netstat -an and provide the output, please? My guess would be that pen is bound to the eth2 IP, not to all addresses on the box. It should be configured to 0.0.0.0 so that it will pick up any address that the box owns, or your vrrpd masterscript should be run to cause pen to bind to your VIP. What may have to happen is that masterscript has to launch pen itself, already configured to expect the VIP to be bound to your eth2 interface.

Edit, now with netstat output: OK, so here's the problem. On LBCO102A:

tcp        0      0 10.3.16.84:80           0.0.0.0:*               LISTEN

The 10.3.16.84:80 means that you are only bound to 10.3.16.84 on port 80, thus even if the server has the VIP, it's not listening to 10.3.16.58 port 80.
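The same check can be scripted. The following is a minimal sketch, not something from pen or vrrpd: the helper name covers_vip is made up for illustration, and it assumes netstat -an output in the column layout shown in the question. It succeeds only if some TCP listener is bound either to the given address and port or to the wildcard 0.0.0.0 on that port.

```shell
#!/bin/sh
# covers_vip ADDR PORT (hypothetical helper): reads `netstat -an` output on
# stdin and succeeds if a TCP socket in LISTEN state is bound to ADDR:PORT
# or to the wildcard address 0.0.0.0:PORT.
covers_vip() {
    awk -v addr="$1" -v port="$2" '
        $1 == "tcp" && $NF == "LISTEN" &&
        ($4 == (addr ":" port) || $4 == ("0.0.0.0:" port)) { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Sample line copied from the LBCO102A output in the question.
sample='tcp        0      0 10.3.16.84:80           0.0.0.0:*               LISTEN'

if printf '%s\n' "$sample" | covers_vip 10.3.16.58 80; then
    echo "VIP covered"
else
    echo "VIP not covered"
fi
```

With the sample line this prints "VIP not covered", which matches the diagnosis above. On a live box the input would come from netstat itself, for example: netstat -an | covers_vip 10.3.16.58 80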

Again, I'm not all that familiar with pen, but assuming the first IP address passed is the bind address, this:

/bin/pen -C 8888 -X -l /var/log/ascossrs101.log -p /var/log/ascossrs101.pid 10.3.16.84:80 10.3.16.56:80 10.3.16.57:80

could be:

/bin/pen -C 8888 -X -l /var/log/ascossrs101.log -p /var/log/ascossrs101.pid 0.0.0.0:80 10.3.16.56:80 10.3.16.57:80

The 0.0.0.0:80 would mean to bind to all addresses and all interfaces on the box.

The other alternative might be to use:

/bin/pen -C 8888 -X -l /var/log/ascossrs101.log -p /var/log/ascossrs101.pid 10.3.16.58:80 10.3.16.56:80 10.3.16.57:80

However, I doubt this will work as is, because when the program launches, the VM won't have that IP address yet, so pen won't be able to bind to it explicitly. I tried to look at the documentation for vrrpd, and I'm not sure if it can do this, but freevrrp has a masterscript option, which will basically run a script when it takes over a VIP. Thus, you would simply add the command to launch pen as part of the masterscript, so that when the box takes ownership of the VIP, it launches pen and binds it to the VIP address.
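A rough sketch of what such a master script might look like is below. It assumes the VRRP daemon can be told to run a script when the node becomes master; the `start` argument and the PEN override variable are hypothetical and exist only to make a dry run possible. The pen options are copied from the question, with only the listen address changed to the VIP.

```shell
#!/bin/sh
# Hypothetical master script: meant to be run by the VRRP daemon after this
# node has taken over the VIP (assumption: the daemon offers such a hook).
VIP="${VIP:-10.3.16.58}"
PEN="${PEN:-/bin/pen}"    # override PEN to substitute a stub for testing

start_pen_on_vip() {
    # Same pen options as in the question; only the listen address differs.
    "$PEN" -C 8888 -X \
        -l /var/log/ascossrs101.log \
        -p /var/log/ascossrs101.pid \
        "$1:80" 10.3.16.56:80 10.3.16.57:80
}

# Only act when called with "start", so sourcing this file has no side effects.
if [ "${1:-}" = "start" ]; then
    start_pen_on_vip "$VIP"
fi
```

A matching script for the transition back to backup would need to stop pen again, for instance by using the PID file written via -p; that part is left out here.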

I've got a production release tonight. Once that's done I'll get the vrrp turned back on and get you the outputs. — mrdenny, Oct 21 '09 at 02:01
I've updated my post with the netstat -an output as well as the startup scripts. The output from netstat -an was taken with the normal config and vrrp in place. — mrdenny, Oct 22 '09 at 05:24
I like the idea of freevrrp just running a script when taking over the IP and bringing pen online. I'll do a google for it tonight. Thanks. — mrdenny, Oct 22 '09 at 19:08
@Kevin it appears that I've gotten it straightened out. I posted an answer with what appears to be the correct setup. — mrdenny, Oct 23 '09 at 02:42

score 1 · Answer 2 · answered Oct 23 '09 at 02:41

OK, I think I've got everything working now. It was a combination of stuff that I got from @Kevin plus some stuff that I found on the net (there's very little out there about vrrpd).

It appears that vrrpd itself tells the server to listen on the virtual IP, so you don't want to start the interface with that IP. The interface does need to have an IP, just not the IP that vrrp will be using.

My next problem was that I had pen working on a different IP than the one that vrrpd was working with. I ended up having pen listen for all connections on port 80 on all IPs. If you don't do this, then if pen starts up and the machine isn't the master load balancer, pen will die because the IP that it's looking for isn't there.

I had it set up with pen listening on another IP address, and assumed that vrrpd was another load balancer in front of that. Turns out that vrrpd just brings the IP up and takes it down.

So to recap, what I've ended up with is this.

LBCO102A
10.3.16.96 - eth0 (Management IP)
10.3.16.148 - eth1

LBCO102B
10.3.16.94 - eth0 (Management IP)
10.3.16.149 - eth1

Then the vrrpd on both machines is configured with the 10.3.16.58 address, and pen on both machines is configured with 0.0.0.0.
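Put together, the startup sequence might look roughly like the sketch below. It is not the exact script in use: it assumes vrrpd now runs on eth1, keeps the netmask, virtual router ID, priorities (100 and 50) and pen options from the question, and the run and start_lb helpers and the DRY_RUN switch are made up so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Sketch of the final startup sequence (assumptions: vrrpd on eth1, netmask
# and pen options as in the question). DRY_RUN=1 only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# start_lb NODE_IP PRIORITY (hypothetical helper)
start_lb() {
    node_ip="$1"
    priority="$2"
    # The interface gets its own address, not the virtual IP.
    run ifconfig eth1 "$node_ip" netmask 255.255.255.0 up
    # vrrpd brings 10.3.16.58 up on whichever node is master.
    run vrrpd -i eth1 -v 58 -p "$priority" 10.3.16.58
    # pen listens on all addresses, so it starts on the backup node too.
    run /bin/pen -C 8888 -X -l /var/log/ascossrs101.log \
        -p /var/log/ascossrs101.pid 0.0.0.0:80 10.3.16.56:80 10.3.16.57:80
}

start_lb 10.3.16.148 100    # LBCO102A; LBCO102B would use 10.3.16.149 and 50
```

Running it with DRY_RUN=0 executes the commands instead of printing them.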

There's still more testing to do, but I've been running pen in debug mode and keeping an eye on the /var/run/messages file. If I restart vrrpd on the active node, it becomes passive and the new active node starts showing traffic. Time will tell, but it looks promising.

score 0 · Answer 3 · answered Oct 20 '09 at 08:05

Just a guess here. You may be confusing it by having two interfaces on the same subnet. You could try creating a separate subnet for the management network.

Roy

That's a thought. I'll give that a try. I've got another subnet that I can move the eth0 to which is the management subnet where the SAN and whatnot live. – mrdenny Oct 21 '09 at 02:02
