
For some reason, ipvsadm does not seem to be equally balancing the connections between my real servers when using the wlc or lc schedulers. One real server gets absolutely hammered with requests while the others receive relatively few connections.

My ldirectord.cf file looks like this:

quiescent     = yes
autoreload    = yes
checktimeout  = 10
checkinterval = 10

# *.example.com http
virtual = 192.0.2.111:http
    real = 10.10.10.1:http  ipip    10
    real = 10.10.10.2:http  ipip    10
    real = 10.10.10.3:http  ipip    10
    real = 10.10.10.4:http  ipip    10
    real = 10.10.10.5:http  ipip    10
    scheduler = lc
    protocol = tcp
    service = http
    checktype = negotiate
    request = "/lb"
    receive = "Up and running"
    virtualhost = "site.com"
    fallback = 127.0.0.1:http

The weird thing that I think may be causing the problem (but I'm really not sure) is that ipvsadm doesn't seem to be tracking active connections properly; they all appear as inactive connections:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn    
TCP  192.0.2.111:http lc
  -> 10.10.10.1:http              Tunnel  10     0          10        
  -> 10.10.10.2:http              Tunnel  10     0          18        
  -> 10.10.10.3:http              Tunnel  10     0          3         
  -> 10.10.10.4:http              Tunnel  10     0          10        
  -> 10.10.10.5:http              Tunnel  10     0          5

If I do ipvsadm -Lnc then I see lots of connections, but only ever in ESTABLISHED and FIN_WAIT states.
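For reference, a quick way to tally those states is something like the following (this assumes the stock ipvsadm -Lnc layout, where the state is the third column after two header lines):

ipvsadm -Lnc | awk 'NR > 2 { print $3 }' | sort | uniq -c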

I was using ldirectord previously on a Gentoo-based load balancer and the ActiveConn count used to be accurate; since moving to Ubuntu 10.04 LTS, something seems to be different.

# ipvsadm -v
ipvsadm v1.25 2008/5/15 (compiled with popt and IPVS v1.2.1)

So, is ipvsadm failing to track active connections properly and thereby making load balancing work incorrectly? If so, how do I get it to work properly again?

Edit: It gets weirder. If I cat /proc/net/ip_vs then it looks like the correct ActiveConn counts are there:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C000026F:0050 rr 
  -> 0AB42453:0050      Tunnel  10     1          24        
  -> 0AB4321D:0050      Tunnel  10     0          23        
  -> 0AB426B2:0050      Tunnel  10     2          25        
  -> 0AB4244C:0050      Tunnel  10     2          22        
  -> 0AB42024:0050      Tunnel  10     2          23
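Those hex pairs are just IP:port in hexadecimal; C000026F:0050, for example, decodes to 192.0.2.111:80:

# decode one /proc/net/ip_vs entry by hand: four hex octets plus a hex port
printf '%d.%d.%d.%d:%d\n' 0xC0 0x00 0x02 0x6F 0x0050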
davidsmalley

4 Answers


With lc (least connection), if all servers have the same number of connections then it will always give a new connection to the first server in the list. This can mean that if you have very low utilization, and only a connection every now and then, that connection will always go to the first host in the list.
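To make that concrete, here is a rough sketch (not the kernel code) of how that selection plays out against the ipvsadm -Ln table, assuming the commonly documented lc overhead of ActiveConn*256 + InActConn; the strict "less than" comparison means ties keep the first-listed server:

# pick the real server an lc-style scheduler would choose next (illustrative sketch only)
ipvsadm -Ln | awk '$1 == "->" && $5 ~ /^[0-9]+$/ {
    o = $5 * 256 + $6                     # assumed overhead: ActiveConn*256 + InActConn
    if (best == "" || o < best) { best = o; pick = $2 }
} END { print pick }'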

phemmer

My favorite is wrr (weighted round robin). Am I right in assuming that you are using the DR approach (direct routing)?

In that case ipvsadm does not see the connection as such, since the reply from the RS (real server) goes directly to the client, not back through the LB.
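For example (an illustrative hand-built equivalent of the service above, not David's actual commands): with -g (direct routing) or -i (IPIP tunnel) forwarding, only the client-to-server packets pass through the director, so it never sees the real server's side of the TCP conversation, including its FIN:

ipvsadm -A -t 192.0.2.111:80 -s wrr
ipvsadm -a -t 192.0.2.111:80 -r 10.10.10.1:80 -i -w 10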

Nils
  • With direct routing you can also run into situations where you can connect from the load balancer to the real server, but because of a routing or ARP-filtering misconfiguration you can't reach some of the real servers via the VIP from outside the network. Real servers could be listed as "up" and receiving traffic but unable to handle it, so the real servers which _are_ working get more work than intended. – mtinberg Jun 24 '11 at 22:49
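A common way to rule out the ARP side of that on DR real servers is the usual arp_ignore/arp_announce sysctls plus the VIP on lo, shown here as a sketch of the widely used recipe rather than anything taken from David's setup (for IPIP tunnel mode the VIP typically lives on tunl0 instead):

# on each real server: hold the VIP without answering ARP for it
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
ip addr add 192.0.2.111/32 dev lo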

Judging by your number of connections it is probably not an issue for you, but you can get an uneven distribution with least connections if one of the real servers responds more slowly than the others: because it stacks up connections more quickly, it gets handed fewer new connections per unit of time than the others.

ZaphodB

David's command outputs indicate that he is using tunnel mode (IPIP), which is generally set up as a variant of DR. We would need to see some routing tables or diagrams to understand his setup better.

But I agree that the connection tracking in LVS is probably confused, since it doesn't see the TCP FIN packets.

ipvsadm has some settings to expire idle connections more quickly. For example, the following command sets the TCP session timeout to 1 hour, the TCP FIN_WAIT timeout to 120 seconds, and the UDP timeout to 300 seconds:

/sbin/ipvsadm --set 3600 120 300
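You can confirm which values are in effect with the timeout listing (flag as per ipvsadm v1.25; the expected output is shown as a comment):

ipvsadm -L --timeout
# Timeout (tcp tcpfin udp): 3600 120 300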

The source of the clients should be double-checked. The default behavior for LVS is to do persistent connections by client IP, so if you are stress testing with wget or ab from the same test client IP, all the connections will be sent to the same real server.
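If persistence does turn out to be in play (see the comments below; it is opt-in rather than the default), it would show up as an explicit setting, e.g. a persistent directive in ldirectord.cf or -p on the virtual service (hypothetical examples, not taken from David's config):

# ldirectord.cf equivalent: persistent = 300
# ipvsadm form: edit the existing virtual service and add 300s of client-IP persistence
ipvsadm -E -t 192.0.2.111:80 -s lc -p 300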

HAProxy is a more intelligent load balancer, but it needs to sit in the return path of the packets in order to work fully transparently.

Wim Kerkhoff
  • "The default behavior for LVS is to do persistent connections by client IP" -- no it isn't, unless you turn on the `sh` scheduler. Your mention of haproxy in this context is irrelevant, also. – womble Aug 27 '11 at 04:44
  • @womble, I stand corrected - default behavior for IPVS is *NOT* to be persistent. haproxy is relevant; it's specifically built for HTTP load balancing and does not require special tunneling setups; why use a generic L4 solution if you want more control of the load balancing to maximize HTTP performance? – Wim Kerkhoff Sep 03 '11 at 02:07
  • The question was *specifically* asking about an aberration in the operation of a particular piece of software, which is not beyond its capabilities. Mentioning other software is, as a result, irrelevant. – womble Sep 03 '11 at 09:34