I have been testing a server cluster locally for quite a while with no problems. I recently set the cluster up for a live test, and I have started noticing issues; I believe the HAProxy in my cluster may be running into problems.

First I will go over a little bit of the structure of the cluster; maybe there is a problem with how I have it set up, or maybe I need multiple proxies.

I have two server clusters that the HAProxy is balancing; call them SC1 and SC2. SC1 is the main cluster: anything arriving on port 80 of the HAProxy is sent to SC1. SC1 processes the request and then sends another request to SC2 through the proxy on port 8080. I wouldn't expect this to be a problem, but my server logs often say that SC1 cannot connect to SC2, and I believe this is because the HAProxy is being overloaded.

The reason I think the HAProxy is overloaded is that when I look at my stats page, it often takes more than a second to respond. Because of this I decided to take a look at the HAProxy logs, and I noticed an abnormality that I believe may be linked to my problems. Every minute or so (sometimes more, sometimes less), I get the following messages:

Oct  8 15:58:52 haproxy rsyslogd-2177: imuxsock begins to drop messages from pid 3922 due to rate-limiting
Oct  8 15:58:52 haproxy kernel: [66958.500434] net_ratelimit: 2997 callbacks suppressed
Oct  8 15:58:52 haproxy kernel: [66958.500436] nf_conntrack: table full, dropping packet

I was wondering what the repercussions of this are. Would it just cause dropped packets, or could it cause delays as well? How can I fix this problem? I am running Ubuntu 12.04 LTS Server.

Here are my sysctl modifications:

fs.file-max = 1000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1

Here is my config file:

global
   log /dev/log   local0 info
   log /dev/log   local0 notice
   maxconn 50000
   user u1
   group g1
   #debug

defaults
   log     global
   mode    http
   option  httplog
   option  dontlognull
   option  forwardfor
   retries 3
   option redispatch
   option http-server-close
   maxconn 50000
   contimeout      10000
   clitimeout      50000
   srvtimeout      50000
   balance roundrobin

listen  sc1 255.255.255.1:80
    maxconn 20000
    server  sc1-1 10.101.13.68:80 maxconn 10000
    server  sc1-2 10.101.13.66:80 maxconn 10000
listen  sc1-1_Update  255.255.255.1:8181
    maxconn 20000
    server  sc1-1 10.101.13.66:80 maxconn 20000
listen  sc1-2_Update  255.255.255.1:8282
    maxconn 20000
    server  sc1-2 10.101.13.68:80 maxconn 20000
listen  sc2 255.255.255.1:8080
    maxconn 30000
    server  sc2-1 10.101.13.74:80 maxconn 10000
    server  sc2-2 10.101.13.78:80 maxconn 10000
    server  sc2-3 10.101.13.82:80 maxconn 10000
listen  sc2-1_Update 255.255.255.1:8383
    maxconn 30000
    server  sc2-2 10.101.13.78:80 maxconn 15000
    server  sc2-3 10.101.13.82:80 maxconn 15000
listen  sc2-2_Update 255.255.255.1:8484
    maxconn 30000
    server  sc2-1 10.101.13.74:80 maxconn 15000
    server  sc2-3 10.101.13.82:80 maxconn 15000
listen  sc2-3_Update 255.255.255.1:8585
    maxconn 30000
    server  sc2-1 10.101.13.74:80 maxconn 15000
    server  sc2-2 10.101.13.78:80 maxconn 15000
listen  stats :8888
    mode http
    stats enable
    stats hide-version
    stats uri /
    stats auth user:pass

The sc1 and sc2 listeners are the main clusters. All of the others I use when I have to update my servers (for example, forwarding port 80 to 8181 on the HAProxy to update server sc1-1).

Any help with this issue would be greatly appreciated.

Thank you

Eumcoz

1 Answer

It looks like your connection tracking table is filling up. Removing any iptables rules that use connection tracking would solve the problem.

If that is not an option and you have RAM available you can increase the table size:

cat /proc/sys/net/netfilter/nf_conntrack_max
echo 131072 > /proc/sys/net/netfilter/nf_conntrack_max

You should probably increase the hashsize as well:

cat /sys/module/nf_conntrack/parameters/hashsize
echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
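To gauge how close the table actually gets to its limit before (or after) raising it, you can compare the live entry count against the configured maximum. This is just a sketch; the `/proc` paths below assume the `nf_conntrack` module is loaded, and the fallback to 0 is only there so the script doesn't fail on a machine without it:

```shell
# Report conntrack utilisation as "used/max".
# Falls back to 0 if nf_conntrack is not loaded on this machine.
used=$(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo 0)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || echo 0)
echo "conntrack entries: ${used}/${max}"
```

If `used` regularly sits near `max` under load, the table is undersized for your traffic.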

Those numbers are just double the default settings on my desktop; I'm not sure exactly what you would need. You'll also want to add the nf_conntrack_max setting to sysctl.conf so it survives a reboot.
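To make the changes persistent (the values here simply mirror the ones above; adjust them for your RAM and traffic), something like this should work on Ubuntu. Note that nf_conntrack_max is a sysctl, while hashsize is a module parameter, so they are persisted in different places:

```
# /etc/sysctl.conf
net.netfilter.nf_conntrack_max = 131072
```

```
# /etc/modprobe.d/nf_conntrack.conf -- filename is arbitrary
options nf_conntrack hashsize=32768
```

The sysctl takes effect on the next boot (or immediately with `sysctl -p`); the modprobe option applies the next time the nf_conntrack module is loaded.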

I would be really careful using net.ipv4.tcp_tw_recycle; it can cause serious problems with clients behind NAT.

JHill