
I have set up two dedicated haproxy servers to spread the load across three application servers. I have set up regular HTTP balancing on port 80, and also a special backend for websockets.

It works great for about 2 hours, but after that it gets painfully slow: a page load takes about 30 seconds. When I restart haproxy it's good again.

Below is my conf. Any idea what may be causing this?

global
  user haproxy
  group haproxy

defaults
  mode http
  timeout connect 5s
  timeout client  5s
  timeout server  60s
  stats enable
  stats auth aa:bb

frontend proxy
  # listen on 80
  bind 0.0.0.0:80

  # allow for many connections, with long timeout
  maxconn 200000 # total maximum connections, check ulimit as well
  timeout client 24h

  # default to webapp backend
  default_backend webapp

  # is this a socket io request?
  acl is_websocket hdr_end(host) -i node.domain.com
  use_backend websocket if is_websocket

backend webapp
  balance roundrobin # assuming that you don't need stickiness
  # allow client connections to linger for 5s
  # but close server side requests to avoid keeping idle connections
  option httpchk HEAD /check.txt HTTP/1.0
  option http-server-close
  option forwardfor

  server app1 x.y.149.133:80 cookie app1 weight 10 check
  server app2 x.y.149.134:80 cookie app2 weight 15 check
  server app3 x.y.149.135:80 cookie app3 weight 15 check

backend websocket
  balance source

  # options
  option forwardfor # add X-Forwarded-For

  # Do not use httpclose (= client and server
  # connections get closed), since it will close
  # Websockets connections
  no option httpclose

  # Use "option http-server-close" to preserve
  # client persistent connections while handling
  # every incoming request individually, dispatching
  # them one after another to servers, in HTTP close mode
  option http-server-close
  option forceclose

  server app1 x.y.149.133:3000 cookie app1 weight 10 check
  server app2 x.y.149.134:3000 cookie app2 weight 15 check
  server app3 x.y.149.135:3000 cookie app3 weight 15 check
Chrille
  • Enable logging in the `global` section to see what happens. – quanta Aug 01 '12 at 10:27
  • I added `log 127.0.0.1 local0` to the global section and then `log global` to the defaults section, but all I got in the log was lots of connection messages; I can't see anything about errors in there. What might be the cause? When this happens I'm pretty much unable to browse any page, so the haproxy stats page was unavailable too, but the log shows about 30 requests per second. – Chrille Aug 01 '12 at 17:59
  • This is just an idea I'm throwing out. You say the problem appears after about two hours, and that you have about 30 requests per second. Your maxconn is set to 200000, which is surprisingly close to the number of requests you would get in two hours at 30 req/s. Maybe you are hitting some sort of connection limit in the OS? Did you check that out? – pkhamre Aug 15 '12 at 11:49
  • How many active connections were you up to when it started getting slow? – Nick Craver Jun 11 '14 at 11:54
  • Does it make sense to have `option forceclose` follow `no option httpclose`? – KCD Aug 29 '15 at 18:08
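pkhamre's estimate can be checked with plain shell arithmetic. This sketch assumes the worst case, where every one of the roughly 30 requests per second opens a connection that never closes (plausible only for long-lived websocket connections, given `timeout client 24h`). The helper name `minutes_to_limit` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: minutes until `limit` connections pile up when
# `rate` new connections arrive per second and none of them close.
# Shell arithmetic is integer-only, so the result is truncated.
minutes_to_limit() {
  limit=$1
  rate=$2
  echo $(( limit / rate / 60 ))
}

# Connections accumulated after 2 hours at 30 per second
echo $(( 30 * 2 * 3600 ))     # 216000

# Minutes until maxconn 200000 is reached at 30 per second
minutes_to_limit 200000 30    # 111
```

Under that assumption the frontend's `maxconn 200000` would be reached after a little under two hours, which matches the reported timing; whether the real connections actually stay open that long is the thing to verify.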


What generally makes websockets different from your everyday HTTP load balancing is that you end up with a high number of concurrent connections relative to the arrival rate. This is an important distinction in systems, so if it isn't clear to you, have a look at this answer of mine.
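One standard way to make this distinction concrete is Little's law, which says that in steady state the average number of concurrent connections $L$ equals the arrival rate $\lambda$ times the average time $W$ each connection stays open. The numbers below are illustrative assumptions, not measurements from this setup: about 30 new connections per second (the rate seen in the log in the comments), a plain HTTP request that is done in about 1 second, and a websocket that stays open for 2 hours.

```latex
L = \lambda W
\qquad
L_{\text{http}} = 30\,\mathrm{s}^{-1} \times 1\,\mathrm{s} = 30
\qquad
L_{\text{ws}} = 30\,\mathrm{s}^{-1} \times 7200\,\mathrm{s} = 216\,000
```

Same arrival rate, but the long-lived connections keep accumulating until the system holds several thousand times as many at once.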

So, whatever your problem is, my guess is that it occurs when you reach a certain threshold of concurrent connections. Here is my best guess based on the information you provided:

Your websocket backend contains 3 servers, and the load balancer talks to all of them from the same IP. That means the number of connections it can hold open is limited to roughly source_port_range * destination IPs. This looks something like:

[root@ny-kbrandt01 ~]# cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
[root@ny-kbrandt01 ~]# echo $(( (61000-32768) * 3 ))
84696

So when you hit somewhere around 84k concurrent connections, your haproxy instances are starved for source ports, and CPU spikes as the system does something akin to garbage collection to find more free source ports.
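The same estimate can be wrapped in a small shell function. This is a sketch under a few assumptions: it reads the range from `/proc/sys/net/ipv4/ip_local_port_range` when that file exists (Linux) and otherwise falls back to the 32768 to 61000 range shown above; it counts the range inclusively, so each destination gets one more port than the subtraction above; and it treats each backend server as one destination IP:port, since a TCP connection is identified by the (source IP, source port, destination IP, destination port) tuple. The function name `max_backend_conns` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many concurrent outgoing connections
# one source IP can hold open towards N backend destinations (IP:port).
# Each destination gets its own pool of ephemeral source ports.
max_backend_conns() {
  low=$1
  high=$2
  destinations=$3
  # The port range is inclusive at both ends, hence the +1
  echo $(( (high - low + 1) * destinations ))
}

# Use the live kernel setting on Linux, else the values from the answer
if [ -r /proc/sys/net/ipv4/ip_local_port_range ]; then
  read -r low_port high_port < /proc/sys/net/ipv4/ip_local_port_range
else
  low_port=32768
  high_port=61000
fi

# 3 servers in the "backend websocket" section
max_backend_conns "$low_port" "$high_port" 3
```

With the 32768 to 61000 range this prints 84699; on a system configured with 32768 to 60999 it prints 84696.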

If this isn't it, I bet it is something along these lines. Monitor your concurrent connections using the haproxy stats page, and monitor your CPU, to better understand what is happening when things get slow.
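Since the stats page stops loading once things get slow, it may help to poll it continuously so you have numbers from just before that point. HAProxy can export the stats page as CSV when `;csv` is appended to the stats URI, and the fifth CSV field, `scur`, is the current session count. This sketch assumes the default stats URI `/haproxy?stats`, the `aa:bb` credentials from the question's `stats auth` line, and a placeholder host name `lb1.example.com`; the function name `current_sessions` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read HAProxy stats CSV on stdin and print the
# current session count (scur, 5th field) for every frontend and backend.
# The CSV header line starts with "# pxname" and is skipped by the filter.
current_sessions() {
  awk -F, '$2 == "FRONTEND" || $2 == "BACKEND" { print $1, $2, $5 }'
}

# Example polling loop; replace the placeholder host and uncomment to use.
# while true; do
#   date
#   uptime   # load average, a rough signal of CPU pressure
#   curl -s -u aa:bb 'http://lb1.example.com/haproxy?stats;csv' | current_sessions
#   sleep 60
# done
```

Logging these to a file over a couple of hours should show whether the websocket backend's session count is climbing towards one of the limits discussed above when the slowdown starts.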

Kyle Brandt