3

All,

I'm having an issue with HAProxy not recovering after a backend server is restarted. I'm not using the proxy for load balancing, but to direct different URLs to different servers (web and web services) to avoid cross-domain issues. The proxy works fine until the first health check fails, but HAProxy never resumes forwarding to the server once it is back up. The proxy runs inside a Docker container (https://registry.hub.docker.com/u/dockerfile/haproxy/dockerfile/), which should be running HAProxy 1.5.3.

haproxy.cfg

global
    log         127.0.0.1 local0
    log         127.0.0.1 local1 notice
    user        haproxy
    group       haproxy

defaults
    mode        http
    log         global
    option      dontlognull
    option      httpclose
    option      httplog
    option      forwardfor
    option      persist
    option      redispatch
    option      http-server-close
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
    maxconn     60000
    retries     3
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    stats enable
    stats uri /haproxy-stats

frontend http-in
    bind *:80

    default_backend       server1
    acl url_server1       path_beg -i     /server1
    acl url_server2       path_beg -i     /services/server2
    use_backend server1   if url_server1
    use_backend server2   if url_server2

backend server1
    balance roundrobin
    option httpchk GET /server1/content/
    server server1 server1:8080 check inter 10s rise 1 fall 5

backend server2
    balance roundrobin
    option httpchk GET /services
    server server2 server2:8181 check inter 5s rise 1

In the logs for HAProxy, I see the following error messages:

[WARNING] 040/210248 (1) : Server server1/server1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 040/210248 (1) : backend 'server1' has no server available!

In the browser I see:

503 Service Unavailable
No server is available to handle this request.

Once I restart the server, I never see a log message indicating that HAProxy has determined it has come back up and I still receive the 503 error in the browser, even though I can hit server1:8080 and see that the site is back up.

Dave M
  • And is URI `/server1/content/` actually returning a 2xx or 3xx HTTP code in less than 10 seconds on server1:8080 after the restart? – Xavier Lucas Feb 10 '15 at 22:26
  • Yes. It successfully brings up the page. – Michael Wooten Feb 11 '15 at 00:04
  • Are the backend services running as containers? What is the networking setup between the backend services and the host that HAProxy is running on? Are you able to post the command you use to start the HAProxy container (and the image, if it is public)? – Andy Shinn Mar 13 '15 at 16:21

5 Answers

3

I ran into the same issue. My understanding is that HAProxy considers the whole backend down if all the servers in the backend are down, and that once a backend is marked as down it does not come back up (this is not documented; I came to this conclusion from my own experience).

Since you have only one server, it is very likely that your server is marked as down during the restart. This would be the correct behavior for a load balancer if you had multiple servers.

My solution was to increase the interval between checks and the fall parameter:

  • The rise parameter sets the number of checks a server must pass to be declared operational. Default is 2.
  • The fall parameter sets the number of checks a server must fail to be declared dead. Default is 3.
  • The inter parameter sets the interval between these checks. Default is 2000 milliseconds.

For instance:

server foo 1.2.3.4:80 check inter 5000 fall 5 rise 1
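To put rough numbers on these parameters, here is a minimal sketch of a single-server backend. The backend name, server name, address, and health-check path are placeholders. The downinter option is an addition not used in the line above: it sets the check interval while the server is fully down, so recovery can be noticed sooner than the normal inter would allow.

```
backend example_backend
    # Placeholder health-check path; use a URL that returns 2xx or 3xx.
    option httpchk GET /health
    # inter 5s with fall 5: roughly 20 to 25 seconds of consecutive failed
    # checks before the server is marked DOWN, which may be enough to ride
    # out a short restart.
    # downinter 2s with rise 1: while DOWN, probe every 2 seconds and mark
    # the server UP again on the first passing check.
    server app1 192.0.2.10:8080 check inter 5s fall 5 rise 1 downinter 2s
```

Larger fall and inter values delay failure detection by the same amount, so this trades faster failover for tolerance of brief outages.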
Pedro
1

We had similar behavior. After running for a month or so the server would be marked as down and would not come back up.

By default, HAProxy resolves hostnames to IP addresses at startup. If a server's IP address changes, HAProxy does not resolve the name again until it is restarted. In our case, we are on AWS and were pointing at an ALB/ELB. After a month, the ALB/ELB would change its IP (most likely due to some maintenance). From that point on the cached IP is stale and the checks continue to fail.

The solution for us was to add a resolvers section to haproxy.cfg and reference it on the server line:

resolvers myresolver
  nameserver dns1 10.110.0.2:53
  resolve_retries       30
  timeout retry         1s
  hold valid           30s

server myserver <server_name>:443 maxconn 32 check ssl verify none resolvers myresolver

HAProxy Documentation

0

I had the same issue, and my services are hosted in Kubernetes. It turned out to be a Kubernetes DNS issue. I fixed it by adding:

resolvers mydns
  parse-resolv-conf
  hold valid 10s
backend your_backend
  ...
  server server_name ip:port resolvers mydns check inter 1s

Instead of manually restarting HAProxy, you can do the following to debug:

Set up the HAProxy stats page and click force up for your backend server. Then check whether HAProxy goes back to forwarding requests.
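The stats page only offers actions such as forcing a server up when admin mode is enabled. A minimal sketch of a dedicated stats listener follows; the section name, port 8404, and the credentials are placeholders, and the exact actions shown on the page depend on the HAProxy version. It uses a listen section on the assumption that stats admin is not accepted in a defaults section on older releases.

```
listen stats
    # Hypothetical port for the stats page.
    bind *:8404
    mode http
    stats enable
    stats uri /haproxy-stats
    # Placeholder credentials; replace them before exposing the page.
    stats auth admin:change-me
    # Show the admin controls to any authenticated user.
    stats admin if TRUE
```

If forcing the server up restores traffic, the backend itself is healthy and the problem lies in the health check or in name resolution.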

Yulin
0

I ran into the same issue as well. When load balancing RabbitMQ, if one RabbitMQ instance is stopped, it is marked as DOWN, but after the instance is started again it is not marked as UP. Starting from the configuration in Pedro's post, I made some additions:

  • Added option httpchk or option tcp-check to enable the health check.
  • Specified exactly which port the health check uses (even when it is the same port as the service).

server rabbitmq-1 rabbitmq-1:5672 check port 5672 inter 5s rise 2 fall 3
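For context, the server line above might sit in a backend like the following sketch. The backend name is a placeholder, and mode tcp is an assumption based on AMQP not being HTTP; with no tcp-check rules defined, option tcp-check should amount to a plain connection check.

```
backend rabbitmq
    # Assumption: AMQP traffic is proxied at the TCP level.
    mode tcp
    # No tcp-check rules follow, so the check is expected to be a
    # simple connection attempt to the check port.
    option tcp-check
    # Explicit check port, even though it matches the service port.
    server rabbitmq-1 rabbitmq-1:5672 check port 5672 inter 5s rise 2 fall 3
```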

-1

I stumbled across exactly the same issue, and I think this is by design. HAProxy has a fall parameter that is responsible for this behavior:

fall <count>  
   The "fall" parameter states that a server will be considered as dead after
   <count> consecutive unsuccessful health checks. This value defaults to 3 if
   unspecified. See also the "check", "inter" and "rise" parameters.

I have no clue how to bring back a dead instance. I even tried the interactive console mentioned in this question.

bentolor
  • As per the documentation, `fall` and `rise` are counterparts. They are intended to reduce flapping. If a server fails `fall` consecutive checks, it is taken out of the rotation. It then needs to pass `rise` consecutive checks before being marked as up again. – Daniel Schneller Mar 22 '16 at 10:54