
Currently the architecture is organised like this:

  • 192.168.1.10, 192.168.1.11, 192.168.1.12 - Mesos masters with Marathon and a Consul server
  • 192.168.1.21, 192.168.1.22, 192.168.1.23 - three Mesos slaves, each with a Consul agent

Every machine has the following configuration:

  • its own IP address as DNS server
  • HAProxy with consul-template for automatic generation of haproxy.cfg
  • consul-mesos for registering the running Docker containers under their host's IP address, so that a service does not end up with a 172.1.0.1 address in DNS when it is resolved from a different machine

The problem I am currently having is that when I start a Docker container with a service in bridged mode, the address is resolved by DNS as it should be (for example, luigi.service.consul resolves without a problem), but when I run curl -L http://luigi.service.consul/ I get random 503 error codes: sometimes the address is resolved and sometimes it is not.

Any ideas how to investigate this?

I have checked /etc/resolv.conf, and from time to time I can see that the DNS address has been changed back to the old DNS IP address (8.8.8.8). Should I use the host IP address as the DNS IP, or do I need to use the Consul leader address?
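To catch the moment the file gets rewritten, the nameserver entries could be checked periodically. This is only a minimal sketch, assuming Python 3 is available on the hosts; the function names `parse_nameservers` and `unexpected_nameservers` are made up for illustration and are not part of the setup above.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only lines of the form "nameserver <address>" are of interest;
        # comments and "search"/"options" lines are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def unexpected_nameservers(resolv_conf_text, expected):
    """Return the configured nameservers that are not in the expected set."""
    return [s for s in parse_nameservers(resolv_conf_text) if s not in expected]
```

Feeding it the contents of /etc/resolv.conf with, say, `expected={"192.168.1.21"}` on that host would return `["8.8.8.8"]` whenever the old entry has come back, which could then be logged with a timestamp and compared against the times of the 503 responses.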

I have tried debugging with tcpflow: when the response is a 503, there is a wait for some time first, as if it were failing to resolve the service.
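To tell a DNS failure apart from a 503 returned over HTTP, the two steps can be probed separately in a loop. The following is a rough sketch rather than a finished tool: the helper names (`probe_once`, `http_status`, `tally`) are hypothetical, and it assumes Python 3 with only the standard library.

```python
import socket
import time
import urllib.error
import urllib.request
from collections import Counter


def http_status(host, timeout=5):
    """GET http://<host>/ (following redirects) and return the status code."""
    try:
        with urllib.request.urlopen("http://%s/" % host, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return exc.code


def probe_once(host, resolve=socket.gethostbyname, fetch=http_status):
    """Return (outcome, detail) for one DNS lookup plus one HTTP request."""
    try:
        ip = resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return ("dns_error", str(exc))
    try:
        # The request does its own lookup, so a DNS failure at this point
        # is reported as request_error rather than dns_error.
        status = fetch(host)
    except OSError as exc:  # URLError, timeouts, refused connections
        return ("request_error", str(exc))
    return ("http_%d" % status, ip)


def tally(host, attempts=50, delay=0.2, **kwargs):
    """Probe the host repeatedly and count how often each outcome occurs."""
    counts = Counter()
    for _ in range(attempts):
        outcome, _detail = probe_once(host, **kwargs)
        counts[outcome] += 1
        time.sleep(delay)
    return counts
```

Calling `tally("luigi.service.consul")` from one of the machines would show whether the failures are mostly `dns_error` (pointing at resolv.conf or Consul DNS) or `http_503` (pointing at HAProxy having no healthy backend at that moment).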

badc0re

1 Answer


It takes some time to fetch a Docker image, start the container, and finally start serving requests. Ideally you should reload HAProxy only once your new instance is ready. But there might still be active connections to your old instance. Once you start investigating this issue, it turns out that a solution already exists: it is called blue-green deployment, as described by M. Fowler.
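The "wait until the new instance is ready" part could look roughly like the sketch below. It is only an illustration: `wait_for_port` is a made-up helper, and a successful TCP connect is a crude stand-in for a real application health check.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; it does not
            # prove the service behind it can already answer requests.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A deployment script would call this for the new container's host and port and only trigger the HAProxy reload when it returns True. It does nothing about connections still open to the old instance.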

If you aim for zero downtime, there is no trivial solution. Yelp engineers describe how to reload HAProxy with true zero downtime by adding extra iptables rules.

Recently a blue-green deployment scheme was merged into marathon-lb (including the iptables trick from Yelp). I think that consul-mesos currently doesn't support this.

Tombart