
On CentOS 7.4 I am setting up a swarm where I want to run multiple routers, all reachable on ports 80/443.
The purpose is to host multiple environments (test/staging...) on a single swarm, all configured symmetrically.

I am using Docker 17.12.0-ce and Traefik v1.4.6 as router.

The basic idea is to have a virtual IP address per environment and publish the Traefik ports only on that address. Docker swarm cannot publish a port on a specific address, so I have to resort to having the Traefik instances listen on ports 81/82 etc. and somehow bring the traffic from VIP:80 to :81/:82.

Virtual IP addresses for all the environments across the swarm managers are handled by Keepalived.

Relevant docker service config for Traefik:

"Ports": [
          {
           "Protocol": "tcp",
           "TargetPort": 80,
           "PublishedPort": 81,
           "PublishMode": "ingress"
          },

# netstat -anp|grep 81
tcp6       7      0 :::81                   :::*                    LISTEN      4578/dockerd        

firewalld is set up to allow traffic to ports 80, 81, 82, etc

Accessing the backend services exposed by Traefik directly on port 81 on the VIP works.

Accessing port 80 on the VIP when nothing is configured on it correctly leads to connection refused.

The Traefik docker instance is running on the same host I'm using for the following tests.

I first tried with basic DNAT:

firewall-cmd --add-forward-port=port=80:proto=tcp:toport=81:toaddr=127.0.0.1

This led to timeouts: no connection appeared established on the server, and tcpdump showed the SYNs being ignored.

Next I tried a slightly more specific DNAT rule:

firewall-cmd --add-rich-rule='rule family=ipv4 forward-port port=80 protocol=tcp to-port=81 to-addr=127.0.0.1'

with the same results.
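One thing worth checking for both loopback DNAT attempts (this is a hypothesis, not something confirmed by the captures above): by default the kernel refuses to route externally received packets to 127.0.0.0/8, so a DNAT with `toaddr=127.0.0.1` silently drops the SYNs unless `route_localnet` is enabled on the inbound side.

```shell
# If this prints 0 (the default), packets arriving on an external
# interface and DNATed to 127.0.0.1 are treated as martians and dropped,
# which would match the "SYNs are ignored" symptom above.
sysctl net.ipv4.conf.all.route_localnet

# Allow routing of external traffic to loopback (runtime only; persist
# it via /etc/sysctl.d/ if it turns out to help):
sysctl -w net.ipv4.conf.all.route_localnet=1
```

Alternatively, DNAT to one of the host's real addresses instead of 127.0.0.1, which sidesteps the loopback restriction entirely.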

I then discovered GORB, which seems tailored to my use case, and provisioned it as follows.

Service:

{
  "host": "<VIP>",
  "port": 80,
  "protocol": "tcp",
  "method": "rr",
  "persistent": true,
  "flags": "sh-port"
}

Backend for said service:

{
  "host": "<VIP>",
  "port": 81,
  "method": "nat",
  "weight": 100,
  "pulse": {
     "type": "tcp",
     "interval": "30s",
     "args": null
  }
}
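For reference, this is roughly how the two JSON documents above get submitted to GORB's REST API. The service and backend names (`web80`, `traefik1`) are hypothetical, and 4672 is assumed to be GORB's default REST port — adjust both to your deployment.

```shell
# Create the virtual service (VIP:80) under a hypothetical name "web80".
curl -i -X PUT http://localhost:4672/service/web80 -d '{
  "host": "<VIP>", "port": 80, "protocol": "tcp",
  "method": "rr", "persistent": true, "flags": "sh-port"
}'

# Attach the Traefik backend (VIP:81) to it as "traefik1".
curl -i -X PUT http://localhost:4672/service/web80/traefik1 -d '{
  "host": "<VIP>", "port": 81, "method": "nat", "weight": 100,
  "pulse": {"type": "tcp", "interval": "30s", "args": null}
}'
```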

I verified the setup using ipvsadm and it seems correct:

# ipvsadm -l -n 
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP         <VIP>:80 rr (flag-2)
  ->        <VIP>:81              Masq    100    0          0     

In this case, while no connection appeared on the server, tcpdump showed SYN, SYN-ACK and ACK being exchanged, followed by the HTTP request and its ACK.
No other traffic passed, and the request ultimately timed out on the client side.
ipvsadm registered the connection as active.

If I set up HAProxy to listen on VIP:80 and proxy the requests via HTTP to 127.0.0.1:81, everything works, but I'd like to avoid it: it requires all data to pass through HAProxy, wasting resources for nothing, and it requires local configuration.

I'm out of ideas and I don't know how to troubleshoot further.

EDIT for clarification. My question is:
Is it possible to route traffic from VIP:80 to :81/:82 etc without using HAProxy or another process that would simply pump data to the real router (Traefik)?

– Seemone

2 Answers


We had a need to publish separate docker swarm services on the same ports, but on separate specific IP addresses. Here's how we did it.

Docker adds rules to the DOCKER-INGRESS chain of the nat table for each published port. The rules it adds are not IP-specific, hence normally any published port will be accessible on all host IP addresses. Here's an example of the rule Docker will add for a service published on port 80:

iptables -t nat -A DOCKER-INGRESS -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:80

(You can view these by running iptables-save -t nat | grep DOCKER-INGRESS).

Our solution is to publish our services on different ports, and use a script that intercepts dockerd's iptables commands to rewrite them so they match the correct IP address and public port pair.

For example:

  • service #1 is published on port 1080, but should listen on 1.2.3.4:80
  • service #2 is published on port 2080, but should listen on 1.2.3.5:80

We then configure our script accordingly:

# cat /usr/local/sbin/iptables
#!/bin/bash

REGEX_INGRESS="^(.*DOCKER-INGRESS -p tcp) (--dport [0-9]+) (-j DNAT --to-destination .*)"
IPTABLES=/usr/sbin/iptables

SRV_1_IP=1.2.3.4
SRV_2_IP=1.2.3.5

ipt() {
  echo "EXECUTING: $@" >>/tmp/iptables.log
  $IPTABLES "$@"
}

if [[ "$*" =~ $REGEX_INGRESS ]]; then
  START=${BASH_REMATCH[1]}
  PORT=${BASH_REMATCH[2]}
  END=${BASH_REMATCH[3]}
  
  echo "REQUESTED: $@" >>/tmp/iptables.log

  case "$PORT" in
     '--dport 1080') ipt $START --dport 80 -d $SRV_1_IP $END; exit $?; ;;
     '--dport 2080') ipt $START --dport 80 -d $SRV_2_IP $END; exit $?; ;;
                  *) ipt "$@"; exit $?; ;;
  esac
fi

echo "PASSING-THROUGH: $@" >>/tmp/iptables.log

$IPTABLES "$@"

N.B. The script must be installed in dockerd's PATH ahead of your distribution's iptables command. On Debian Buster, iptables is installed to /usr/sbin/iptables, and dockerd's PATH has /usr/local/sbin ahead of /usr/sbin, so it makes sense to install the script at /usr/local/sbin/iptables. (You can check dockerd's PATH by running cat /proc/$(pgrep dockerd)/environ | tr '\0' '\012' | grep ^PATH).
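To sanity-check the wrapper before letting dockerd use it, you can exercise the rewrite logic in isolation. This dry run uses the same regex and the example values from above, and only prints what would be executed instead of touching iptables:

```shell
#!/bin/bash
# Dry run of the wrapper's rewrite logic: feed in the kind of rule dockerd
# emits for a service published on port 1080 and print the rewritten
# arguments instead of executing them.
REGEX_INGRESS="^(.*DOCKER-INGRESS -p tcp) (--dport [0-9]+) (-j DNAT --to-destination .*)"
SRV_1_IP=1.2.3.4

# The rule dockerd would pass to iptables for a service published on 1080.
ARGS="-t nat -A DOCKER-INGRESS -p tcp --dport 1080 -j DNAT --to-destination 172.18.0.2:1080"

if [[ "$ARGS" =~ $REGEX_INGRESS ]]; then
  REWRITTEN="${BASH_REMATCH[1]} --dport 80 -d $SRV_1_IP ${BASH_REMATCH[3]}"
  echo "WOULD EXECUTE: iptables $REWRITTEN"
fi
# Prints:
# WOULD EXECUTE: iptables -t nat -A DOCKER-INGRESS -p tcp --dport 80 -d 1.2.3.4 -j DNAT --to-destination 172.18.0.2:1080
```

Note that dockerd invokes iptables without the implicit `-m tcp` that `iptables-save` displays, which is why the regex matches `-p tcp --dport` directly.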

Now, when these docker services are launched, the iptables rules will be rewritten as follows:

iptables -t nat -A DOCKER-INGRESS -d 1.2.3.4/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:1080
iptables -t nat -A DOCKER-INGRESS -d 1.2.3.5/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:2080

The result is that requests for http://1.2.3.4/ go to service #1, while requests for http://1.2.3.5/ go to service #2.

The script can be customised and extended according to your needs. It must be installed on every node to which you will be directing requests, adjusted for that node's public IP addresses.

– NewsNow

1st, you can use multiple IPs on the host if you have the ability to add IPs on the real network. This does work in Swarm on Linux. See the macvlan docs and google around for "macvlan swarm".

2nd, you're using overlay and swarm's ingress network, right?

3rd, most people just have Traefik (or my fav http://proxy.dockerflow.com) listen on 80/443 and route to the proper service/stack in the Swarm based on the host header. Like Florin asked, why aren't you trying that?
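For the host-header approach, the wiring looks roughly like this with the Traefik v1 label syntax. The service name, network name and domain are placeholders; Traefik itself must share the overlay network and run with `--docker.swarmmode`:

```shell
# Publish a test service behind Traefik: no host ports are published,
# Traefik routes requests for test.example.com to it over the overlay
# network based on the Host header.
docker service create --name whoami-test \
  --network traefik-net \
  --label traefik.port=80 \
  --label 'traefik.frontend.rule=Host:test.example.com' \
  emilevauge/whoami
```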

– Bret Fisher
  • 1. Thanks for pointing me in the macvlan direction; it flew over my head. I'm currently experimenting with it. 2. Yes, currently Traefik has published ports on the ingress network and talks with backend services over an overlay network. 3. As I said in the reply to Florin, I want to keep the Traefik instances for different environments separate. May I ask why you prefer DockerFlow to Traefik? (Asking to increase my knowledge; I'm currently a newbie in the Docker swarm world.) – Seemone Jan 24 '18 at 13:54
  • #1 is that, last I checked, Traefik requires running the proxy on Swarm managers, which I prefer to keep isolated and just doing manager cluster stuff. With Docker Flow Proxy, you put the HAProxy on workers, and a small agent on one or more managers to update the HAProxy. #2, to be highly available, Docker Flow Proxy doesn't require a data store, and is easier to set up for high availability than Traefik. But Traefik has a lot more users and supports more than just Swarm; Flow Proxy is designed just for Swarm. – Bret Fisher Jan 24 '18 at 17:29
  • Thanks to you I think I'm on the right track. With macvlan I was indeed able to assign an address on the same network as the host's, thus effectively publishing the container directly. To do this I had to remove the MAC spoofing filter that RHEV sets on VNICProfiles by default (this is for you, DenverCoder9), otherwise the ARP replies from the container would get dropped by the vnic. I am still investigating how to assign a **specific** address (I will run Traefik with replica=1 to correctly handle stickiness for a legacy application). Do you have any idea how to do this in a swarm? – Seemone Jan 25 '18 at 08:57
  • macvlan has proved a dead end. It doesn't let you choose the IP, and each node must have separate ranges for it to work. I am now trying keepalived's own IPVS support (the virtual_server directive), but I keep getting connection refused from outside despite having what seems to be a correct configuration. SELinux is not the culprit. Nor is DNS ;) – Seemone Feb 02 '18 at 17:16