
Network Traffic

We run Docker Swarm with 3 manager nodes and 16 worker nodes. The network I/O between two of the three manager nodes is very high. To illustrate this, here is the output from iftop for the three manager nodes:

vm71 (10.0.0.131)

vm71                 => 10.0.0.130            39.3Mb  47.5Mb  49.3Mb
                     <=                        802Kb  1.03Mb  1.07Mb

vm70 (10.0.0.130)

vm70                 => 10.0.0.131             798Kb  1.00Mb  1.00Mb
                     <=                       40.9Mb  44.8Mb  44.8Mb

vm75 (10.0.0.135)

vm75                 => 10.0.0.131            9.50Kb  10.1Kb  9.51Kb
                     <=                       7.83Kb  8.08Kb  7.56Kb
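For reference, the snapshots above can be reproduced with an iftop run restricted to the manager addresses; the interface name here is an assumption and should be whatever carries the swarm traffic on your hosts:

iftop -i eth0 -f "host 10.0.0.130 or host 10.0.0.131 or host 10.0.0.135"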

As you can see above, the traffic between vm70 and vm71 is roughly 4,000 times heavier than the traffic between vm75 and the other two managers. Our placement rules are set so that no containers run on any swarm manager, and we confirmed this by running docker stats on each of them.
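For reference, a rule like this is typically enforced by draining the managers or by constraining services to workers; the node and service names below are placeholders, not our actual setup:

docker node update --availability drain vm70            # keep tasks off this manager
docker service update --constraint-add node.role==worker my-service
docker stats --no-stream                                # on each manager: no containers listed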

Networking by Process

The next obvious question was which processes were generating this network I/O. The output of netstat -tup is below; I'm only showing the line for the connection that matches the ports seen in iftop.

tcp6       0     46 vm71:2377               10.0.0.130:39316        ESTABLISHED 791/dockerd

Notice that this is tcp6 traffic.
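Port 2377 is the Swarm cluster-management port, so this is dockerd-to-dockerd manager traffic. For reference, the connection can be isolated with either of the following (the filters are just one way to do it):

netstat -tup | grep 2377
ss -tnp '( sport = :2377 or dport = :2377 )'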

We're stumped. Why are we seeing so much traffic between these two manager nodes? If we demote and then promote the manager nodes, the traffic clears up for a while, but it eventually climbs back up. What might be causing this?
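For reference, the demote/promote workaround is just the standard commands, run from another manager (vm71 is only an example target here):

docker node demote vm71      # traffic between the managers drops off
docker node promote vm71     # after a while the heavy traffic gradually returns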

nthdesign

1 Answer


I would suggest capturing the traffic with tcpdump; since the problem builds up over time, you will need a monitoring tool or script that watches for the spike and starts the capture when it happens. A frequent culprit is a correlated memory increase: if the leader manager shows a sudden, large jump in memory usage at the same time, that would explain the traffic. From your output, this just looks like requests that hit the second manager through load balancing and are then forwarded to the leader.
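A minimal sketch of such a watcher, assuming eth0 is the interface carrying the swarm traffic and that port 2377 is what you want to capture; the threshold, capture duration, and output path are placeholders to adapt:

#!/bin/sh
# Start a 60-second tcpdump of swarm manager traffic (port 2377) whenever
# the outbound rate on IFACE exceeds THRESHOLD bytes per second.
IFACE=eth0                          # assumption: interface carrying swarm traffic
THRESHOLD=$((10 * 1024 * 1024))     # 10 MB/s; adjust to your normal baseline
while true; do
    TX1=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)
    sleep 1
    TX2=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)
    if [ $((TX2 - TX1)) -gt "$THRESHOLD" ]; then
        # Capture manager-to-manager traffic for later inspection in Wireshark
        tcpdump -i "$IFACE" -w "/tmp/swarm-2377-$(date +%s).pcap" port 2377 &
        PID=$!
        sleep 60
        kill "$PID"
    fi
done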