14

I'm running a set of load tests to determine the performance of the following setup:

Node.js test suite (client) --> StatsD (server) --> Graphite (server)

In short, the Node.js test suite sends a set number of metrics every x seconds to a StatsD instance located on another server. StatsD in turn flushes the metrics every second to a Graphite instance located on that same server. I then compare how many metrics were actually sent by the test suite with how many were received by Graphite to determine the packet loss between the test suite and Graphite.
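(For context, StatsD metrics are plain-text UDP datagrams, so a single metric can be pushed by hand, e.g. with netcat; the metric name and host below are placeholders, and 8125 is the port used throughout this question.)

echo "test.metric:1|c" | nc -u -w1 <statsd-host> 8125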

However, I noticed that I sometimes got very large packet drop rates (note that the metrics are sent over UDP), ranging from 20-50%. That's when I started looking into where these packets were being dropped, since it could be a performance issue with StatsD, so I began logging the metrics in every part of the system to track down where the drop occurred. And this is where things get weird.

I'm using tcpdump to create a capture file which I inspect after the test has finished running. But whenever I run the tests with tcpdump running, the packet loss is almost nonexistent! It looks like tcpdump is somehow increasing the performance of my tests and I can't figure out why or how it does this. I'm running the following command to capture the traffic on both server and client:

tcpdump -i any -n port 8125 -w test.cap

In one particular test case I'm sending 40000 metrics/s. The test with tcpdump running has a packet loss of about 4%, while the one without has a packet loss of about 20%.

Both systems are running as Xen VMs with the following setup:

  • Intel Xeon E5-2630 v2 @ 2.60GHz
  • 2GB RAM
  • Ubuntu 14.04 x86_64

Things I already checked for potential causes:

  • Increasing the UDP receive/send buffer sizes (see the sysctl sketch after this list).
  • CPU load affecting the test. (max. load of 40-50%, both client and server side)
  • Running tcpdump on specific interfaces instead of 'any'.
  • Running tcpdump with '-p' to disable promiscuous mode.
  • Running tcpdump only on the server. This resulted in the 20% packet loss still occurring, so it appears not to impact the tests.
  • Running tcpdump only on the client. This resulted in increased performance.
  • Increasing netdev_max_backlog and netdev_budget to 2^32-1. This made no difference.
  • Tried every possible setting of promiscuous mode on every nic (server on and client off, server off and client on, both on, both off). This made no difference.
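For reference, the buffer and backlog changes above were of roughly this form (the backlog/budget values are the ones stated; the buffer sizes are placeholders, since the exact values tried aren't listed):

sysctl -w net.core.rmem_max=26214400
sysctl -w net.core.rmem_default=26214400
sysctl -w net.core.wmem_max=26214400
sysctl -w net.core.wmem_default=26214400
sysctl -w net.core.netdev_max_backlog=4294967295
sysctl -w net.core.netdev_budget=4294967295
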
Ruben Homs
  • One thing that tcpdump does by default is put your network interface into promiscuous mode. You might want to pass the `-p` option to skip doing that to see if it makes a difference. – Zoredache Apr 30 '15 at 16:49
  • So you're running tcpdump on both the client and on the server, and the packet loss rate drops? What happens if you run it only on the client, and what happens if you run it only on the server? (And, yes, also try turning promiscuous mode off, and perhaps also try capturing on the specific network interface used for the test rather than the "any" device, to see if *that* makes a difference.) –  Apr 30 '15 at 19:41
  • Thanks for your comments. I tried both of your recommendations and edited my question to reflect what I tried, but this did not affect the problem. – Ruben Homs May 01 '15 at 08:39
  • Does putting nics on both machines to promiscuous mode have the same effect as running tcpdump? `ifconfig eth0 promisc` enables and `ifconfig eth0 -promisc` disables promiscuous mode on eth0. If it makes difference, try comparing the 4 possible combinations of promisc on/off on both machines. That might help pinpoint the source of the problems. – Fox May 04 '15 at 08:52
  • @Fox Thanks for the reply! I tried all possible combinations for all nic's, but with no difference in results. I updated my question to reflect this. – Ruben Homs May 04 '15 at 09:57

4 Answers

12

When tcpdump is running, it will be fairly prompt at reading in the incoming frames. My hypothesis is that the NIC's packet ring buffer settings may be a bit on the small side; when tcpdump is running, the buffer is getting emptied in a more timely manner.

If you're a Red Hat subscriber, then the support article Overview of Packet Reception is very useful. It covers some things that I don't think you've considered yet.

Consider how your system is dealing with IRQs; consider increasing the 'dev_weight' of the network interface (meaning more packets are read from the NIC per polling cycle); and look at how often the application reads the socket (can it use a dedicated thread, and are there known issues/workarounds regarding scalability?).
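A minimal sketch of checking and raising dev_weight (the value here is only illustrative):

sysctl net.core.dev_weight             # default is typically 64
sysctl -w net.core.dev_weight=128      # illustrative; more packets per poll cycle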

Increase the NIC ring buffer size (using the ethtool command -- look at the --set-ring etc. arguments).
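For example (eth0 is an assumption; on a Xen guest the virtual NIC may not support ring tuning, in which case it would have to happen on the host):

ethtool -g eth0            # show current and maximum ring sizes
ethtool -G eth0 rx 4096    # illustrative value; must not exceed the reported maximum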

Look at 'receive side scaling' (RSS) and use at least as many receive threads as there are receive queues to read in the traffic.
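A rough sketch of what to look at (eth0 and the CPU mask are assumptions; RPS is the software fallback if the virtual NIC exposes only one receive queue):

ls /sys/class/net/eth0/queues/                        # how many rx-* queues exist
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus     # spread rx-0 processing over CPUs 0-3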

I wonder if tcpdump is doing something cool such as using the kernel support for packet ring buffers. That would help to explain the behaviour you are seeing.

Cameron Kerr
  • Since this is a Xen environment, you should probably do (at least some of) that on the Xen host. – Cameron Kerr May 04 '15 at 10:36
  • This is something I hadn't thought of before, very interesting stuff, thanks! I will try this once I get access to the Xen host and will let you know how that goes. – Ruben Homs May 04 '15 at 10:55
2

What power governor are you using? I've seen similar behavior with the "ondemand" or "conservative" governors.

Try using the "performance" governor and disabling any power-saving features in the server BIOS.
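A minimal sketch of switching governors, assuming a cpufreq driver is actually active in the guest (per the comments below, that may not be the case on this Xen VM):

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done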

Does that change anything?

shodanshok
  • I'm having trouble finding out what power governor I'm using. I tried running `cpufreq-info` but get a message saying `no or unknown cpufreq driver is active on this CPU`. Also when using `cpupower frequency-info` it returns `no or unknown cpufreq driver is active on this CPU`. Though I can't confirm this at the moment, the [VM manufacturer's website](http://support.citrix.com/article/CTX200390) leads me to believe it's running on "performance" mode since I have an intel cpu.. – Ruben Homs May 04 '15 at 09:45
  • Can you show the output of the following commands? 1) `cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor` 2) `cat /proc/cpuinfo` 3) `lsmod | grep cpu` – shodanshok May 04 '15 at 12:58
  • [Here you go](https://gist.github.com/RubenHoms/e532e516310542f80ab4) – Ruben Homs May 04 '15 at 13:42
1

Another angle is the ip_conntrack module. Are you sure your Linux box can accept new connections? Test via:

root@debian:/home/mohsen# sysctl net.ipv4.netfilter.ip_conntrack_max
net.ipv4.netfilter.ip_conntrack_max = 65536
root@debian:/home/mohsen# sysctl  net.ipv4.netfilter.ip_conntrack_count
net.ipv4.netfilter.ip_conntrack_count = 29

You have to verify that

net.ipv4.netfilter.ip_conntrack_max > net.ipv4.netfilter.ip_conntrack_count

If max == count, your connection tracking table is full and your Linux box can't accept new connections.
If you don't have ip_conntrack, you can load it easily via modprobe ip_conntrack.
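If max does approach count, one option (see the comments below) is to exempt this traffic from connection tracking with the raw table's NOTRACK target; a minimal sketch, assuming UDP port 8125 as in the question:

iptables -t raw -A PREROUTING -p udp --dport 8125 -j NOTRACK
iptables -t raw -A OUTPUT -p udp --dport 8125 -j NOTRACK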

PersianGulf
  • And if this is the case, then you should look at the NOTRACK target in the 'raw' table to prevent connection tracking for that. I did that recently for a busy DNS server and it removed iptables from being the bottleneck and causing DNS resolution timeouts. – Cameron Kerr May 05 '15 at 08:17
  • And here is an example of how I used the NOTRACK rules to have IPTables not perform any connection tracking for UDP DNS. http://distracted-it.blogspot.co.nz/2015/05/iptables-firewall-rules-for-busy.html – Cameron Kerr May 05 '15 at 10:18
1

I suspect the receiving side is simply not capable of handling the packet rate, and here's why:

  1. Using tcpdump on the client reduces the packets dropped: tcpdump is slowing down the client, and therefore the server is seeing a much lower packet rate, which it can still partially handle. You should be able to confirm this hypothesis by checking the RX/TX packet counters on both client and server (see the sketch after this list).

  2. You mentioned that you increased the UDP receive/send buffer sizes; could you detail how? It is important that on the server you change both rmem_max and rmem_default, for example:

     sysctl -w net.core.rmem_max=524287
     sysctl -w net.core.wmem_max=524287
     sysctl -w net.core.rmem_default=524287
     sysctl -w net.core.wmem_default=524287
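A sketch of the counters to compare, as referenced in point 1 (the interface name is an assumption):

ip -s link show eth0      # per-interface RX/TX packet and drop counters
netstat -su               # look for receive/buffer errors in the Udp: section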

Testing your settings

Stop statsd and the node application, then with the systems idle use iperf to test the packet rate that the network/kernel can handle. If you can stream 40K packets/s with iperf but can't with statsd then you should concentrate your efforts on tuning statsd.
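For example, with iperf (values are illustrative: 100-byte datagrams at 32 Mbit/s is roughly 40K packets/s; <server-ip> is a placeholder):

iperf -s -u -p 8125                                    # on the server
iperf -c <server-ip> -u -p 8125 -l 100 -b 32M -t 30    # on the client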

Other tunables

Also remember to tune net.core.netdev_max_backlog: the maximum number of packets allowed to queue when an interface receives packets faster than the kernel can process them.
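For example (the value is illustrative):

sysctl net.core.netdev_max_backlog              # current value (the default is 1000)
sysctl -w net.core.netdev_max_backlog=10000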

unicoletti