I'm running a set of load tests to determine the performance of the following setup:
Node.js test suite (client) --> StatsD (server) --> Graphite (server)
In short, the Node.js test suite sends a fixed number of metrics every x seconds to a StatsD instance located on another server. StatsD in turn flushes the metrics every second to a Graphite instance on the same server as StatsD. I then compare how many metrics the test suite actually sent with how many Graphite received to determine the packet loss between the test suite and Graphite.
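To make the sending side concrete, here is a minimal sketch of what such a sender could look like. It is not the actual test suite: the function names, the metric name `loadtest.metric`, and the one-counter-per-datagram layout are assumptions for illustration. It relies on Node's built-in `dgram` module and the StatsD counter line format `<name>:<value>|c`.

```javascript
const dgram = require('dgram');

// Build one StatsD counter line: <name>:<value>|c
function formatCounter(name, value) {
  return `${name}:${value}|c`;
}

// Send `count` counter increments, one UDP datagram each (hypothetical helper).
// Resolves with the number of datagrams the local stack accepted without error.
// Note: a successful send() only means the kernel took the datagram,
// not that it arrived at the StatsD host.
function sendBatch(host, port, count, name = 'loadtest.metric') {
  return new Promise((resolve) => {
    const socket = dgram.createSocket('udp4');
    if (count <= 0) {
      socket.close();
      resolve(0);
      return;
    }
    let done = 0;
    let ok = 0;
    for (let i = 0; i < count; i++) {
      const payload = Buffer.from(formatCounter(name, 1));
      socket.send(payload, port, host, (err) => {
        if (!err) ok++;
        if (++done === count) {
          socket.close();
          resolve(ok);
        }
      });
    }
  });
}
```

Calling something like `sendBatch('statsd.example', 8125, 40000)` once per second would approximate the 40000 metrics/s case; the host name here is a placeholder.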
However, I noticed that I sometimes got very high packet loss rates, ranging from 20-50% (note that the metrics are sent over UDP). That's when I started looking into where these packets were being dropped, since it could be a performance issue with StatsD. I began logging the metrics at every stage of the system to track down where the drop occurred. And this is where things get weird.
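One possible checkpoint for this kind of tracking is the kernel's own UDP counters. The following is only a sketch, assuming a Linux host that exposes `/proc/net/snmp` with a pair of `Udp:` lines (a header line and a value line); the helper names are hypothetical. A growing `RcvbufErrors` value during a test run would suggest datagrams are being dropped at the receiving socket buffer rather than on the wire.

```javascript
const fs = require('fs');

// Parse the "Udp:" header/value line pair out of /proc/net/snmp text
// and return an object mapping counter name to numeric value.
function parseUdpCounters(text) {
  const rows = text.split('\n').filter((line) => line.startsWith('Udp:'));
  if (rows.length < 2) {
    throw new Error('no Udp section found');
  }
  const names = rows[0].trim().split(/\s+/).slice(1);
  const values = rows[1].trim().split(/\s+/).slice(1).map(Number);
  const counters = {};
  names.forEach((name, i) => {
    counters[name] = values[i];
  });
  return counters;
}

// Read the live counters (Linux only; the path is an assumption about the host).
function readUdpCounters(path = '/proc/net/snmp') {
  return parseUdpCounters(fs.readFileSync(path, 'utf8'));
}
```

Taking a snapshot with `readUdpCounters()` before and after a run, on both client and server, and comparing the differences would show which side is counting errors.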
I'm using tcpdump to create a capture file which I inspect after the test has finished. But whenever I run the tests with tcpdump running, the packet loss is almost nonexistent! It looks like tcpdump is somehow improving the performance of my tests, and I can't figure out why or how. I'm running the following command to capture the traffic on both server and client:
tcpdump -i any -n port 8125 -w test.cap
In one particular test case I'm sending 40000 metrics/s. The test run with tcpdump has a packet loss of about 4%, while the run without it has a packet loss of about 20%.
Both systems are running as Xen VMs with the following setup:
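For clarity, the loss percentage here is simply the share of sent metrics that never showed up in Graphite. A small sketch of that arithmetic follows; the function name is made up, and the received counts in the usage lines are illustrative values chosen to match the percentages above, not measured data.

```javascript
// Packet loss as a percentage of what was sent.
function lossPercent(sent, received) {
  if (sent <= 0) {
    throw new RangeError('sent must be positive');
  }
  // Multiply before dividing to keep the result exact for whole-number inputs
  // that divide evenly.
  return ((sent - received) * 100) / sent;
}

// Illustrative per-second counts for the 40000 metrics/s case:
console.log(lossPercent(40000, 38400)); // with tcpdump: 4
console.log(lossPercent(40000, 32000)); // without tcpdump: 20
```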
- Intel Xeon E5-2630 v2 @ 2.60GHz
- 2GB RAM
- Ubuntu 14.04 x86_64
Things I already checked for potential causes:
- Increasing the UDP buffer receive/send size.
- CPU load affecting the test. (max. load of 40-50%, both client and server side)
- Running tcpdump on specific interfaces instead of 'any'.
- Running tcpdump with '-p' to disable promiscuous mode.
- Running tcpdump only on the server. The 20% packet loss still occurred, so this does not seem to affect the tests.
- Running tcpdump only on the client. This resulted in increased performance (lower packet loss).
- Increasing netdev_max_backlog and netdev_budget to 2^32-1. This made no difference.
- Trying every combination of promiscuous mode on the NICs (server on and client off, server off and client on, both on, both off). This made no difference.