7

I have a set of servers inside Amazon EC2 in a VPC. Inside this VPC I have a private subnet and a public subnet. In the public subnet I have set up a NAT machine on a t2.micro instance that basically runs this NAT script on startup, injecting rules into iptables. Downloading files from the internet from a machine inside the private subnet works fine.
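For context, a typical EC2 NAT instance script boils down to enabling IP forwarding and masquerading outbound traffic from the VPC. A minimal sketch (not necessarily the exact script referenced above; eth0 and the 10.0.0.0/16 CIDR are assumptions, adjust to your VPC):

# allow the kernel to forward packets between the private subnet and the internet
$ echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
# rewrite the source address of traffic coming from the VPC (assumed 10.0.0.0/16)
$ sudo iptables -t nat -A POSTROUTING -o eth0 -s 10.0.0.0/16 -j MASQUERADE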

However, I compared the download speed of a file on an external high-bandwidth FTP server directly from my NAT machine to the download speed from a machine inside my private subnet (via the same NAT machine). There was a really significant difference: around 10MB/s from the NAT machine vs. 1MB/s when downloading from the machine inside the private subnet.
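A simple way to reproduce the comparison is to run the same download once on the NAT instance and once on an instance in the private subnet; wget prints the average transfer rate at the end (the URL is a placeholder):

$ wget -O /dev/null ftp://ftp.example.org/pub/largefile.iso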

There is no CPU usage on the NAT machine, so this cannot be the bottleneck. When trying the same test with bigger machines (m3.medium with "moderate network performance" and m3.xlarge with "high network performance"), I also could not get download speeds greater than 2.5MB/s.

Is this a general NAT problem that can (and should) be tuned? Where does the performance drop come from?

Update

With some testing, I could narrow this problem down. When I am using Ubuntu 12.04 or Amazon Linux NAT machines from 2013, everything runs smoothly and I get the full download speeds, even on the smallest t2.micro instances. It does not matter whether I use PV or HVM machines. The problem seems to be kernel-related. These old machines run kernel version 3.4.x, whereas the newer Amazon Linux NAT machines and Ubuntu 14.xx images run kernel version 3.14.xx. Is there any way to tune the newer machines?
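For reference, you can check which kernel a NAT AMI runs with uname; the version string below is an example from a 2014.09 Amazon Linux NAT image:

$ uname -r
3.14.20-20.44.amzn1.x86_64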

j0nes
  • Generally NAT access is slower than direct access, since it requires an additional layer of proxying. You can try downloading on one of your public instances or use a bigger NAT instance (at least t2.medium) – number5 Nov 21 '14 at 04:36
  • Already tried that. The official Amazon NAT AMIs performed equally or worse, and bigger instances up to m3.xlarge (see above) did not help either. I can live with downloads being slower through NAT, but 10 times slower seems quite a lot to me. – j0nes Nov 21 '14 at 07:51

3 Answers

6

We finally found the solution. You can fix the download speed by running the following on the NAT machine (as root):

ethtool -K eth0 sg off

This disables scatter-gather mode, which (as far as I understand it) stops offloading some network work to the network card itself. Disabling this option leads to higher CPU usage on that machine, as the CPU now has to do the work itself. However, on a t2.micro machine we only saw around 5% CPU usage when downloading a DVD image.
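To verify the current setting, ethtool's lowercase -k option lists the offload features (output trimmed to the relevant line; eth0 assumed):

$ ethtool -k eth0 | grep scatter-gather
scatter-gather: on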

Note that this won't survive a restart, so make sure to set this in rc.local or at least before setting up NAT.
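A minimal way to persist the setting, assuming an /etc/rc.local-style startup script and eth0 as the NAT interface (adjust both for your image):

#!/bin/sh
# disable scatter-gather before the NAT rules are applied
ethtool -K eth0 sg off
# ... existing NAT setup (ip_forward, iptables MASQUERADE rules) goes here ...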

j0nes
  • That's fascinating. Nice find. – Liyan Chang Dec 03 '14 at 21:34
  • Thank you. I have spent many hours trying to debug intermittent network performance issues on a new VPC configuration. My private EC2 instances request files from S3, and the requests would frequently stall for long periods (sometimes over 2 minutes). This was easily reproduced by piping a list of URLs through curl. Running the same requests directly from the NAT instance did not exhibit the same problems. Disabling scatter-gather mode on the NAT instance solved the problem immediately. – Mike Dec 17 '14 at 16:22
  • Thanks for the tip; however, I ran the tests mentioned by @Liyan Chang on 2 instances with scatter-gather ON and OFF and found that turning that option off affected the download and upload negatively. Test: `for ((n=0;n<10;n++)); do speedtest-cli --simple --server 935; done`. Results: m1.small sg ON: down 350, up 165; m1.small sg OFF: down 355, up 165; t2.medium sg ON: down 875, up 520; t2.medium sg OFF: down 812, up 317. – Montaro Aug 12 '15 at 20:54
3

I also use NAT boxes in a similar setup in production, so I'm very interested in your findings. I haven't run into similar findings in production, but maybe it's an issue that I haven't paid attention to before.

Let's do some science!

============================================================================

Theory: NAT boxes can download and upload faster than a client that is using the NAT.

Experiment: Match the questioner's experiment: t2.micros with Amazon NAT 2014.09, 2 subnets with the NAT going to an IGW and the private subnet pointing to the NAT. (Shared tenancy, general purpose SSD.)

Procedure:

# install speedtest
$ sudo yum install python-pip -y --enablerepo=epel; sudo pip install speedtest-cli
# run against the same server
$ speedtest-cli --server 935 --simple
# run it many times
$ for ((n=0;n<10;n++)); do speedtest-cli --simple --server 935; done

Data (Mbit/s):

          NAT      Client
Download  727.38   157.99
Upload    250.50   138.91

Conclusion: OP is not lying.

============================================================================

Theory: Different kernel versions lead to different results.

Experiment: Set up 3 NAT boxes, each with magnetic storage, m3.medium (no bursting), and dedicated tenancy. Run a speed test.

Procedure: See the last experiment. Also, set up a routing table for each NAT box. Used a blackhole routing table to prove that the changes propagated when I swapped routing tables (a sketch of the route swap via the AWS CLI follows the numbered steps).

  1. Using a NAT.
  2. curl google.com works.
  3. Switch to blackhole.
  4. Wait for curl google.com to fail on the client.
  5. Switch to new NAT.
  6. curl google.com works.
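For reference, the route swap itself can be done with the AWS CLI's replace-route call (the route table and instance IDs are placeholders):

# point the private subnet's default route at a different NAT instance
$ aws ec2 replace-route --route-table-id rtb-xxxxxxxx \
    --destination-cidr-block 0.0.0.0/0 --instance-id i-yyyyyyyy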

Here are my 3 NAT boxes:

  • 2014.09: 3.14.20-20.44.amzn1.x86_64
  • 2014.03: 3.10.42-52.145.amzn1.x86_64
  • 2013.09: 3.4.62-53.42.amzn1.x86_64

Data:

All 3 boxes get very similar results when running speedtest-cli --server 935

09/14   03/14   09/13
355.51, 356.55, 364.04
222.59, 212.45, 252.69

From the client:

09/14   03/14   09/13
351.18, 364.85, 363.69
186.96, 257.58, 248.04

Conclusion: Is there degradation? No. Is there any difference between the kernel versions? No.

============================================================================

Theory: Dedicated versus shared tenancy makes a difference.

Experiment: 2 NAT boxes. Both using NAT 2014.09. One with shared tenancy, one with dedicated tenancy.

Data: Both boxes have similar performance:

Shared Nat   Dedicated Nat
387.67       387.26
296.27       336.89

They also have similar standard deviations:

$ python3
>>> import statistics
>>> shared_download = [388.25, 333.66, 337.44, 334.72, 338.38, 335.52, 333.73, 333.28, 334.43, 335.60]
>>> statistics.stdev(shared_download)
16.858005318937742
>>> dedicated_download = [388.59, 338.68, 333.97, 337.42, 326.77, 346.87, 336.74, 345.52, 362.75, 336.77]
>>> statistics.stdev(dedicated_download)
17.96480002671891

And when you run the 2x2 combinations:

      Shared Client/Sh. NAT  Sh. Client/Dedicated Nat  Ded. Client/Sh. Nat  Ded. Client/Ded. NAT
Upload       290.83                      288.17                283.13              340.94
Download     260.01                      250.75                248.05              236.06

Conclusion: Really unclear; shared versus dedicated tenancy doesn't seem to make a big difference.

Meta conclusions:

The test that's probably worth redoing would be OP's test with m3.mediums. I was able to duplicate the t2.micro results, but my m3.medium seems to conflict with OP's m3.medium results.

I'd be interested in seeing your data on kernel versions as well.

Perhaps the most interesting part is how I was unable to get an m3.medium NAT to go quickly.

Liyan Chang
  • Thank you for your experiments! Are you able to reproduce your second test with the Ubuntu 12.04 instances? This is what we use at the moment without any problems. – j0nes Nov 29 '14 at 16:20
  • Just tried to reproduce this; it looks like speedtest-cli simply uses files that are too small. The effect shows clearly when downloading bigger files (like a Debian image or similar). – j0nes Dec 03 '14 at 11:33
0

My tests showed this made my downloads worse.

Setup: m3.large running the speed test, m3.medium dedicated NAT server.

No other traffic in this environment.

sg on:  average download speed 292.19
sg off: average download speed 259.21

My test was: for ((n=0;n<10;n++)); do speedtest-cli --simple ; done

Magd