I also use NAT boxes in a similar setup in production, so I'm very interested in your findings. I haven't seen anything similar in production, but maybe it's an issue I just haven't paid attention to before.
Let's do some science!
============================================================================
Theory: A NAT box can download and upload faster than a client that routes through it.
Experiment: Match the questioner's setup: t2.micros with the Amazon NAT 2014.09 AMI.
2 subnets: the NAT's subnet routing to an IGW and the private subnet pointing to the NAT.
(Shared tenancy. General Purpose SSD.)
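For anyone reproducing the plumbing, the routing above boils down to roughly this (AWS CLI sketch; the route table, gateway, and instance IDs are placeholders, not the exact commands I ran):
# public route table (the NAT's subnet): default route to the internet gateway
$ aws ec2 create-route --route-table-id rtb-aaaa1111 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-bbbb2222
# private route table (the client's subnet): default route to the NAT instance
$ aws ec2 create-route --route-table-id rtb-cccc3333 --destination-cidr-block 0.0.0.0/0 --instance-id i-dddd4444
# the NAT instance also needs source/dest checking disabled so it will forward traffic
$ aws ec2 modify-instance-attribute --instance-id i-dddd4444 --no-source-dest-check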
Procedure:
# install speedtest
$ sudo yum install python-pip -y --enablerepo=epel; sudo pip install speedtest-cli
# run against the same server
$ speedtest-cli --server 935 --simple
# run it many times
$ for ((n=0;n<10;n++)); do speedtest-cli --simple --server 935; done
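If you want the average of the 10 runs instead of eyeballing them, something like this works (a sketch; it assumes the "Download: X Mbit/s" / "Upload: Y Mbit/s" lines that --simple prints):
# save the 10 runs, then average the download and upload numbers
$ for ((n=0;n<10;n++)); do speedtest-cli --simple --server 935; done | tee runs.txt
$ awk '/Download/ {d+=$2; dn++} /Upload/ {u+=$2; un++} END {printf "avg download: %.2f Mbit/s  avg upload: %.2f Mbit/s\n", d/dn, u/un}' runs.txt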
Data:
                    NAT       Client
Download (Mbit/s)   727.38    157.99
Upload (Mbit/s)     250.50    138.91
Conclusion: OP is not lying.
============================================================================
Theory: Different kernel versions lead to different results.
Experiment:
Set up 3 NAT boxes, each an m3.medium (no bursting) with magnetic EBS storage and dedicated tenancy.
Run a speed test.
Procedure:
Same as the last experiment. Also, set up a routing table for each NAT box. Used a blackhole routing table to prove that the changes propagated when I swapped routing tables (sketch of the swap after these steps):
- Using a NAT: curl google.com works.
- Switch to the blackhole table.
- Wait for curl google.com to fail on the client.
- Switch to the new NAT: curl google.com works.
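The swap itself is just re-pointing the private subnet's route table association, roughly like this (AWS CLI sketch with placeholder IDs; note each replace call returns a new association ID to use for the next swap):
# point the private subnet at the blackhole table (no default route), wait for curl to fail on the client, then point it at the next NAT box's table
$ aws ec2 replace-route-table-association --association-id rtbassoc-11111111 --route-table-id rtb-00000000
$ aws ec2 replace-route-table-association --association-id rtbassoc-22222222 --route-table-id rtb-33333333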
Here are my 3 NAT boxes (AMI version / kernel):
2014.09 3.14.20-20.44.amzn1.x86_64
2014.03 3.10.42-52.145.amzn1.x86_64
2013.09 3.4.62-53.42.amzn1.x86_64
Data:
All 3 boxes get very similar results when running speedtest-cli --server 935
                    2014.09   2014.03   2013.09
Download (Mbit/s)   355.51    356.55    364.04
Upload (Mbit/s)     222.59    212.45    252.69
From the client:
                    2014.09   2014.03   2013.09
Download (Mbit/s)   351.18    364.85    363.69
Upload (Mbit/s)     186.96    257.58    248.04
Conclusion:
Is there degradation? No.
Is there any difference between the kernel versions? No.
============================================================================
Theory: Dedicated versus shared tenancy makes a difference.
Experiment:
2 NAT boxes. Both using NAT 2014.09. One with shared tenancy, one with dedicated tenancy.
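For reference, the only knob that differs between the two at launch is the tenancy; roughly this (AWS CLI sketch with placeholder AMI/subnet IDs, and I'm assuming m3.medium here):
# shared-tenancy NAT (the default) versus dedicated-tenancy NAT, same AMI and subnet
$ aws ec2 run-instances --image-id ami-11223344 --instance-type m3.medium --subnet-id subnet-55667788
$ aws ec2 run-instances --image-id ami-11223344 --instance-type m3.medium --subnet-id subnet-55667788 --placement Tenancy=dedicated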
Data:
Both boxes have similar performance:
                    Shared NAT   Dedicated NAT
Download (Mbit/s)   387.67       387.26
Upload (Mbit/s)     296.27       336.89
They also have similar standard deviations:
$ python3
>>> import statistics
>>> shared_download = [388.25, 333.66, 337.44, 334.72, 338.38, 335.52, 333.73, 333.28, 334.43, 335.60]
>>> statistics.stdev(shared_download)
16.858005318937742
>>> dedicated_download = [388.59, 338.68, 333.97, 337.42, 326.77, 346.87, 336.74, 345.52, 362.75, 336.77]
>>> statistics.stdev(dedicated_download)
17.96480002671891
And when you run the 2x2 combinations:
                                    Upload    Download   (Mbit/s)
Shared client / Shared NAT          290.83    260.01
Shared client / Dedicated NAT       288.17    250.75
Dedicated client / Shared NAT       283.13    248.05
Dedicated client / Dedicated NAT    340.94    236.06
Conclusion:
Unclear; shared versus dedicated tenancy doesn't seem to make a big difference.
Meta conclusions:
The test that's probably worth redoing is OP's test with m3.mediums. I was able to duplicate the t2.micro results, but my m3.medium results seem to conflict with OP's m3.medium results.
I'd be interested in seeing your data on kernel versions as well.
Perhaps the most interesting part is that I was unable to get an m3.medium NAT to go quickly.