Your constraint appears to be coming from the network limits on the instance type, not EBS itself.
It takes some reading between the lines, but the EBS Optimized Instances documentation tells an interesting story -- your numbers are actually better than the estimated IOPS these instance types claim to support.
EBS Optimized instances have two network paths, one of them dedicated to EBS connectivity, instead of a single network path shared by all IP traffic in and out of the instance. Although the documentation is not explicit about this, the speeds appear to be the same whether the instance is EBS Optimized or not; the difference is that on an optimized instance, EBS traffic doesn't have to share the pipe with everything else. Total bandwidth to the instance is doubled, with half allocated to EBS and half to everything else.
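If you want to check whether a given instance actually has the dedicated EBS path, the flag is exposed through the AWS CLI (the instance ID here is just a placeholder):

# aws ec2 describe-instance-attribute --instance-id i-0123456789abcdef0 --attribute ebsOptimized
{
    "InstanceId": "i-0123456789abcdef0",
    "EbsOptimized": {
        "Value": false
    }
}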
You mentioned using an r3.large instance, and that's not shown in the table... but if we extrapolate backwards from the r3.xlarge, the numbers there are pretty small.
As noted in the docs, the IOPS estimates are “a rounded approximation based on a 100% read-only workload,” and since the connections at the listed speeds are full duplex, the numbers could be higher with a mix of reads and writes.
type        network mbits/s    mbytes/s    estimated peak IOPS
r4.large    400                50          3,000
r4.xlarge   800                100         6,000
r3.large    250                31.25       2,000 (ratio-based speculation)
r3.xlarge   500                62.5        4,000
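The IOPS column appears to be simple arithmetic on the bandwidth, assuming the 16 KiB I/O size AWS uses for these approximations. For the r4.large row, for example:

# echo $(( 400 / 8 * 1000000 / 16384 ))
3051

...which rounds to the 3,000 shown in the table.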
Testing one of my r3.large instances by scanning the first 512 MiB of a 500 GiB gp2 volume seems to confirm this network speed. The machine is not EBS Optimized and was not handling any meaningful workload at the time of the test. The result is consistent with my previous observations on the r3.large: my design assumption has been, for some time, that these machines have only about 0.25 Gbit/s of connectivity, but the test seemed worth repeating. This is, of course, a Linux system, but the underlying principles should all hold.
# sync; echo 1 > /proc/sys/vm/drop_caches; dd if=/dev/xvdh bs=1M count=512 | pv -a > /dev/null
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 14.4457 s, 37.2 MB/s
[35.4MB/s]
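If you wanted to poke at the full-duplex point from the docs, a rough sketch would be to run a read stream and a write stream at the same time and compare the rates -- something like this, with the write going to a scratch file on a second volume (the path is hypothetical, and oflag=direct keeps the page cache from absorbing the write):

# sync; echo 1 > /proc/sys/vm/drop_caches
# dd if=/dev/xvdh bs=1M count=512 | pv -a > /dev/null &
# dd if=/dev/zero of=/mnt/scratch/ddtest bs=1M count=512 oflag=direct

dd reports its own rate when it finishes, so pv isn't needed on the write side. If both streams hold close to the rate above at the same time, that would be the full-duplex behavior the docs describe.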
That looks very much like a ~250 megabit/sec network connection, which, when you need storage throughput, is not a lot of bandwidth. Counterintuitively, if your workload is an appropriate fit for the t2 CPU credit model, you'll actually get better performance from a t2 than you'll get from an r3.