
We're running an ESXi 5.5u2 hypervisor with two 15k SAS drives in a mirror, hosting a couple of VMs, and we've noticed that some services become briefly unreachable when another VM is using all the IO.

So I'm looking for a way to avoid this trickle effect by limiting the IOPS of our VMs in vSphere, so that each gets a slice of the drives' total. I just went to the Resources tab in the VM settings and set the IOPS limit. Not a difficult task (although I do wish vSphere had a Linux client), but I've noticed that the IOPS limit doesn't come close to what the VM actually gets.

I'm using a tool called VisualEsxtop, which graphs a bunch of things including IOPS. In particular, I'm graphing CMDS/s, Reads/s, and Writes/s. To find my configuration's total IOPS, I'm running a simple stress test with dd that writes a big file and reads a big file (both tests running side by side).

From that, I found my IOPS to be around 270. My understanding is that IOPS = num_reads + num_writes (per second), but when I apply that logic to the IOPS limit in vSphere, I don't see the connection. Setting the limit to half of that (about 135) made the IOPS fall to around 40, and setting it to 2700 gave me the 50% of the total I wanted.
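To make the definition above concrete, here is a minimal sketch that computes IOPS from two samples of read/write counters taken over an interval, plus the standard relation between throughput and IOPS (throughput = IOPS × average IO size). The relation matters because dd reports throughput, not IOPS. The function names and counter values are hypothetical, chosen only to illustrate the arithmetic; they are not VisualEsxtop or esxtop APIs.

```python
def iops_from_counters(reads_before: int, writes_before: int,
                       reads_after: int, writes_after: int,
                       interval_s: float) -> float:
    """IOPS over a sampling interval, using IOPS = reads/s + writes/s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    completed = (reads_after - reads_before) + (writes_after - writes_before)
    return completed / interval_s


def iops_from_throughput(bytes_per_s: float, avg_io_bytes: float) -> float:
    """Convert throughput to IOPS; requires knowing the average IO size."""
    if avg_io_bytes <= 0:
        raise ValueError("avg_io_bytes must be positive")
    return bytes_per_s / avg_io_bytes


if __name__ == "__main__":
    # Hypothetical counters: 1350 reads and 1350 writes completed in 10 s.
    print(iops_from_counters(0, 0, 1350, 1350, 10.0))  # 270.0
    # The same throughput corresponds to very different IOPS depending on IO size.
    for io_size in (4096, 65536):
        print(io_size, iops_from_throughput(17_694_720, io_size))
```

The same 17,694,720 bytes/s is 270 IOPS at 64 KiB per IO but 4320 IOPS at 4 KiB per IO, so a dd figure alone doesn't pin down IOPS without knowing the IO size that actually reaches the disk.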

So I'm a little befuddled by this. Is the problem in how I determined my IOPS? Cheers to anyone who can shed some light on this.

Datus
  • Can you provide some real details on the hardware involved? Disk type, capacity, RAID controller, server specifications, etc.? – ewwhite Jul 24 '15 at 15:01
  • You might actually be slowing down your IOPS by doing the `dd` commands simultaneously, since there are more drive seeks involved in switching back and forth between where the file is being written and where the file is being read from. – austinian Jul 24 '15 at 16:05
  • Noted. I think I'll just stick to writes to gauge it, then. – Datus Jul 24 '15 at 16:37

1 Answer


1) They do have a Linux client: the Web Client. When you log into the .NET client, the little text banner at the top says something along the lines of 'stop using this, it's going away, use the Web Client'. Use that.

2) As a rule of thumb, I assume about 200 random IOPS per 15k RPM disk. You may well get more, but it's a reasonable assumption. Of course, you're using RAID 1, so writes get a 2:1 penalty (every write goes to both disks), and reads may get a small boost, but again I wouldn't bank on it. Also, your definition isn't quite right: there are lots of different types of IO benchmarking, and IOPS is simply a measure. In basic terms, there are sequential reads and writes, and their random versions too.

3) Rather than limiting IOPS, why not just set disk shares higher on the higher-priority VM(s)? That way you're not limiting the maximum performance of any given VM, just weighting their responses under contention.

4) Given that you only have two disks, switching to SSDs probably wouldn't be too expensive, and then you could forget all this tuning you're trying to do.

5) This may sound harsh, but it's not meant that way: get some training. Even the basic ICM course teaches these basics.
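The rule of thumb in point 2 can be sanity-checked with a back-of-the-envelope model, sketched below under stated assumptions. A random IO on a spinning disk pays roughly one average seek plus half a rotation, so per-disk IOPS ≈ 1 / (seek + rotational latency). For the mirror, each host write costs two back-end IOs, and (optimistically, the "small boost" the answer says not to bank on) each host read costs one back-end IO on either disk. The seek times used here are assumptions, not measured values for any particular drive; transfer time and controller caching are ignored.

```python
def disk_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-IOPS ceiling for one spinning disk.

    Each random IO pays an average seek plus, on average, half a rotation.
    Transfer time and controller caching are ignored.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm  # half a revolution, in ms
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def raid1_host_iops(per_disk_iops: float, disks: int, read_fraction: float,
                    write_penalty: int = 2) -> float:
    """Host-visible IOPS for a mirror, given the fraction of IOs that are reads.

    Assumes each host read costs one back-end IO and can be served by any disk
    (the optimistic case), while each host write costs `write_penalty`
    back-end IOs because every mirror copy must be written.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    backend_iops = per_disk_iops * disks
    cost_per_host_io = read_fraction + (1.0 - read_fraction) * write_penalty
    return backend_iops / cost_per_host_io


if __name__ == "__main__":
    # Assumed average seek times; check the spec sheet of the actual drives.
    for seek_ms in (3.0, 3.5):
        print(f"15k disk, {seek_ms} ms seek: ~{disk_random_iops(15_000, seek_ms):.0f} IOPS")
    for reads in (1.0, 0.5, 0.0):
        print(f"RAID 1, {reads:.0%} reads: ~{raid1_host_iops(200, 2, reads):.0f} IOPS")
```

With a 3.0 ms seek this gives 200 IOPS per disk, and with 3.5 ms about 182, which brackets the 180 to 200 figures quoted in this thread. For a two-disk mirror at 200 IOPS per disk, the model ranges from 200 host IOPS (all writes) to 400 (all reads). These are random-IO estimates; a sequential dd test can behave quite differently.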

Chopper3
  • Personally, I go with about 180 IOPS for 15k disks, but 200 would generally work too. As a reminder, these are physical limitations of the hardware and don't take into account any read/write caching on the controllers that may come into play here as well. Totally agree on using share priority instead of trying to set hard limits on IOPS. – Rex Jul 24 '15 at 15:04
  • Funny, I used to go with 180 for years too, but these newer disks really do push it up a tiny bit. You're right about controller quality, though, especially with SSDs; so many of them get bogged down by crappy ones. – Chopper3 Jul 24 '15 at 15:07
  • For 15k SFF SAS drives, I'd say 200 works pretty well, as SFF drives tend to be a little more capable overall. – Rex Jul 24 '15 at 15:09
  • Appreciate the responses. I know I still have a fair bit to learn, so my journey is ongoing. :-) Anyway, I would adjust the priority instead, but I'm in a unique situation: both VMs have equal priority, since they run the same service for redundancy. As for the IOPS limits, setting them as low as 180 or 200 yields odd results, as I noted in my original post. That oddity in particular is what I was curious about. – Datus Jul 24 '15 at 16:17
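To illustrate the difference between shares and hard limits discussed in the answer and comments, here is a simplified proportional-share model. It is not ESXi's actual IO scheduler. Capacity is split among busy VMs in proportion to their shares, and whatever a VM doesn't need is handed to the others. The VM names, the 270 IOPS capacity, and the share values are hypothetical.

```python
def share_allocation(capacity: float, demands: dict[str, float],
                     shares: dict[str, int]) -> dict[str, float]:
    """Simplified proportional-share model of contended disk IOPS.

    Capacity is split in proportion to shares among VMs that still want more;
    a VM never receives more than it demands, and capacity it leaves unused
    is redistributed to the remaining VMs in the next round.
    """
    if any(shares[vm] <= 0 for vm in demands):
        raise ValueError("shares must be positive")
    alloc = {vm: 0.0 for vm in demands}
    active = {vm for vm, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_shares = sum(shares[vm] for vm in active)
        handed_out = 0.0
        for vm in sorted(active):  # sorted() copies, so discarding below is safe
            offer = remaining * shares[vm] / total_shares
            grant = min(offer, demands[vm] - alloc[vm])
            alloc[vm] += grant
            handed_out += grant
            if alloc[vm] >= demands[vm] - 1e-9:
                active.discard(vm)  # satisfied; stops competing for capacity
        remaining -= handed_out
    return alloc


if __name__ == "__main__":
    equal = {"vm_a": 1000, "vm_b": 1000}
    print(share_allocation(270, {"vm_a": 270, "vm_b": 270}, equal))  # both busy
    print(share_allocation(270, {"vm_a": 270, "vm_b": 0}, equal))    # vm_b idle
```

In this model, equal shares give each VM 135 when both are busy, but vm_a gets all 270 when vm_b is idle. A hard limit of 135 would hold vm_a at 135 even then. That matches the point that shares only weight VMs under contention, and for two VMs of equal priority equal shares already produce the 50/50 split under load.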