
I am helping to benchmark hardware for a new SQL Server instance, and the volume presented to the OS for the data files is carved from a set of spindles on a Symmetrix SAN. The server has yet to have SQL Server installed, so the only activity on the box is our benchmarking.

Now, our storage engineers say that this volume and its resources are dedicated to our new server (I don't have access to see the actual SAN config); however, the performance benchmarks are troubling. For example, the numbers look good until, suddenly and randomly, our IO benchmarking tool reports wait times of 100 seconds and perfmon shows disk queue lengths of 255.

This SAN has an 8 GB cache, and other applications besides ours use the array. Even though the spindles for our volumes should be dedicated to us, I'm wondering whether the cache is getting hammered during the performance testing, or whether those spindles aren't really dedicated to us after all.

We're not getting much traction from our storage engineers in helping us track down the problem, so if anybody has experience with diagnosing a problem like this and would like to share insights and troubleshooting methodologies, I'd appreciate it.

arcain

2 Answers


@arcain

What benchmark tools are you using?

A number of tools out there, such as SQLIOSim and SQLIO, can drive your disk queue length to high levels depending on how they are configured, but that's OK, since it's part of the test. The problem is the wait times you mentioned; to me, that's a dead giveaway of disk contention. With the exception of SAN fabric saturation caused by far too many VMs, my experience is that the disks (when not SSDs) are always the biggest bottleneck. That having been said, they should not typically produce wait states unless they are shared with another host that is actively using them.

In these situations I suggest using SQLIO from Microsoft (if that isn't what you're already using). Since your Symmetrix has an 8 GB cache, I would test with a 16 GB or larger test file to push through the cache, so that any variation you see reflects what the underlying disks are actually putting out rather than the cache.
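If you're not sure how to build a file that large, SQLIO can lay it out for you from a parameter file. I'm assuming the usual parameter-file layout here (file path, thread count, affinity mask, size in MB) and an E:\Testfile.dat path to match the -dE test below; double-check the readme that ships with your SQLIO build. Create a one-line param.txt containing:

E:\Testfile.dat 2 0x0 16384

then a short sequential-write pass against it should allocate the 16 GB file before you start the real battery:

sqlio -kW -s10 -fsequential -o4 -b64 -Fparam.txt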

You can use SQLIO to create the file for a battery of testing and then execute the following against it:

sqlio -kW -t2 -s120 -dE -o4 -fsequential -b64 -BH -LS Testfile.dat

The -d parameter refers to the drive letter, in this case E:.

This test performs a set of sequential writes over a two-minute (-s120) period. You could wrap it up in a simple batch file with timestamps to help you track the time of day, and pipe the results to a log file for review. Write the batch file to execute the above (or something similar) repeatedly for a period of time and review the results; the larger the sample, the better.
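As a rough sketch (assuming sqlio.exe is on the PATH, the 16 GB E:\Testfile.dat already exists, and logging to C: so the results file isn't sitting on the volume under test), the batch file could look like this:

@echo off
setlocal enabledelayedexpansion
rem sqlio_battery.bat - repeat the 2-minute sequential-write test and timestamp each run
set LOG=C:\sqlio_results.log
for /L %%i in (1,1,30) do (
    rem Delayed expansion gives each run a fresh timestamp instead of the parse-time value
    echo ===== Run %%i started !date! !time! ===== >> !LOG!
    sqlio -kW -t2 -s120 -dE -o4 -fsequential -b64 -BH -LS Testfile.dat >> !LOG!
    echo ===== Run %%i finished !date! !time! ===== >> !LOG!
)

Kick it off, note the wall-clock start time, and let it run long enough to overlap the window in which the stalls have been showing up.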

If the disks are actually dedicated, and since you're pushing through the cache, the resulting IOs/sec, MB/sec, and latency should be pretty close from run to run (1-5% variation). If the variation goes beyond that, to more like 15% or higher, then you probably have disk contention.
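To eyeball the run-to-run variation quickly, you can pull just the run markers and summary lines back out of the log with findstr; the labels below match the cumulative section SQLIO prints (adjust the path if you logged somewhere else):

findstr /C:"Run " /C:"IOs/sec:" /C:"MBs/sec:" /C:"Avg_Latency(ms):" C:\sqlio_results.log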

Another thing: be sure to note the dates and times of your testing, and which drives you tested against. All SANs have some sort of logging for profiling the following:

  • Fast writes - writes cached and then later flushed to disk
  • Delayed Fast Writes - a situation where your global cache for writes is saturated and needs to flush some prior writes to disk to make room for the new incoming write requests
  • Read Hits - reads that were satisfied by the global cache
  • Read Misses (Long and Short) - the read was either partially satisfied with the global cache (short) or not at all and it had to pull everything from disk (long)

You should be able to request this information to help gauge your Symmetrix's performance, since this is a SAN shared with other hosts.

Post back with some results if you can; I will be more than happy to review them.

artofsql
  • I'm using SQLIO 1.5.SG, but not those exact options. I changed them to what you recommended, and I'm seeing variance between runs of 20% to 40%. For example, here are the cumulative data of two typical runs: **RUN 1** IOs/sec: 681.43, MBs/sec: 42.58, Min_Latency(ms): 1, Avg_Latency(ms): 11, Max_Latency(ms): 2977 **RUN 2** IOs/sec: 1018.44, MBs/sec: 63.65, Min_Latency(ms): 1, Avg_Latency(ms): 7, Max_Latency(ms): 1795. Thanks for the info; this gives me some good ammo to take to the storage admins. I'll be sure to ask for the stats. – arcain Jan 04 '11 at 05:05
  • Just curious, but where did the "15% or higher [variance probably means] disk contention" metric come from? I mean, that makes total sense to me, but I've never seen it quantified. Is it just that the base performance should simply be much better than what I'm seeing? – arcain Jan 04 '11 at 05:12
  • When compared to each other, those results definitely show contention. Something else is hitting those drives, resulting in the lower IOs per sec and the increase in latency. It's been my experience, having worked on IBM and EMC CLARiiON SANs, that the disks are never limited by the fabric between the host and the disks; rather, the bottleneck is always the disks, meaning that the other hardware involved usually doesn't impede disk performance. When everything is set up correctly, dedicated disks perform very consistently. The 15% isn't a hard rule, but it comes from years of past experience. – artofsql Jan 04 '11 at 13:12

The cache will be shared between you and the other servers which are on the array. It is possible that the spindles are shared with something else. Only your storage admin would know for sure.

It could also be that the shelf or the back end loop is at capacity (not likely, but possible).

If everything is cranking along, then things go to crap for a little bit and then return to normal, it sounds like you are filling the cache on the array and starting to write directly to disk, which is a problem since the disks are already running at 100%. How many spindles are behind the LUN? What sort of performance are you seeing when things are working correctly?
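It would also help to capture host-side numbers you can line up against whatever the storage team pulls from the array. As a sketch (the counter names assume the data volume is E:; swap in whatever counters and sample interval you prefer), a perfmon counter log created with logman will record latency and queue depth once a second while you benchmark:

logman create counter SANBench -f csv -si 00:00:01 -o C:\PerfLogs\SANBench -c "\LogicalDisk(E:)\Avg. Disk sec/Read" "\LogicalDisk(E:)\Avg. Disk sec/Write" "\LogicalDisk(E:)\Current Disk Queue Length"
logman start SANBench
rem ... run the benchmark, then:
logman stop SANBench

Comparing that CSV against the array's fast-write and read-hit stats for the same window should make it clearer whether the stalls line up with cache saturation or with another host hitting the same spindles.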

mrdenny
  • I'm not sure how many spindles are in the LUN since our SAN guys typically allocate 9 GB hypervolumes and then string those together into (if I remember correctly) 147 GB metavolumes, and then they chain those to make the LUN. If I could see the actual SAN config, I could tell you. Best case, they formed all the metas from hypers in a sane fashion. Worst case, the hypers are scattered from hell to breakfast. As for performance when things are working correctly, I can't say any of the storage I've used performs optimally. Trying to build a case to prove that things have to change. – arcain Jan 04 '11 at 05:22
  • If they are allocating the hypervolumes then stringing all those together you are probably sharing disks. If not, there isn't really any reason I can think of to create the LUN in this way, other than the "we've always done it this way" reason. – mrdenny Jan 05 '11 at 01:09
  • I'm with you; it doesn't make any sense to me either, plus my group's getting "this is how we do it here, I don't see a problem" quite often. I mean, really? No issues at all? C'mon. – arcain Jan 05 '11 at 04:19
  • Without your storage team being willing to help out with the performance troubleshooting there isn't going to be much that you can do to resolve this. I know this isn't the best news, but when teams don't play nice nothing gets done. – mrdenny Jan 05 '11 at 06:02