3

Recently I consolidated all our Dell EqualLogic SANs into the same group; previously each SAN was in its own group. They are all populated with 15k RPM SAS drives in RAID 6, so I haven't bothered to tier the storage of the new consolidated group since they're basically all the same.

In the process of doing so, I have changed all our VMs to use VMDK storage instead of iSCSI because I believe the performance to be better.

I'm now being told that disk I/O performance on our MS SQL 2005 server (our main SQL box, for now) has been consistently worse since these changes, but I can't see how that could be: its disks (C: OS, D: MDFs, E: LDFs) are now spread across far more spindles than they were previously, and my understanding is that VMDK storage is more performant than iSCSI.

So what gives? Here's a graph of "total I/O wait time" from SolarWinds Database Performance Analyzer: [graph]

NaOH
  • 411
  • 2
  • 10
  • 19

3 Answers

4

The first thing to keep in mind with combining those EQL arrays into a single pool is that workload on each volume has the potential to affect performance on other volumes. It's possible that your SQL database - though residing on more physical spindles now - has more resource contention due to other workloads sharing the same spindles.

The second major factor that comes to mind is the storage network. With members in separate pools or groups, almost all of your iSCSI network traffic is I/O to/from the hosts. With members in a single group and pool, though, you must account for intra-group traffic - mainly, page movement. Page movement keeps the in-use capacity even between members, and also balances "hot" data to members with relatively lower workloads. Check out the white paper on the EqualLogic load balancers for some more in-depth info.

This increase in traffic could easily exceed what your switches are capable of if they don't meet the standards described in the Dell Storage Compatibility Matrix (see p. 19).

You may also want to read through the best practices whitepaper for VMware and EqualLogic to ensure your configuration isn't the cause of the trouble.

Some questions:

  1. Do you have an active warranty on any of the arrays? If so, this is really something you should get input from support on - tons of performance-savvy resources available to assist.

    I don't have active warranty on any of the arrays unfortunately.

  2. Do you have SAN Headquarters installed and monitoring the group? If not... get it installed and configured (assuming you have a warranty and can obtain it). It provides some crucial insight into many of the storage performance metrics you need to understand potential root causes.

    I do have SAN HQ, though... can you elaborate on what I should be looking at within it to help pin this down?

The easiest place to check is in "experimental analysis", which gives you a graph of your workload compared to an "estimated max IOPS". You can view this for the whole group, and for individual members. You can also see individual spindle IOPS and queue depth in the hardware section, though it can be tough to tell by those numbers alone whether the spindles are being overworked.

  3. How many members/arrays do you have in the same pool now?

    There are 5 arrays in the same pool now

I would strongly recommend splitting them into two pools, with no more than 3 members per pool. A volume is only ever distributed across 3 members, except while it's in the middle of re-balancing capacity to a different member (which happens frequently on volumes whose snapshots constantly change in-use space). Cutting things down to 3 members max will stop a lot of "churn" from entire volume slices being rebalanced between members in an endless chase to keep in-use capacity as equal as possible across members.

Outside of all that info... if you can't get to the bottom of things on your own, you might consider just paying for a support ticket with Dell to have someone walk through everything in the environment with you to isolate the cause.

JimNim
  • 2,736
  • 12
  • 23
  • Hi @JimNim, I don't have active warranty on any of the arrays unfortunately. I do have SAN HQ, though... can you elaborate on what I should be looking at within it to help pin this down? There are 5 arrays in the same pool now. – NaOH Sep 07 '16 at 21:10
  • Modified my answer w/ some added info. Good luck! – JimNim Sep 07 '16 at 21:27
  • Added late question... were the arrays all configured with RAID6 before the config change? – JimNim Sep 17 '16 at 08:42
3

The performance difference between VMDK and block-level iSCSI depends on the workload type and may differ a lot from app to app. I would strongly recommend running a test: put some of your apps on each type of storage access and see how they behave. Since VMDK is an additional layer between the app and the storage, it might be slower if the host controlling the virtual drive is heavily loaded.
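
If it helps, here is a minimal sketch of such a test in Python (my own illustration, not from this answer; a purpose-built tool such as fio will give more trustworthy numbers). It times random 8 KiB reads from a large file placed on the datastore under test; the file path is hypothetical, it relies on the POSIX-only os.pread, and the guest's page cache will flatter the numbers unless the file is much larger than the guest's RAM.

```python
# Rough random-read latency probe -- illustrative only; prefer a real tool such as fio.
import os
import random
import time

TEST_FILE = "/path/to/large/test.file"  # hypothetical: a big file on the volume under test
BLOCK = 8 * 1024                        # 8 KiB per read, roughly a database page
SAMPLES = 2000

size = os.path.getsize(TEST_FILE)
fd = os.open(TEST_FILE, os.O_RDONLY)
latencies = []
for _ in range(SAMPLES):
    offset = random.randrange(0, size - BLOCK)
    t0 = time.perf_counter()
    os.pread(fd, BLOCK, offset)         # positional read; POSIX only
    latencies.append(time.perf_counter() - t0)
os.close(fd)

latencies.sort()
print("median %.2f ms, p95 %.2f ms" % (latencies[len(latencies) // 2] * 1000,
                                       latencies[int(len(latencies) * 0.95)] * 1000))
```

Run it from a test VM on a VMDK-backed datastore and again from one using in-guest iSCSI to the same pool, and compare the percentiles rather than a single average.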

Net Runner
  • 5,626
  • 11
  • 29
2

You probably cut down your "cache time" when you shared the disks.

Imagine that you have two applications "A" and "B":

  • Application "A" has a small database with only 40GiB, loads 1GiB/day and most queries uses the data for the last week days. In a server with 20GiB of RAM dedicated to disk cache, probably near than 20 day worth of data will be on the disk cache and most reads will not even move a disk head.

  • Application "B", on the other side is a medium archive with 2000GiB, loads 20GiB of data every day and most queries read sequentially the whole thing. It is an archive and mostly do textual queries that is difficult o index and the sequential read happens within a day anyway which is enough for the application users. As many archives it is used only by auditories that does not need faster responses.

  • If you join the disks of these two servers on the same storage, behind the same 64GiB cache, applications "A" and "B" together move 21GiB of data per day, so the cache will hold at most about 3 days of data. Before the merge, application "A" served most of its queries from RAM; now most of them need a physical disk read. Before the merge, application "B" had little concurrency from application "A" for disk access; now it has a lot.

Got the idea?
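
A back-of-the-envelope version of that arithmetic, using the numbers from the example above (my own sketch, purely illustrative):

```python
# Days of freshly-loaded data that fit in a read cache: cache size / daily churn.
def days_in_cache(cache_gib, daily_load_gib):
    return cache_gib / daily_load_gib

# Before the merge: app "A" has ~20 GiB of server RAM caching 1 GiB/day of new data.
print(days_in_cache(20, 1))    # ~20 days of app "A" data stay cached

# After the merge: one 64 GiB cache absorbs both workloads, 1 + 20 = 21 GiB/day.
print(days_in_cache(64, 21))   # ~3 days -- app "A"'s recent working set no longer fits
```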

Segmenting the disk caches is very important for performance, because for random access RAM is somewhere between 4 thousand and 4 million times faster than 15k disks. Disks have to move a head to get at the data; RAM does not. 15k RPM disks are a waste of money: they are about 2 times the speed of regular SATA drives for random access and cost far more than 2 times the price of SATA drives.
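
For the "about 2 times" figure, a crude per-spindle model helps (the seek times below are ballpark figures I'm assuming, not vendor specs): random service time is roughly an average seek plus half a rotation.

```python
# Crude per-spindle random-I/O model: service time ~= average seek + half a rotation.
def random_iops(rpm, avg_seek_ms):
    half_rotation_ms = 0.5 * 60000.0 / rpm      # half a revolution, in milliseconds
    return 1000.0 / (avg_seek_ms + half_rotation_ms)

sas_15k = random_iops(15000, 3.5)    # ~180 IOPS per spindle
sata_7k2 = random_iops(7200, 8.5)    # ~80 IOPS per spindle
print("15k SAS ~%.0f IOPS, 7.2k SATA ~%.0f IOPS, ratio ~%.1fx"
      % (sas_15k, sata_7k2, sas_15k / sata_7k2))   # roughly the 2x mentioned above
```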

About VMDK

My servers are quite big, and we had issues in the past with big VMs (700GiB of RAM, for example) on VMware, including severe performance problems and unexplained crashes. For that reason we moved to KVM. I was not the manager of the virtualization server at the time, so I cannot say what was wrong with our VMware setup; but since we moved to KVM and I became the virtualization server manager, we have had no more issues.

I have some VM images on physical devices (SCSI forwarding) and some as .img image files (similar to a fixed-size VMDK). People on the internet say SCSI forwarding is way faster, but for my usage patterns the performance is the same; if there is a difference, it is small enough that I can't see it. The only thing is that when creating a new virtual machine we have to instruct KVM not to cache disk access on the host operating system. I do not know if VMware has a similar option.
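
For what it's worth, on a libvirt/KVM host that option shows up as the cache attribute of each disk's <driver> element in the domain XML. A quick sketch to list it for a guest (the domain name is just an example; it assumes virsh is installed and on the PATH):

```python
# List each disk's cache mode for a libvirt/KVM guest by parsing `virsh dumpxml` output.
import subprocess
import xml.etree.ElementTree as ET

DOMAIN = "sqlvm01"   # example domain name

xml_text = subprocess.check_output(["virsh", "dumpxml", DOMAIN], text=True)
root = ET.fromstring(xml_text)

for disk in root.findall("./devices/disk"):
    target = disk.find("target")
    driver = disk.find("driver")
    dev = target.get("dev") if target is not None else "?"
    cache = driver.get("cache", "default") if driver is not None else "default"
    print("%s: cache=%s" % (dev, cache))   # cache='none' bypasses the host page cache
```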

My suggestions to you

1. Change storage strategy

Trade the storage arrays for internal disks. 24 internal SATA disks allow a big RAID 10 that will be way cheaper and faster than most storage arrays. As a side benefit, for less money you will have a surplus of disk space on those servers that can be used for cross-backups and maintenance tasks.

But do not expose this surplus space to your users. Keep it to yourself; otherwise it will be hell to make backups.

Use storage arrays for the things they are designed for:

  • Centralized backup;
  • Databases/archives that are too big to fit on internal disks;
  • Databases/archives whose usage patterns are not accelerated by disk caches and that need more disk heads for performance than will fit in internal disks or dedicated storage.

And... do not even bother buying storage arrays with lots of disk cache. Instead, put the money into more RAM for the servers that use the arrays.

2. Move RAM from the storage cache to the actual servers if possible

Assuming you have the same amount of cache RAM in your arrays after the unification, you may have RAM to spare. Try to move RAM from the storage cache to the actual servers, in the proportions you had before, provided the RAM modules are compatible. That may do the trick.

3. No RAID 6 for mission-critical databases

RAID 5 and 6 are the worst for database performance; move to RAID 10. RAID 10 doubles the read speed because you have two independent copies of each sector that can be read in parallel.
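
One standard rule-of-thumb way to put numbers on this (my own illustration, not part of the original argument) is the small-random-write penalty: each host write costs about 2 back-end I/Os on RAID 10 versus about 6 on RAID 6 (read the data block, P and Q, then write all three back).

```python
# Rough host-visible IOPS for a 10-spindle array using standard write-penalty factors.
# Per-spindle IOPS is a ballpark figure for 15k drives, not a measured value.
SPINDLES = 10
SPINDLE_IOPS = 180
WRITE_PENALTY = {"raid10": 2, "raid6": 6}   # back-end I/Os per small random host write

def usable_iops(raid, read_fraction):
    raw = SPINDLES * SPINDLE_IOPS
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[raid])

for mix in (0.9, 0.7, 0.5):   # read-heavy to write-heavy OLTP mixes
    print("%d%% reads: RAID10 ~%.0f IOPS, RAID6 ~%.0f IOPS"
          % (mix * 100, usable_iops("raid10", mix), usable_iops("raid6", mix)))
```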

4. Move the database log to a dedicated internal drive

I use Postgres, and moving the write-ahead log to a dedicated disk makes a big difference. The thing is, most modern database servers write information to the log before writing it to the database's data area itself. The log is usually a circular buffer and its writes are all sequential, so if it sits on a dedicated physical disk the head is always in position for the next write: almost no seek time, even on a low-RPM drive. From what I've read, MySQL uses the very same design.
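
For reference, a minimal sketch of how I relocate the WAL on a stopped PostgreSQL instance by moving the directory and leaving a symlink behind (paths are examples; the directory is pg_xlog up to 9.6 and pg_wal from 10 onward, and a new cluster can simply be created with initdb's WAL-directory option instead):

```python
# Move PostgreSQL's WAL directory onto a dedicated disk and leave a symlink in its place.
# Only do this with the database server STOPPED; all paths here are examples.
import os
import shutil

DATA_DIR = "/var/lib/postgresql/9.5/main"    # example data directory
WAL_DIR = os.path.join(DATA_DIR, "pg_xlog")  # named pg_wal on PostgreSQL 10 and newer
DEDICATED = "/mnt/wal-disk/pg_xlog"          # mount point of the dedicated spindle

shutil.move(WAL_DIR, DEDICATED)              # relocate the existing WAL segments
os.symlink(DEDICATED, WAL_DIR)               # PostgreSQL follows the symlink transparently
print("%s -> %s" % (WAL_DIR, DEDICATED))
```

The same reasoning applies to keeping the LDF volume in the question on its own spindles: log writes only stay sequential if nothing else is seeking on that disk.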

Lucas
  • 513
  • 3
  • 10
  • *24 internal SATA disks allow a big RAID 10 that will be way cheaper and faster than most storage arrays* Then IO operations from everything sharing that array will contend. And cache on disk controllers is critically important if you're going to stuff everyone onto a few huge arrays shared between numerous clients. *No RAID 6 for mission-critical databases* RAID 5/6 are *fine* for high-performance needs - *if* you know how to build the array(s), align the filesystem(s) built on them to avoid read-modify-write, and understand the IO patterns and limit IO to (mostly) full-stripe access. – Andrew Henle Sep 09 '16 at 10:32
  • RAID 5 and 6 will never compare with RAID 10 for random reads. For sequential reads you may be right. The biggest penalty for database servers is seek time. With a 10-disk array in RAID 5 or 6 you get one independent read operation at a time. With a 10-disk array in RAID 10 you may have 10 independent read operations running simultaneously. – Lucas Sep 09 '16 at 12:15
  • 2 x 600GB 15k RPM SAS disks on Dell cost US$1,738; 2 x 500GB 7.2k RPM SAS disks on Dell cost US$580. The difference is US$1,158. On Dell I can buy about 150GB of RAM with that difference. If I take the slow disks, I can fit 30% of the 500GB in the cache RAM of my database server. Way, way faster for almost any application. Even with the disks being slower, the application will depend less on the disks, boosting performance. – Lucas Sep 09 '16 at 12:58
  • My servers are way, way bigger than this, and the economics are the same for big storage arrays. For example, a big ZFS4-4 on Oracle's page has 192,000GB of disks (using RAID 10) and can have up to 512GB of RAM, which means only about 0.2% of cache. – Lucas Sep 09 '16 at 13:13