Today I was trying to execute a long-running (but highly optimized) query on a server at our company. I have run the same query on the same data on my home computer (5400 rpm hard disk), where it took about 40 minutes. The server is about 10 times slower and may still be running! The server does about 30 inserts/second; my home computer does 300 inserts/second.

As the CPU load is very low (3%), memory is plentiful, and the network is not used (the query executes locally), I suspect disk I/O is the bottleneck. On my home computer, by contrast (which has a more powerful CPU), two cores were nearly fully loaded.

The relevant disk configuration is as follows:

  • Windows 2003

  • SEAGATE ST3500631NS (7200 rpm, 500 GB)

  • LSI MegaRAID based RAID 5

  • 4 disks, 1 hot spare

  • Write Through

  • No read-ahead
  • Direct Cache Mode
  • Harddisk-Cache-Mode: off

Granted, RAID 5 may not be the fastest setup, but the disks should compensate by being faster than the one at home.

Some further interesting data:

  • Avg. Queue Length: 30
  • Avg. Queue Length (Read): 2
  • Avg. Queue Length (Write): 26
  • Bytes/s read: 1.3 MB/s
  • Bytes/s write: 1.2 MB/s
  • Avg. sec/Read: 0.007
  • Avg. sec/Write: 0.500
  • Writes per second: 36

How can I resolve this write bottleneck? Is it because of the many small writes? Is there a malfunctioning driver somewhere, or some way to combine transactions into bigger writes?
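To illustrate the "combine transactions" idea: a minimal sketch using the stdlib `sqlite3` module as a stand-in for the real database (the table and payload here are hypothetical). In autocommit mode each statement is its own transaction and forces a flush; wrapping the batch in one explicit transaction coalesces thousands of small writes into a few large ones.

```python
import os
import sqlite3
import tempfile

# sqlite3 stands in for the real database engine (illustration only).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)   # autocommit mode
con.execute("CREATE TABLE t (id INTEGER, payload TEXT)")

rows = [(i, "x" * 100) for i in range(10_000)]

con.execute("BEGIN")                                  # one transaction for the whole batch
con.executemany("INSERT INTO t VALUES (?, ?)", rows)  # inserts buffered, not flushed per row
con.execute("COMMIT")                                 # single flush at the end

print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 10000
```

The same pattern (explicit `BEGIN`/`COMMIT` around a batch, or the engine's bulk-insert facility) applies to any database that flushes its log on every commit.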

The seconds per write are very large!
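The counters above are at least internally consistent: by Little's Law, the average number of outstanding writes should be roughly the write arrival rate times the per-write latency. A quick check with the figures from the question:

```python
# Little's Law: avg. items in system = arrival rate * avg. time in system.
writes_per_sec = 36      # from the performance counters above
sec_per_write = 0.500    # average write latency

expected_write_queue = writes_per_sec * sec_per_write
print(expected_write_queue)  # 18.0
```

That 18 is the same order as the observed average write queue length of 26, so the half-second latency figure is plausible rather than a counter glitch.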

Thanks!

Wikser

3 Answers


The write-through cache could be a performance hit: every write to the cache must be written to disk straight away, whereas a write-back cache can wait for the optimal time to write the data to disk. Depending on the hardware, write-back may require a battery backup on the array controller, and some controllers disable write caching if the battery is absent or needs replacing. You may also want to verify that the array cache is in fact enabled; some controllers don't make this very obvious.
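The difference can be sketched with a toy model (not a real driver, just an illustration): write-through issues one device write per logical write, while write-back can absorb repeated writes to the same block and flush each dirty block once.

```python
# Toy model: count device writes for the same workload under each policy.
def device_writes(workload, write_back):
    dirty = set()
    count = 0
    for block in workload:
        if write_back:
            dirty.add(block)   # defer: just mark the block dirty in cache
        else:
            count += 1         # write-through: hit the disk immediately
    # Write-back flushes each dirty block once at the end.
    return count + (len(dirty) if write_back else 0)

# 1,000 logical writes that repeatedly touch only 10 distinct blocks.
workload = [i % 10 for i in range(1000)]
print(device_writes(workload, write_back=False))  # 1000 device writes
print(device_writes(workload, write_back=True))   # 10 device writes
```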

Greg Askew

I would suspect the RAID 5 to be the main culprit. Every block write to your array can cost two reads followed by two writes (read the old data and old parity, then write the new data and new parity); at best, if those blocks are already in cache, it is still two writes. On top of that, if your transaction logs and data files are on the same array, each write the database performs results in two writes (one to the log and one to the data file). This will be the case on your home system too, so it won't make much odds in the comparison on its own, but it will compound any other performance difference.
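The write penalty above can be put in rough numbers. Assuming a nominal ~75 random IOPS per 7200 rpm disk (an illustrative figure, not measured on this hardware), a small random write costs four physical I/Os on RAID 5 versus two on RAID 10:

```python
DISK_IOPS = 75  # assumed random IOPS for one 7200 rpm disk (illustrative)

def effective_write_iops(disks, write_penalty):
    # Each logical write costs `write_penalty` physical I/Os spread over
    # the array, so the array-wide write rate is divided by that factor.
    return disks * DISK_IOPS / write_penalty

print(effective_write_iops(disks=4, write_penalty=4))  # RAID 5:  75.0
print(effective_write_iops(disks=4, write_penalty=2))  # RAID 10: 150.0
```

Under these assumptions a 4-disk RAID 5 delivers no more random-write throughput than a single disk, which is consistent with the slow insert rate observed.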

Your home drive probably has its cache turned on, which will help by allowing it to reorder writes slightly to improve performance by reducing head movement. Turning on read-ahead may help you by reducing head contention with the write operations but that depends on the exact I/O pattern your query imposes.

RAID 10 is usually recommended over RAID 5 for databases, for write-performance reasons.

When you say "the same data": is one system running a database backed up and restored from the other? If the two databases have different histories, the slower one may need its index statistics updated (unexpected query-speed problems can result from bad stats causing the query planner to choose a less optimal path than it otherwise would).

Also, do you have the same number of CPUs/cores in both environments? Some time ago I saw SQL Server 2000 run a disk-bottlenecked query slower on two CPUs than when told to use just one, presumably because splitting the load over two CPUs increased I/O contention, with two threads reading from different places on the disk at once. I have not seen this more recently, so it may have been a bug in that edition and service pack that has long since been fixed, but it might be worth looking into: if you are giving explicit index hints that help in a low-core environment, try removing them so the query planner can make choices better suited to the extra cores, or try re-optimising the hints yourself.

David Spillett
  • Thanks for your answer. I've read that "Testing at Microsoft has found that RAID 5 can be as much as 50% slower than using RAID 10.". This does not sound that bad, assuming that RAID10 is not 5 times slower than no RAID. As it's a fresh import, the single affected table is practically empty at start. Indices are disabled, the table is stored as a heap, no constraints to be checked, the data retrieval does not seem to be the problem. Is there anything else I could do? Are bulk inserts affected that much too? – Wikser Mar 01 '10 at 22:33
  • I don't think you will see a performance hit with RAID 10. This is because there is no need to calculate and store the parity information. RAID 1 is a mirror and RAID 0 is a stripe; there is very little overhead in either case. – Richard West Mar 01 '10 at 23:22

I had a serious I/O issue on our BI data mart. I found that someone had reseated one of the HDDs configured in RAID 5, which triggered a parity rebuild, according to HP support; I still don't know what started the rebuild.

It caused a lot of trouble, but once the rebuild finished, the performance came back. This may not be the most insightful answer, but it is one consideration you should make.

YONGAR