
A few days ago the disk on my server started to show a large queue length:

[screenshot: Performance Monitor showing a large disk queue length]

I changed the controller battery recently; the HP configuration utility was saying the battery was bad, but after replacing it nothing changed. The utility now says everything is fine, but the queue length is still the same.

What can I do to eliminate the problem? Maybe I should replace the controller?

UPDATE 1 (gtapscott's questions):
1) This is a read queue: I added a separate read-queue counter and it matched the overall queue length. The write queue is empty.
2) The average disk queue varies from 0 to several hundred; the average value is about 100-200. I'm not sure, but I feel this counter acts as if there were no controller cache at all.
3) There are 10 disks in RAID-5.

UPDATE 2 (ewwhite's post):
Yes, I rebooted the server after the battery change.

My interface is slightly different; here it is:
[screenshots: HP Array Configuration Utility status pages]

So the cache is enabled on the RAID array.

UPDATE 3:
The problem was in one of the RAID disks, as ewwhite suggested.

Vladimir

3 Answers


Check to ensure write caching has been re-enabled. It may well have been automatically disabled when the controller detected a battery problem but may not have been switched back on after the battery was changed.

John Gardeniers
  • Indeed, write caching was disabled and I've re-enabled it. But since this is a read queue, that didn't help – Vladimir Dec 22 '11 at 22:32

A few observations:

  • Determine whether this is a read or a write queue: break the perfmon counters out into separate read-queue and write-queue counters. If it's the write queue, it could definitely be related to the controller, as its write caching will be disabled if there is a battery issue.
  • I notice that the average disk queue length counter is included as well. What are its max/min and average? The current disk queue counters tend to be very spiky and aren't as good a metric.
  • How many physical disks comprise this array? The classic rule of thumb is that the average disk queue should remain at 1-2 per physical disk.
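The rule of thumb in the last bullet can be sketched as a quick check. A minimal sketch, assuming the figures reported in the question (10-disk array, observed average queue around 100-200); the function name is mine, not part of any tool:

```python
# Rule of thumb: average disk queue length should stay at roughly
# 1-2 outstanding requests per physical spindle in the array.
def healthy_queue_range(physical_disks):
    """Return the (low, high) acceptable average queue length."""
    return physical_disks * 1, physical_disks * 2

low, high = healthy_queue_range(10)  # 10-disk RAID-5 from the question
observed = 150  # the question reports an average of roughly 100-200
print(f"Healthy range: {low}-{high}, observed: {observed}")
print("Queue is abnormal" if observed > high else "Queue looks OK")
```

For this array the healthy band would be about 10-20, so a sustained average of 100-200 is an order of magnitude too high.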
gtapscott
  • I've updated the post with answers to your questions (see UPDATE 1) – Vladimir Dec 22 '11 at 22:33

You mentioned that you had the HP Array Configuration utility installed. Hopefully, you have the remainder of the HP System Management agents installed. Did you reboot following the array battery change? If not, that may help. You can also look at the Array Configuration Utility to check the array status. You should see something like the following, showing Accelerator: Enabled:

[screenshot: Array Configuration Utility showing Accelerator: Enabled]

or...

[screenshot: alternate Array Configuration Utility view]
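If the CLI variant of the utility is available, the same status can be checked without the GUI by reading the controller configuration dump. A minimal sketch — the sample output below is illustrative, not captured from this server, and assumes the `hpacucli ctrl all show config detail` report format:

```python
# Check whether the array accelerator (controller cache) is enabled,
# based on text in the style of `hpacucli ctrl all show config detail`.
# The sample below is invented for illustration.
sample_output = """\
Smart Array P410 in Slot 1
   Cache Board Present: True
   Cache Status: OK
   Battery/Capacitor Status: OK

   Logical Drive: 1
      Status: OK
      Array Accelerator: Enabled
"""

def cache_enabled(acu_output):
    """Return True if any logical drive reports its accelerator enabled."""
    return any(
        line.strip() == "Array Accelerator: Enabled"
        for line in acu_output.splitlines()
    )

print(cache_enabled(sample_output))
```

If this reports the accelerator as disabled after a battery swap, re-enabling it (as John Gardeniers suggests) is the first thing to try.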

ewwhite
  • Yes, seems like the accelerator is enabled (added pictures to the post) – Vladimir Dec 22 '11 at 22:35
  • I've just noticed this line: `Parity Initialization Status: Initialization Failed `, what does this mean? – Vladimir Dec 22 '11 at 22:56
  • I see the problem here. Your controller is only set for read caching. This is the "Array Accelerator (Cache) Ratio" setting. Right now, according to your screenshot, you have things set to 100% read. A better setting for your controller would be 25% read/75% write. Can you try setting that and reporting back? – ewwhite Dec 22 '11 at 23:01
  • This was initially set to 25% read/75% write. After I noticed that the disk queue is a read queue, I changed it to 100% read. I've set it back to 25/75. Do you know what `Initialization Failed` in `Parity Initialization Status` means? (this screen: http://i.imgur.com/SdIsO.jpg). Maybe I should rebuild the parity? – Vladimir Dec 22 '11 at 23:16
  • I'd recommend powering off (spin the drives down completely), removing power cables, reinserting and starting the system up. Pay attention to the POST messages and the BIOS messages. A 16.4TB RAID5 array is going to take a considerable amount of time to complete its parity initialization, especially on SATA disks. RAID 1+0 would not have this issue. Please see the discussion at: http://serverfault.com/questions/331581/slow-parity-initialization-of-raid-5-array-on-hp-smart-array-p411-controller/331588#331588 – ewwhite Dec 22 '11 at 23:30
  • Thank you for your help. Is there another way to force parity initialization? The server is at a hosting center, so I can't access it physically. I could ask the hosting admins to do it, but if there is another way I would prefer to use it – Vladimir Dec 22 '11 at 23:46
  • You can probably get by with a reboot. There's also a slight chance that parity initialization failed because of disk errors. Keep that in mind. – ewwhite Dec 22 '11 at 23:48