4

On 29th October 2011, I built a RAID-5 array using 4 x 146.8GB Seagate SAS ST3146855SS drives (15k rpm) connected to a PowerEdge R515 with an HP Smart Array P411 controller running Windows 2008 (so nothing particularly unusual).

I know that parity initialisation of a RAID-5 array can take some time but it's still running after 2.5 weeks which seems a little unusual.

I'd previously built another array on the same controller using 4 x 2TB SATA-2 drives and that did take a while to complete but a) I'm sure it was less than 2.5 weeks, b) that array was ~12 times bigger and c) during initialization, the percentage slowly increased each day.

At the moment, the status display for this new 2nd array simply says "Parity Initialization Status: In Progress" and it's said that since the start. It's this lack of change on the status that worries me the most - feels like it's not actually doing anything.

Do you think something has gone wrong, or am I being impatient and, for some reason, the status not increasing is normal? I kind of expected a much smaller array on faster drives (15k SAS versus 7.2k SATA-2) to build in a few days.
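For a rough sense of scale, the size comparison above can be worked through in a few lines of Python. This is only a back-of-envelope sketch: it assumes RAID-5 usable capacity is (n - 1) x drive size and that 2TB means 2000GB, and the function name `raid5_usable_gb` is made up for illustration.

```python
def raid5_usable_gb(drive_count, drive_size_gb):
    """Usable capacity of a RAID-5 set: one drive's worth of space holds parity."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_gb

sata_gb = raid5_usable_gb(4, 2000.0)   # first array: 4 x 2TB SATA (assumed 2000GB each)
sas_gb = raid5_usable_gb(4, 146.8)     # second array: 4 x 146.8GB SAS
ratio = sata_gb / sas_gb

print(f"SATA array: {sata_gb:.0f} GB usable")
print(f"SAS array:  {sas_gb:.1f} GB usable")
print(f"Size ratio: {ratio:.1f}x")
```

On those assumptions the first array comes out somewhat more than 12 times the size of the second, so if initialisation time scaled with capacity alone, the smaller array should finish well inside the time the larger one took.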

This is our primary SAN running StarWind so my "have a play" options are very limited. This 2nd array is currently in use for one small virtual disk so I could shut the target machine down, move the virtual disk to another drive and try rebuilding.

ewwhite
Rob Nicholson
  • So, just let me sum things up here. You put an HP controller in a Dell server and you **didn't** expect there to be problems? Yikes. – MDMarra Nov 21 '11 at 14:38
  • @MarkM - no, I bought a controller that fitted the requirements we needed from eBuyer which happened to be made by HP in the same way it could have been made by LSI Logic. However, having read many HP posts on identical problems with *HP* servers and some SATA disks (inc. HP re-branded ones), I suspect it's not specifically an incompatibility with just Dell. They've broken their own servers as well with the firmware upgrade – Rob Nicholson Nov 24 '11 at 17:31
  • @ewwhite - thanks for your suggestions but as of writing, the problem isn't resolved and whilst your suggestion of upgrading the firmware is probably the right answer (and I will mark it as such next), it doesn't help for us as I don't want to risk upgrading the SAN (even though the SATA disks are not mission critical) and ending up with an unusable system. So I'm going to buy an LSI Logic card (which gives us 1GB cache and 6Gbit/s) and migrate the virtual disks across. Might then try upgrading it when not reliant on it! – Rob Nicholson Nov 24 '11 at 17:37
  • I mentioned the cache module and flash/battery backup in my original post. I didn't realize your setup didn't have it. HP really shouldn't sell any of these controllers without cache modules and batteries. – ewwhite Nov 24 '11 at 17:51
  • We have the cache, just not the battery backup. And yes, in a mission critical system battery backup should be standard – Rob Nicholson Nov 25 '11 at 13:26

3 Answers

4

Well, it's a little odd. I don't see many cases of mixing HP Smart Array controllers and Dell servers. Either way, the parity initialization doesn't begin until I/O is started on the new logical drive. May I ask how you're monitoring this? Via the HP Array Configuration Utility webpage? Perhaps the HP ACU command-line tool? If you have the latter installed, can you provide the output of:

ctrl all show config detail

We'd like to see that output to see if there's a potential issue with one of your disks.
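If you save that output to a file, a small script can pull out the status line so you can check it each day and see whether it ever changes. This is only a sketch: the `SAMPLE` text is a made-up fragment shaped around the "Parity Initialization Status" line quoted in the question, the real layout of the tool's report may differ, and `parity_status` is a hypothetical helper name.

```python
import re

# Made-up fragment for illustration; a real report will contain far more detail
SAMPLE = """\
Logical Drive: 2
   Fault Tolerance: RAID 5
   Parity Initialization Status: In Progress
"""

def parity_status(report):
    """Return every 'Parity Initialization Status' value found in the report text."""
    pattern = re.compile(
        r"^\s*Parity Initialization Status:\s*(.+?)\s*$", re.MULTILINE
    )
    return pattern.findall(report)

print(parity_status(SAMPLE))
```

Logging the result alongside a timestamp would give a simple record of whether the status is moving or stuck.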

From the HP Smart Array manual:

Background RAID creation 
When you create a RAID 1, RAID 5, or RAID 6 logical drive, the Smart Array controller must build the 
logical drive within the array and initialize the parity before enabling certain advanced performance 
techniques. Parity initialization takes several hours to complete. The time it takes depends on the size of the 
logical drive and the load on the controller. The Smart Array controller creates the logical drive, initializing 
the parity whenever the controller is not busy. While the controller creates the logical drive, you can access 
the storage volume which has full fault tolerance. 

Also, check the firmware on the Smart Array P411 controller. Do you have a cache module installed with a battery or flash backup? If not, you'll have other performance problems over time.

ewwhite
  • I'm using the HP Array Configuration Utility page so thanks for the heads up on the command line utility. I'll check that out and get back to you – Rob Nicholson Nov 17 '11 at 16:10
  • Re: parity whenever the controller is not busy: hmm, I could see the controller being busy nearly all the time right now, as the other array is being used a lot – Rob Nicholson Nov 17 '11 at 16:12
  • I don't think you can upload files here and there is a small comment limit so I've put the output of the ctrl command here: http://www.mailbigfile.com/5e1532240f3984dcc0b2579c8165ba7e/listFiles.php?repro_id=683. That link will be valid for a few weeks – Rob Nicholson Nov 17 '11 at 16:39
  • Upgrade the firmware! You're at v2.74. The current version is 5.12. Many bugs fixed. Download, install, reboot. Here's the link. http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=329290&prodSeriesId=3886974&prodNameId=4017898&swEnvOID=4064&swLang=8&mode=2&taskId=135&swItem=MTX-48dcba3142db44409671bab1ab - changelog: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=329290&prodSeriesId=3886974&swItem=MTX-48dcba3142db44409671bab1ab&prodNameId=4017898&swEnvOID=4064&swLang=8&taskId=135&mode=5 – ewwhite Nov 17 '11 at 16:50
  • Thanks for the link, will schedule some downtime of the SAN to upgrade. Card was only bought a few months ago but I guess it could have been lying around in stock for a while. I moved the VM off today and have started the rebuild again. It now says "Queued" as the status but I agree, I need to get the firmware updated ASAP – Rob Nicholson Nov 19 '11 at 17:03
  • LOL - that was a big step backwards. Upgraded the firmware of the Smart Array P411 card in our backup server and the system now hangs on boot ;-) The P411 detects the disks and then hangs... I've had to take the card out so the backup server can at least carry on with tape backups whilst one buys a new card... Thank goodness I didn't try this on the SAN! – Rob Nicholson Nov 19 '11 at 18:23
  • Try re-running the firmware upgrade. You shouldn't experience a total system hang as the result of the upgrade. If you do, there should be a corresponding error message at POST. Following the flash, you may want to power-cycle the external storage enclosure, as well as the server. – ewwhite Nov 19 '11 at 18:58
  • No, I agree one shouldn't experience a total system hang after a firmware upgrade but it definitely happened ;-) I left it overnight and it was still sat at this place this morning http://www.picpaste.com/P411-1-AsDBsFco.jpg. I can't upgrade it again on this system because it won't boot... I've found an HP bootable CD which might help so will try that first. Failing that, I'll try the card in another PC to see if I can at least get it to boot so I can reflash. It said "0 logical drives" because I unplugged the SFF8088 leads. It does say "2 logical drives" when plugged in – Rob Nicholson Nov 20 '11 at 13:23
  • Ahh flaw in my plan - the R200 boots off CD *after* initialising storage controllers, which of course hangs. Will try it in another PC – Rob Nicholson Nov 20 '11 at 13:34
  • Got P411 with v5.12 working in a PC which allowed reflash of the firmware but to no avail. Still won't boot in the R200 so sounds like there is some incompatibility. Put v2.74 back on (only other version for download) and it's booting again so got a working system whilst I mull over the problem. Whether the same problem would occur in the original system mentioned (R515) will have to wait as that's a production StarWind SAN. The R200 runs Backup Exec which is less critical – Rob Nicholson Nov 20 '11 at 15:04
  • I'm keen to get v5.12 working as I came across http://tinyurl.com/ckv7qez which describes another problem we've occasionally seen with virtual disks mounted on the SATA enclosure. Different drive types but sounds like a similar problem. HP don't make their own drives do they? – Rob Nicholson Nov 20 '11 at 15:04
  • HP uses a number of suppliers for disks. Have you also updated the BIOS on your Dell servers? – ewwhite Nov 20 '11 at 15:07
  • Yes, BIOS was upgraded yesterday. My preferred option today is to look at a completely different storage controller as this P411 isn't filling me with confidence! We don't really have any kind of support contract with HP so my chances of getting an R200-compatible firmware seem slim – Rob Nicholson Nov 21 '11 at 14:14
  • The Smart Array controllers are fine. But it may make sense for you to try something like an `LSI 9205-8e SAS HBA` or an `LSI 9285-8e RAID controller`. Good luck. – ewwhite Nov 21 '11 at 14:23
  • Thanks for the recommendation and our supplier also suggested an LSI controller so the purchase order is going in. I found some older versions of the firmware yesterday and tried v5.06 with the same result. Interestingly, the release notes for the firmware specifically mention "Fixing boot problems when no battery backup module installed" which is exactly what we've got. Sounds to me like they've still got boot problems when no battery backup module installed but only with certain motherboards/CPU. Even with HP servers themselves, there are some "Don't upgrade on this system" warnings... – Rob Nicholson Nov 23 '11 at 11:38
  • To draw a line under this one: the P411 has had many fixes since v2.74 was released (version shipped to us). Unfortunately, the latest version (5.12) appears to be incompatible with the PowerEdge R200 and causes it to hang on boot, so upgrading the firmware has to carry a health warning. It is sometimes possible to downgrade the firmware in another system. The release notes from HP cover two problems we've encountered: problems with some SATA drives and hanging on booting. Whilst HP may have fixed boot problems on their own kit, other motherboards may still have trouble – Rob Nicholson Nov 24 '11 at 17:42
  • The bottom line therefore is that whilst you may have no problems with the P411, you may encounter problems in non-HP kit (and in fact, in some HP kit as well) so pick your combinations carefully. The recommendation to look at non-HP kit may be the wisest choice. If using the P411 in any server, upgrading the firmware is highly recommended! – Rob Nicholson Nov 24 '11 at 17:44
2

The likelihood of a non-recoverable error in this day and age is extremely high. Might I suggest either RAID 1 or RAID 10, especially if this is holding anything important?

Gregg
  • True. I would recommend against using RAID 5 for deployments these days for performance and reliability reasons. – ewwhite Nov 16 '11 at 16:57
  • I agree for performance reasons but reliability depends upon how reliable. But for cost, RAID-5 is a good compromise. This array isn't mission critical - the internal disks in the R515 are RAID-10 and they hold the mission critical stuff – Rob Nicholson Nov 17 '11 at 16:14
  • Actually RAID-5 is OK for small drives such as these. It's unreasonable for SATA drives, but for SAS or SSD drives it's OK. – wazoox Nov 17 '11 at 17:08
  • Why unreasonable for SATA and okay for SAS? Parity rebuild time? The reason I'm asking is that we've got a large SATA (4 x 2TB) RAID-5 array connected to our Backup Exec server and speed is pretty pants but as this is "just" backup, the fact it takes two days for a full backup isn't the end of the world (we also replicate to our USA site). Thought about putting 4 x 3TB drives in there and using RAID-10 to get a bit more speed – Rob Nicholson Nov 19 '11 at 17:03
  • The size of SATA disks (2TB/3TB) versus these SAS drives (146GB) means you have MUCH more work to do, increasing your window for a double-drive failure. On top of that you're stressing your SATA drives (designed for 20% duty cycle) with near 100% duty cycle for a long time (days, possibly!) – MikeyB Nov 19 '11 at 18:59
  • It's the disk size and rebuild time. And there are other issues that center around RAID 5 arrays where a disk has completely failed and another is FAILING. That's a bad situation to be in. – ewwhite Nov 19 '11 at 19:01
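The point about disk size and rebuild windows can be put into rough numbers. The sketch below estimates the chance of hitting at least one unrecoverable read error (URE) while reading every surviving disk in full during a RAID-5 rebuild. The error rates (1 in 10^14 bits for SATA, 1 in 10^16 bits for 15k SAS) are assumed, datasheet-style figures and not values for the drives in this thread, and the model treats errors as independent, which real disks do not guarantee.

```python
import math

def ure_probability(surviving_drives, drive_size_gb, ure_rate_per_bit):
    """Chance of at least one URE while reading all surviving drives in full."""
    bits_read = surviving_drives * drive_size_gb * 1e9 * 8
    # 1 - (1 - rate)^bits, computed with log1p/expm1 to stay accurate for tiny rates
    return -math.expm1(bits_read * math.log1p(-ure_rate_per_bit))

# Assumed error rates, for illustration only
SATA_URE = 1e-14
SAS_URE = 1e-16

sata = ure_probability(3, 2000.0, SATA_URE)   # 4 x 2TB RAID-5, one drive failed
sas = ure_probability(3, 146.8, SAS_URE)      # 4 x 146.8GB RAID-5, one drive failed

print(f"SATA rebuild: {sata:.1%}")
print(f"SAS rebuild:  {sas:.3%}")
```

Under these assumptions the large SATA array carries a substantial risk per rebuild while the small SAS array's risk is a small fraction of a percent, which is consistent with the comments above about RAID 5 being more defensible on small SAS drives.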
0

A disk firmware update is available for DG072BABCE and DG146BABCF drives: "This firmware prevents a rare condition that may occur during a WRITE SAME command sequence that may result in incorrect data being written to the hard drive. The WRITE SAME command may be used during RAID ARRAY parity initialization"

  • Do you have the URL where this information comes from? If so, including it in your answer will help. – ChrisF Feb 03 '13 at 13:09
  • Can you reference where this firmware is available/downloadable? Also please reference the release notes if possible. Will make this answer much stronger. – slm Feb 03 '13 at 13:09
  • This is moot. The OP is not using HP disks, so the firmware advisory does not apply. – ewwhite Feb 03 '13 at 15:08