
On an HP ProLiant ML350 G5 server, one of the three physical hot-swappable drives in a Smart Array has a status of Predictive Failure, and its amber light is flashing. Each drive is 146 GB. In HP System Management, I noticed that the failing drive (in bay 2) has some Read and Write Recovery errors listed in the Statistics section. The drive in bay 1 (which has a status of 'No Action Required') also has Read and Write Recovery errors listed in its Statistics section. I know that the rebuild process uses the data on the other drives to rebuild the replacement for the drive that failed. I'm concerned that the errors on the drive in bay 1 are a sign that there may be a problem rebuilding the drive in bay 2 when I plug in the replacement drive. Any thoughts or recommendations?

Thanks! Penny

ewwhite
Penny Downey
  • RAID is not a replacement for backups. Related: Unless you have tested restoring from your backups, you don't have backups. – Chris S Jan 02 '12 at 15:24
  • I'm still waiting for someone to make a RAID controller with an explicit "replace this disk" command rather than the default of simulating a failure and rebuilding. – Simon Richter Jan 02 '12 at 16:24

2 Answers


And this is why RAID 5 setups can be a problem... Typically, the factors that cause one drive to fail can also cause its neighbors to fail. The flashing amber light means that the drive is in a predictive (pre-)failure state. That is enough to obtain a new disk under warranty (if you're covered), or at least a sign to acquire a replacement. Try to replace the disk and hope that the rebuild succeeds. I'd also plan to replace the second drive that is displaying errors.

Your worst-case scenario would be a "Waiting for rebuild" error from your Smart Array controller. That would indicate that the replacement drive cannot be rebuilt because of drive errors on one of the other disks. In that situation, you will need to back up your data and restore it to an array composed of new disks.
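To illustrate why errors on the other disks matter: in a generic RAID 5 layout, the missing block of each stripe is recomputed by XOR-ing the corresponding blocks read from every surviving drive, so a single unreadable block on a surviving drive leaves that stripe unrecoverable. The Python sketch below is a simplified model, assuming one stripe in isolation on a three-drive array with no parity rotation; `xor_blocks` and `rebuild_stripe` are hypothetical names, and it does not reproduce what the Smart Array firmware actually does internally.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together (the RAID 5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(surviving: List[Optional[bytes]]) -> bytes:
    """Recompute the missing block of one stripe from the surviving drives.

    Each entry is the block read from one surviving drive, or None if that
    read failed (for example, an unrecoverable media error). RAID 5 needs
    every surviving block, so one unreadable block means the stripe is lost.
    """
    if any(block is None for block in surviving):
        raise IOError("unreadable block on a surviving drive; stripe cannot be rebuilt")
    return xor_blocks(surviving)


# Hypothetical 3-drive stripe: two data blocks plus their parity block.
bay1 = b"\x01\x02\x03\x04"       # data on the drive that still reports OK
bay2 = b"\x10\x20\x30\x40"       # data on the drive being replaced
bay3 = xor_blocks([bay1, bay2])  # parity block for this stripe

# Normal rebuild: XOR of the survivors reproduces the lost block.
assert rebuild_stripe([bay1, bay3]) == bay2

# If bay 1 cannot return its block, this stripe cannot be rebuilt.
try:
    rebuild_stripe([None, bay3])
except IOError as err:
    print("rebuild failed:", err)
```

A real controller may retry or remap a bad sector before giving up, so a drive with recovered errors does not guarantee a failed rebuild; the sketch only shows that the rebuild depends on every surviving drive being readable.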

ewwhite
  • Summary: Biggest problem with RAID 5: get error condition, replace drive, discover another disk in array has unrecoverable and undetected error that prevents rebuild. Lose entire volume. RAID 5 sucks. – Bart Silverstrim Jan 02 '12 at 16:43
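The risk in that comment can be put into rough numbers. This Python sketch estimates the chance of at least one unrecoverable read error (URE) while a rebuild reads both surviving 146 GB drives end to end. The URE rate of 1 error per 10^15 bits is an assumed spec-sheet figure, not something reported for these drives, and the model treats errors as independent at a constant rate; a drive already logging recovery errors, like the one in bay 1, may well do worse. `rebuild_read_failure_probability` is a hypothetical helper.

```python
import math


def rebuild_read_failure_probability(drive_bytes: float, surviving_drives: int,
                                     ure_per_bit: float) -> float:
    """Chance of at least one unrecoverable read error (URE) while a rebuild
    reads every surviving drive end to end.

    Simplified model: errors are independent and occur at a constant
    per-bit rate, so P(no error) = (1 - ure_per_bit) ** bits_read.
    """
    bits_read = drive_bytes * 8 * surviving_drives
    # log1p/expm1 keep the result accurate when the per-bit rate is tiny.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Assumptions: 146 GB = 146e9 bytes, 2 surviving drives in the 3-drive
# array, and an assumed spec-sheet URE rate of 1 per 10**15 bits.
p = rebuild_read_failure_probability(146e9, 2, 1e-15)
print(f"{p:.2%}")  # prints 0.23%
```

Larger drives or a higher real-world error rate push this figure up quickly, which is part of why the comment above singles out RAID 5 rebuilds.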

My recommendation would be to take a backup right now, replace the failed/flashing disk as soon as possible, and worry about everything else later.

Chopper3