
I've got a five-drive RAID-5 set (with a sixth drive as a hot spare) in an Xserve RAID running the 1.5/1.50f firmware. One of the drives in the RAID-5 set has an amber/orange status light on and has been logging occasional errors like the following:

Timestamp:  11/10/10 10:34:53 AM
Priority:   Warning
Controller: Upper Controller
Type:   112
Event ID:   1000
Event:  Disk 5 Reported An Error. COMMAND:0x35 ERROR:0x10 STATUS:0x51 LBA:0x19B80
Description:    The drive reported an ATA error. This is a failure in the communication from the RAID Controller to the drive.
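
For reference, the hex fields in an event like this can be read against the generic ATA register layout. The following is a minimal sketch, assuming the Xserve RAID log prints the raw ATA command opcode and the unmodified Error and Status register values (the log does not state this explicitly). The function names `parse_event` and `decode_bits` are hypothetical, and the tables list only a few common bits and opcodes.

```python
import re

# Generic ATA register bit names. Whether the Xserve RAID log reports these
# registers unmodified is an assumption, not something the log confirms.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
COMMANDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}


def parse_event(event):
    """Extract NAME:0xHEX pairs from an event string as integers."""
    pairs = re.findall(r"(\w+):0x([0-9A-Fa-f]+)", event)
    return {name: int(value, 16) for name, value in pairs}


def decode_bits(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for mask, name in sorted(table.items(), reverse=True) if value & mask]


event = "Disk 5 Reported An Error. COMMAND:0x35 ERROR:0x10 STATUS:0x51 LBA:0x19B80"
fields = parse_event(event)
command_name = COMMANDS.get(fields["COMMAND"], "unknown")
status_flags = decode_bits(fields["STATUS"], STATUS_BITS)
error_flags = decode_bits(fields["ERROR"], ERROR_BITS)
print(command_name, status_flags, error_flags, fields["LBA"])
```

Under those assumptions, the event above would read as a WRITE DMA EXT command that ended with the ERR status bit set and IDNF ("ID not found") in the error register, at LBA 105344.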

I have double checked the drives in RAID Admin and, as the drive is only in a warning state, the hot spare has not been pulled into the RAID set yet. As this is an old drive, I'd like to replace that particular drive first. I have a current, full backup of the data, but want to make sure I understand the process correctly.

I understand the "Installing or Replacing an Apple Drive Module" section of http://manuals.info.apple.com/en/XserveRAID_UserGuide.PDF, but it and RAID Admin's built-in help don't describe what will happen when replacing a drive in a RAID set that has a hot spare. When I pull out the drive and replace it, will it correctly use the newly inserted drive or will it use the hot spare? If it uses the hot spare, will the hot spare revert back to a hot spare once the new drive is inserted or will it permanently become a member of the RAID set and need to be moved to the original drive's slot? Or, should I just pull out the hot spare, pull out the failing drive, and pop the hot spare into the failing drive's slot?

morgant
  • I have never used the Xserve RAID, but every RAID controller I've used in the past has failed over to the spare, and the newly inserted drive became the new spare. Trying to move drives around after the volume starts to recover will mean either restarting the recovery or actually failing the volume. Again, I don't know the Xserve RAID, but with the server solutions I've been using for years I'd just pull the failing drive, replace it, and do nothing else. I've never moved drives between slots the way you are asking about. – Nathan V Nov 27 '12 at 10:06
  • It doesn't look like you're going to get a specific answer, but seconded as per the comment above. As soon as you pull a drive, it'll start to rebuild onto the spare. Wait for that to finish, then put your new drive in; you may be able to rebuild again and make your original spare a spare once more. – Snellgrove Apr 16 '13 at 10:30

1 Answer


According to the manual at http://manuals.info.apple.com/en_US/RAIDAdmin1.2_121406.pdf, any drives that are not part of a disk group or array are treated as global hot spares (see the section "Creating RAID Array"), and the controller will automatically rebuild onto one of them upon loss or failure of a drive.

It seems like your drive isn't in a failed state yet, but as others have mentioned, if you pull the drive, it should force the Xserve RAID to start rebuilding the array onto the spare drive. While that rebuild is running, however, you can't pull any of the other drives or you'll lose the data. I'm not familiar with the RAID tools involved, but they should give you some kind of monitoring interface to see how far along the rebuild is.

In my Dell MD3000i system, when a drive fails or is pulled, the hot spare kicks in immediately. When a replacement drive is inserted, after the rebuild the system starts what is known as a "copy-back" and replicates the hot spare's contents onto the replacement, at which point the spare goes back to being a spare. Based on what I've read in the manual, though, it looks like the Xserve RAID makes the spare drive a permanent part of the array, so my best guess is that your replacement drive will end up being the new hot spare, since it's not part of the array:

"The RAID controller that controls the affected array will automatically attempt to reconstruct the data in order to return the system to a protected state. For example, if a hot spare drive is available when a drive fails in an array, the controller takes the available drive and integrates it into the array. The controller then rebuilds the RAID array using the new drive."
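
To make that best guess concrete, here is a toy model of the behavior the manual passage describes: the spare is integrated into the array, and a newly inserted drive that belongs to no array becomes a global hot spare. This is only a sketch of my reading of the manual, not the Xserve RAID firmware's actual logic. The class `ToyArray`, its methods, and the slot numbering (members in slots 1 to 5, spare in slot 6) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArray:
    """Hypothetical model: slot numbers of array members and global hot spares."""
    members: list
    spares: list = field(default_factory=list)

    def remove_drive(self, slot):
        """Pull a member drive; promote the first available spare, if any."""
        self.members.remove(slot)
        if not self.spares:
            return None  # array stays degraded until a drive is added
        promoted = self.spares.pop(0)
        self.members.append(promoted)  # controller rebuilds onto this drive
        return promoted

    def insert_drive(self, slot):
        """A drive that is not part of any array is treated as a global hot spare."""
        self.spares.append(slot)


# Assumed layout: five members in slots 1-5, hot spare in slot 6.
array = ToyArray(members=[1, 2, 3, 4, 5], spares=[6])
promoted = array.remove_drive(5)   # pull the suspect drive
array.insert_drive(5)              # insert the replacement in the same slot
print(promoted, array.members, array.spares)
```

In this model the old spare in slot 6 ends up as a permanent member and the replacement in slot 5 becomes the new spare, with no copy-back step.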

adam820