On Friday I had an absolute disaster. I went to the datacenter to rack a new server and, at the same time, took a new disk with me to replace a drive that had failed a couple of weeks back in an HP P2000 G3 10Gbit iSCSI array.

The P2000 is loaded with 12 x 2TB 7.2k MDL SAS disks and configured as 2 RAID10 arrays, each with 4 drives + 2 hot spares. I had already removed the failed disk a week earlier; its place in the array had been taken automatically by one of the hot spares, as expected.
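
For reference, the layout described above can be sanity-checked with a rough sketch like the one below. It only does the arithmetic on raw disk sizes and assumes the usual RAID10 rule that half of the member capacity is usable; the function and constant names are made up for illustration, and real usable space will be a little lower once formatting overhead is taken into account.

```python
# Layout as described in the post: 12 x 2 TB disks split into two RAID10 sets,
# each with 4 member drives and 2 hot spares. All names here are illustrative.
DISK_TB = 2
ARRAYS = 2
MEMBERS_PER_ARRAY = 4
SPARES_PER_ARRAY = 2


def raid10_usable_tb(members: int, disk_tb: float) -> float:
    """RAID10 mirrors pairs of disks, so half the raw member capacity is usable."""
    if members < 4 or members % 2:
        raise ValueError("RAID10 needs an even number of members, at least 4")
    return members / 2 * disk_tb


# Every disk in the shelf is either an array member or a hot spare.
total_disks = ARRAYS * (MEMBERS_PER_ARRAY + SPARES_PER_ARRAY)
# Raw usable capacity across both RAID10 sets, before any formatting overhead.
usable_tb = ARRAYS * raid10_usable_tb(MEMBERS_PER_ARRAY, DISK_TB)

print(f"{total_disks} disks, {usable_tb} TB raw usable across {ARRAYS} arrays")
```

With one spare already consumed by the earlier failure, the affected array was down to a single hot spare, which is why the replacement disk was being added.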

So, the task in hand was a simple drive replacement to give me back a hot spare. Simple, right? Slide in the disk, make sure it's visible in the GUI and configure it as a hot spare...

No...

I opened the brand new disk from its packaging and slid it into place. Instantly, all the drives in the array went orange. I checked the GUI and every single disk was showing 'Invalid metadata'. A quick check of the running services that use the two arrays confirmed that everything had lost visibility of the LUNs.

I rescanned the channels and rebooted the controllers, all to no effect. Drives started disappearing from the GUI and my VDs (LUNs) were now missing from the GUI as well. I removed the new disk too; still nothing.

In an act of desperation and confusion I pulled the power to the P2000 and let it fully reboot. It came back online and I could see my VDs again... However, both arrays had lost all redundancy, as if each of the 2 RAID10s had lost its mirror disks.

All the other disks that were once part of the arrays were now showing as available. I was able to configure them as hot spares and the 2 RAID10s began reconstruction. It is now running again, albeit without my new disk, since I am too scared to put it in again.

Does anyone have a clue about what happened here?

The only thing I can think of is that the new disk must have contained metadata of its own and confused the P2000. However, it was a new sealed disk from our usual supplier. And even if this was the case, I wouldn't expect the array to do anything with that disk that affects the existing RAID configuration!

Help please!

HBruijn
tomstephens89

1 Answer

There's always a chance that the disk was a recycled drive. Are you sure that the error was "Invalid metadata", or was it something like "Stale Metadata"?

If the drive is still reporting that, select the P2000 in the SMU, and navigate to: Tools > Clear Disk Metadata for the specific drive.

Let me know if the error message was something different.

ewwhite
  • Thanks for the reply. It was indeed invalid metadata but, like I said, it appeared on all my working disks, not just this new one. I would clear the metadata on that disk, however it's not in the unit anymore and I am not going to attempt inserting it again until I have migrated the services that the P2000 supports to alternative storage. Then I can spend some time working out what went wrong. But surely even if it was a refurbished disk, it shouldn't destroy my existing RAID arrays? – tomstephens89 Oct 12 '15 at 13:45
  • I'd call HP and get an answer there. The point of having supported storage is to get solutions for issues like this. – ewwhite Oct 12 '15 at 15:36
  • Looks like I'm going to have to do that. – tomstephens89 Oct 12 '15 at 15:38
  • But since I don't have a support contract anymore, I cannot. After trawling the logs I cannot find anything useful. However, they do show clearly that after the SAS topology changed when I inserted that disk into slot 7, the existing RAID broke, since the log shows write-back cache errors: unable to commit to my VDs. – tomstephens89 Oct 13 '15 at 10:42
  • Did you use a dual-ported SAS disk? – ewwhite Oct 13 '15 at 11:42
  • Well, same part number as all the other disks, same caddy and same connector interposer on the back. So I'd say yeah. – tomstephens89 Oct 13 '15 at 13:56
  • There shouldn't be an interposer involved for a nearline SAS disk. – ewwhite Oct 13 '15 at 13:58
  • It's a midline SAS disk and it might not even be an interposer, just a connector. All the disks in the array have it. It takes the normal SAS connector on the drive and turns it into a different type. Part No. 60-272-02 – tomstephens89 Oct 13 '15 at 14:05
  • Request a new drive from your vendor. If the SAS topology changed, this may not have been a dual-ported drive. – ewwhite Oct 13 '15 at 14:06
  • I will check the disk, but if that was the case, would that cause the witnessed behavior? – tomstephens89 Oct 13 '15 at 14:17