1

Here's a weird one I've been fighting for a while. I've got an old out-of-warranty Dell PowerEdge 6650 server with a PERC 3/DC RAID controller controlling four newer (maybe a year old) Fujitsu 136GB U320 SCSI disks in a RAID5 array.

Maybe once a month or so one of these disks will randomly "fail." By "fail," I mean that the PERC decides the disk has failed and starts beeping and blasting alerts. All I have to do to resolve the issue is remove and reseat the "failed" disk, and the array starts resyncing. Once the resync is complete, the bezel light on the front of the machine goes back to blue from orange and the beeping stops.

My main question is what is causing these disks to "fail," when in fact they're perfectly fine. At first I thought it might be a firmware issue, so I reflashed every flashable component in the system. BIOS, PERC firmware, disk firmware, everything.

There doesn't seem to be a cause or event that triggers one of these non-failures; it just happens at random.

It's not exactly a huge issue, but it's definitely something I'd like to resolve. Dell won't provide support since the machine is out of warranty, and their website/forums are useless as always.

brian
  • Note: this is just a dev machine used for testing so it's not horribly important. I'll probably just end up running it into the ground. – brian Nov 19 '09 at 17:51

4 Answers

3

I like running old hardware as long as possible, but I'd get the machine replaced. You're going to have a tough time making any headway in resolving this issue.

My suspicion would be a subtle interaction between the firmware on the "failing" drives, possibly the hot-swap backplane, and the RAID controller. No one at either Dell or Fujitsu is testing those drives with that controller anymore, and you're unlikely to get anyone at either company interested.

You're putting the array at risk each time this happens, since the array becomes degraded and has to be rebuilt. If a legitimate failure happens on another disk during the rebuild process, you're going to be in an array failure scenario. Hopefully you've got good backups.
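To put a rough number on that rebuild risk, here is a minimal sketch. It assumes independent disks with a constant failure rate, and the 3% annual failure rate and 6-hour rebuild time are placeholder values, not measurements from this server. The function name is made up for illustration.

```python
import math

def rebuild_failure_probability(surviving_disks, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving disk fails during a rebuild window.

    Assumes independent disks with a constant (exponential) failure rate.
    """
    hours_per_year = 24 * 365
    # Convert the annual failure probability into a per-hour hazard rate.
    hazard_per_hour = -math.log(1.0 - annual_failure_rate) / hours_per_year
    return 1.0 - math.exp(-surviving_disks * hazard_per_hour * rebuild_hours)

# Hypothetical inputs: 3 surviving disks in a 4-disk RAID5, 3% AFR, 6-hour rebuild.
per_rebuild = rebuild_failure_probability(3, 0.03, 6.0)

# Roughly one rebuild a month, as described in the question.
per_year = 1.0 - (1.0 - per_rebuild) ** 12

print(f"per rebuild: {per_rebuild:.6f}, over 12 rebuilds: {per_year:.6f}")
```

This only models whole-disk failures. It ignores unrecoverable read errors during the rebuild and any common cause such as a flaky controller or backplane, both of which would push the real risk higher.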

It's frustrating because adding disks really should work fine, but with something of this age you're really better off biting the bullet and getting something with active manufacturer support.

Evan Anderson
1

First thing I would have said would be to update the firmware, as this happens fairly often with PE servers with PERC controllers.

Just because the array is able to rebuild when you reseat the disk doesn't mean that the drive is okay. It could be on its way out, and that could be why it keeps dropping out of the array. That is why, when Dell tells me just to reseat it, I try to get them to send me a new one (even though they are probably just sending me one that someone sent back :-/ ).

Kyle Brandt
  • Is it the same disk each time? If so, then I suspect Kyle may be right that the drive itself is failing, and the fact that the array can rebuild it each time is giving you a false sense of security. – BillN Nov 19 '09 at 17:26
  • The thing is it's never the same disk that "fails," every time it's a different one. – brian Nov 19 '09 at 17:49
1

I had the same issues with a PowerEdge 2650. In that case it turned out to be a problem with the PERC itself, so if you have a spare, try swapping it.

Dr I
  • You can pick up a "like new" one from focus technologies: http://www.focustechnology.com/dell.asp. I just turned off my last 6650 along with my last 8450 about 6 months ago. I had replaced the PERC card in both about a year before that. – Thomas Denton Nov 19 '09 at 17:38
0

You said you already flashed the firmware of the RAID card. Did you update the driver for it at the same time? In previous support calls with Dell about failed drives, they've always been annoyingly adamant that we be running both the latest firmware and the latest driver for the RAID card.

One of them even suggested that I needed to rebuild the array from scratch after updating the firmware to make the drive stop failing. Fortunately, I got them to replace the drive, which turned out to be the problem, before I resorted to doing that. So I can't confirm or deny whether his suggestion would've worked.

I had one last thought and only because you didn't mention it explicitly. Have you checked for a firmware update for the actual drives?

Ryan Bolger
  • I flashed the BIOS, PERC firmware, and disk firmware all at the same time. The last time I completely smashed everything and started from zero was when the disks were brand new. The PERC got a full reset, BIOS went to defaults, etc. – brian Nov 19 '09 at 17:50