
We had an extended power outage (almost four hours); now I have a Dell PowerEdge 2850 server which is giving me this error (from a PERC 4e/DI) on boot up:

Dell PERC 4e/DI boot messages

All drives are listed in the PERC configuration menu:

Dell PERC 4e/DI Configuration Menu

The failed drive does not show any indications of failure on the LEDs; the bottom appears to be green and the top is unlit. None of the disk drive LEDs are flashing.

All drives are in one RAID5 array, with 6 stripes (one per drive) and a 64K stripe size.
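For background on why a single failed drive should not make the data unreachable: RAID5 keeps one parity block per stripe, computed as the XOR of the data blocks, so any one missing member can be rebuilt from the remaining members. A minimal illustrative sketch (not PERC-specific, just the general mechanism, using the drive count and stripe size above):

    from functools import reduce
    from operator import xor
    import os

    DRIVES = 6            # members of the RAID5 array
    STRIPE = 64 * 1024    # 64K stripe element size, as configured on the PERC

    # One stripe: five 64K data chunks plus one parity chunk (XOR of the data).
    data_chunks = [os.urandom(STRIPE) for _ in range(DRIVES - 1)]
    parity = bytes(reduce(xor, blocks) for blocks in zip(*data_chunks))

    # Simulate losing one member (say the failed drive held data chunk 2).
    lost = 2
    survivors = [c for i, c in enumerate(data_chunks) if i != lost] + [parity]

    # XOR of everything that survived reproduces the missing chunk - which is
    # why a degraded RAID5 can keep running (and, in principle, be imported).
    rebuilt = bytes(reduce(xor, blocks) for blocks in zip(*survivors))
    assert rebuilt == data_chunks[lost]

Losing a second member before the failed one is replaced and rebuilt would, of course, make the data unrecoverable.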

I do have one or two spare Dell PE2850s available for test. However, from what I've read, the Unresolved configuration mismatch... error would show up then - but perhaps I could activate the good drives anyway.

What if I remove the bad drive and try to boot that way? I may try that - but both the PERC 4e/DI and the Adaptec 2410SA card (activated later in the boot process) list all ports as not functioning.

Here are the specific questions:

  1. Is it possible to get this (degraded) array running again on this system? How?
  2. Will it help to create a new configuration and save it (without initializing)?
  3. Is it possible to move a degraded array to a new system and power up?
  4. What if the "bad" disk was removed or replaced? How does that affect the system boot? How does that affect a disk array move?

EDIT: I found this question, which appears to detail how to move drives from one host to another; is there anything more that should be added to the process detailed there? In my case, the move would be different in two ways: one, I have an apparently degraded array, and two, the array is RAID5, not RAID1. The first is the biggest question mark; RAID5 should import just like RAID1, I'd say.

I found this question which talks about "repairing" a failed mirror, but there is no clear answer on how to fix it, and I'm using a RAID5 anyway - a RAID5 which hasn't been moved or rearranged.

UPDATE: The replacement system has a PERC 4/DC in it - compared to the old system which has a PERC 4e/Di in it. I hope this will recognize the old (degraded) array and import it just fine. If this works well, I'll even be able to use the old drives (no failures there) as a replacement for the failed drive.

Mei
  • You know things are bad when you're taking pictures with your cell phone. – Wesley Feb 20 '12 at 03:46
  • I do everything with my cell :P - and it has better resolution than my first Kodak digital camera. – Mei Feb 20 '12 at 14:33

2 Answers


Sounds like it's gone slightly nuts and is getting one config from the good drives and another slightly out of sync or invalid config from the "faulty" drive.

First thing I would do is just remove the faulty drive and try booting up then. If that doesn't work, then try with one of your other 2850s.

Robin Gill
  • I suspect at this point that either the backplane or the PERC controller has failed; the configuration it is reporting is probably from NVRAM: thus it shows its recorded configuration, but has no disk configuration to report (all drives are missing). I've already tried removing the faulty drive; no difference. Thinking seriously about moving to secondary system, but what about this degraded array? May go ahead and try anyway. – Mei Feb 19 '12 at 21:03

The problem was - as stated in the screenshot - that no drives were detected. This was, obviously, quite a surprise, as the drives were in the machine.

My hypothesis is this: when the configuration menu for the PERC was entered, it saw no disks - so rather than asking whether to use the Disk Configuration or the NVRAM Configuration, it showed the only configuration it knew. This presented the appearance of having checked the disks on the system when, in fact, no such process had taken place - the disks remained unknown to the controller.
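If that hypothesis is right, the usual mismatch check - compare the configuration stored in controller NVRAM against the configuration metadata written on each member disk, and ask the operator which to keep when they disagree - never had anything to compare against. A rough model of that decision flow, purely illustrative (this is my reading of the behavior, not actual PERC firmware logic):

    # Illustrative model only - not actual PERC firmware.
    def resolve_config(nvram_config, disk_configs):
        if not disk_configs:
            # No disks answered on the bus: nothing to compare, so the only
            # configuration the controller can display is the one in NVRAM.
            return nvram_config, "showing NVRAM configuration (no disks detected)"
        if all(cfg == nvram_config for cfg in disk_configs):
            return nvram_config, "NVRAM and disk configurations agree"
        # Disagreement is where the controller would normally prompt for
        # "Disk Configuration" vs "NVRAM Configuration".
        return None, "unresolved configuration mismatch - operator must choose"

    # What (I believe) happened here: a six-drive RAID5 recorded in NVRAM,
    # but zero drives visible to the controller.
    print(resolve_config({"raid": 5, "members": 6}, []))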

I also hypothesize that since the system could not access the drives electronically, it also could not detect the (known) bad drive in the array. Thus, the LED remained at a "good" state instead of bad.

I moved the disks to a new system this way:

  • Shut off old system
  • Shut off replacement system
  • Marked all drives in the old system with their chassis slot numbers - and marked the bad drive with a red tag instead of a white one.
  • Marked all drives in the replacement system with their chassis slot numbers (just in case)
  • Removed all drives from replacement system
  • Powered on replacement system and cleared the configuration from the PERC configuration menu
  • Shut off replacement system
  • Removed all drives from old system
  • Placed all drives into replacement system (into matching locations)
  • Powered on replacement system
  • Disabled alarm in PERC configuration menu
  • Rebooted replacement system

(I also had to change the network connections, but that is not relevant to the discussion here.)

There were no problems at all going from a PERC 4e/Di to a PERC 4/DC: all of the descriptions in the manuals suggested that the only move that would not work was moving to a PERC 2 from something more recent.

After this, the system (VMware ESXi in this case) came up. There's more to do but it's all about virtual machines and VMware ESXi. The box is good.

If all remains stable, then I'll replace the bad drive with one of the ones from the retired system.

Edited for completeness

On another identical PowerEdge 2850 (also with a PERC 4e/Di), the same message came up:

1 Logical Drive(s) found on the host adapter.
0 Physical Drive(s) found on the host adapter.

After the message there were no errors, and the machine started normally. Thus, this message is not indicative of a failure; perhaps it only counts physical drives that are not part of a logical drive.
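Here is that counting hypothesis spelled out with this machine's numbers (hypothetical logic - I have not found the firmware's actual rule documented anywhere):

    # Hypothesis: the boot banner counts logical drives, plus only those
    # physical drives that are NOT members of any logical drive.
    def boot_banner(logical_drives, physical_drives):
        assigned = {d for ld in logical_drives for d in ld}
        unassigned = [d for d in physical_drives if d not in assigned]
        return (f"{len(logical_drives)} Logical Drive(s) found on the host adapter.\n"
                f"{len(unassigned)} Physical Drive(s) found on the host adapter.")

    # Healthy PE2850: one six-drive RAID5, every physical drive assigned to it.
    print(boot_banner(logical_drives=[{0, 1, 2, 3, 4, 5}], physical_drives=range(6)))
    # -> "1 Logical Drive(s) found ..." / "0 Physical Drive(s) found ..."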

Mei