
I am experiencing a strange condition with our HP LeftHand G1 SAN cluster. The cluster consists of 4 nodes across two sites: the two nodes in the primary site (nodes 1 & 2) are RAID 5 across the pair, and these are mirrored to the two nodes in our DR site (nodes 3 & 4).

Node 3 in the DR site is reporting as degraded in the CMC; however, when I inspect the disks in the CMC, all disks report "Health Normal" with a status of "active".

/dev/cciss/c0d1 in the RAID setup is reporting degraded, and the disks on this controller report "safe to remove" as "no".
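
(For reference, here is a minimal sketch of how the controller-level state can be checked from the node's own shell. It assumes you have shell access and that hpacucli is installed, neither of which is a given on a LeftHand appliance.)

    #!/usr/bin/env python
    # Minimal sketch: dump the Smart Array configuration and flag any drive
    # line that does not report OK. Assumes shell access to the node and
    # that hpacucli is installed (not guaranteed on a LeftHand appliance).
    import subprocess

    report = subprocess.check_output(
        ["hpacucli", "ctrl", "all", "show", "config"],
        universal_newlines=True)
    print(report)

    # "ctrl all show config" lists logical and physical drives with their
    # status in parentheses, e.g. "physicaldrive 1I:1:2 (..., OK)".
    for line in report.splitlines():
        if "drive" in line.lower() and ", OK" not in line:
            print("CHECK: " + line.strip())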

Does anyone have any insight into what might be going on? The device is out of its warranty period.

3 Answers


From my experience, you should log in to the HP System Management Homepage on each node:

https://xxx.xxx.xxx:2381 (xxx.xxx.xxx = the IP address of the node), user "sanmon", password "sanmon".

Check the status of the disks and the RAID controllers there.

My guess is that this is where you will find your problem.
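
If you just want to confirm that the management page answers on port 2381 on every node before clicking through, a quick check along these lines works; the node addresses in the sketch are placeholders, so substitute your own management IPs:

    #!/usr/bin/env python
    # Quick sketch: confirm port 2381 (System Management Homepage) answers
    # on each node. The addresses below are placeholders, not real IPs.
    import socket

    NODES = ["192.0.2.11", "192.0.2.12", "192.0.2.21", "192.0.2.22"]

    for ip in NODES:
        try:
            socket.create_connection((ip, 2381), timeout=5).close()
            print("%s: port 2381 open - browse to https://%s:2381" % (ip, ip))
        except (socket.error, socket.timeout) as exc:
            print("%s: not reachable (%s)" % (ip, exc))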

John

This sounds like a firmware issue with a drive, the backplane, or the RAID controller: you have a bad drive, but something is preventing the failure from being reported correctly.

I recommend that you check your firmware from top to bottom and upgrade as necessary. HP has a document that lists the supported and recommended firmware levels for their hardware; it is buried on their site, so calling in and asking for the latest version is the best way to get it. Be very careful about the order in which you apply the firmware updates: there are a couple of specific upgrade steps that, if not followed, will result in a bricked motherboard or controller.
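
If you have shell access on the node, a rough way to inventory the current firmware before that call is sketched below; it assumes dmidecode and hpacucli are present and the script runs as root, which may not be true on the appliance image:

    #!/usr/bin/env python
    # Rough firmware inventory sketch. Assumes dmidecode and hpacucli are
    # installed and that the script runs as root; adjust for your environment.
    import subprocess

    def run(cmd):
        try:
            return subprocess.check_output(cmd, universal_newlines=True)
        except (OSError, subprocess.CalledProcessError) as exc:
            return "%s failed: %s\n" % (" ".join(cmd), exc)

    # System ROM (BIOS) version.
    print("System ROM: " + run(["dmidecode", "-s", "bios-version"]).strip())

    # Smart Array controller detail includes a "Firmware Version" line.
    for line in run(["hpacucli", "ctrl", "all", "show", "detail"]).splitlines():
        if "Firmware Version" in line:
            print(line.strip())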

If you're feeling adventurous, just reboot the node in question. The bad drive will probably show up during the RAID init.

longneck

If the device /dev/cciss/c0d1 is degraded, that means you have a hardware issue with one or more of the disks on that controller.

Support needs to check the ADU (Array Diagnostics Utility) report and find out which disks are reporting read errors.

If multiple disks have errors and need to be replaced, support can put this storage node into repair mode, replace the faulty disks, reconfigure the RAID, and finally restripe the node from the surviving cluster nodes (assuming you don't have any NRAID0 volumes).
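
While you wait on support and the ADU report, a crude way to get a hint of which disk is throwing read errors is to grep the kernel log for cciss driver messages. The log path and the exact message wording vary by distro, so treat the sketch below as an illustration only:

    #!/usr/bin/env python
    # Crude sketch: scan the kernel log for cciss driver messages that look
    # like errors. Log location and message text vary, so adjust as needed.
    import re

    LOG = "/var/log/messages"  # or capture `dmesg` to a file and point at that

    err = re.compile(r"error|fail|timeout|check condition", re.IGNORECASE)
    with open(LOG) as fh:
        for line in fh:
            if "cciss" in line.lower() and err.search(line):
                print(line.rstrip())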

Nixphoe