
I have just configured a new server with a Smart HBA H240 card and installed hpssaducli. It detects the controller and lets me generate a report.

My problem is how to detect a failing RAID array and send an alert when that happens.

The report generated by hpssaducli contains a massive amount of information that is hard to sift through. I do not currently have a failed array, so I am not sure which fields I would need to check in the event of a failed drive.

Details

root@server [~]# lsmod | grep hp
hpwdt                  14242  0
hpilo                  17381  0
shpchp                 37032  0
hpsa                   94958  3

root@server [~]# rpm -qa | grep hpsa
kmod-hpsa-3.4.12-110.rhel7u1.x86_64

root@server [~]# uname -a
Linux server.hostname 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@server [~]# hpssaducli
HP Smart Storage Diagnostics 2.10.14.0
Usage: hpssaducli [ -adu | -ssd | -val ] [ command-specific options ]
...
...

Diagnosable devices:
Smart HBA H240 in Slot 2

Output from hpssacli

root@server [~]# hpssacli ctrl all show config detail

Smart HBA H240 in Slot 2 (RAID Mode)
   Bus Interface: PCI
   Slot: 2
   Serial Number: XXXXXXXXX
   Cache Serial Number: XXXXXXXXX
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 1.34
   Rebuild Priority: High
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: False
   Drive Write Cache: Disabled
   Controller Memory Size: 256 MB
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 72
   Cache Module Temperature (C): 36
   Number of Ports: 2 Internal only
   Encryption: Disabled
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.12
   Driver Supports HP SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:0A:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID Mode
   Controller Mode Reboot: Not Required
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Host Serial Number: CZ250305FS
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None


   Port Name: 2I
         Port ID: 0
         Port Connection Number: 0
         SAS Address: 500143803366B9C0
         Port Location: Internal
         Managed Cable Connected: False

   Port Name: 1I
         Port ID: 1
         Port Connection Number: 1
         SAS Address: 500143803366B9C4
         Port Location: Internal
         Managed Cable Connected: False

   Internal Drive Cage at Port 1I, Box 1, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 1I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
      None attached


   Internal Drive Cage at Port 2I, Box 0, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 2I
      Box: 0
      Location: Internal

   Physical Drives
      None attached
      None attached

   Array: A
      Interface Type: Solid State SATA
      Unused Space: 0  MB (0.0%)
      Used Space: 1.8 TB (100.0%)
      Status: OK
      Array Type: Data
      HP SSD Smart Path: enable



      Logical Drive: 1
         Size: 931.5 GB
         Fault Tolerance: 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 512 KB
         Status: Ready for Rebuild
         Caching:  Disabled
         Unique Identifier: XXXXXXXXX
         Disk Name: /dev/sda
         Mount Points: /boot/efi 200 MB Partition Number 2, /boot 500 MB Partition Number 3
         OS Status: LOCKED
         Logical Drive Label: 026ACA51PDNNK0ARH7Q0B9471B
         Mirror Group 1:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
         Mirror Group 2:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
         Drive Type: Data
         LD Acceleration Method: HP SSD Smart Path

      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:3
         Port: 1I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:4
         Port: 1I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

1 Answer

I don't want to close this as a duplicate, but you should install the HP Management Agents to provide server health information. They are available via yum or as the individual packages listed on the support site for the ProLiant DL120 Gen9 and RHEL7.

See: HP ProLiant DL380e Gen8 server - SPP use for some ideas...

At the very least, you can use the hpssacli tool to give you actual RAID controller information on-demand.

But understand that the server is also capable of sending email, SNMP traps and logging health events when you include the other utilities.
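For the on-demand route, a small script can scan the hpssacli report for any component whose status is not OK. The following is only a minimal sketch: the function names `find_unhealthy` and `read_report` are made up for illustration, it assumes the section headers and `Status:` lines look like the `hpssacli ctrl all show config detail` output posted in the question, and it assumes that "OK" is the only healthy value.

```python
import re
import subprocess

# Section headers as they appear in the detail report from the question,
# e.g. "Array: A", "Logical Drive: 1", "physicaldrive 1I:1:1".
HEADER = re.compile(r"Array: \S+|Logical Drive: \d+|physicaldrive \S+")


def find_unhealthy(report, healthy=("OK",)):
    """Return (component, status) pairs whose status is not in `healthy`.

    Assumption: only "Controller Status" and plain "Status" lines matter;
    lines such as "OS Status" or "Power Supply Status" are ignored.
    """
    problems = []
    component = "unknown"
    for raw in report.splitlines():
        line = raw.strip()
        if HEADER.fullmatch(line):
            component = line  # remember which section we are in
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            continue
        value = value.strip()
        if key == "Controller Status" and value not in healthy:
            problems.append(("Controller", value))
        elif key == "Status" and value not in healthy:
            problems.append((component, value))
    return problems


def read_report():
    """Run the command from the question (needs hpssacli and root)."""
    result = subprocess.run(
        ["hpssacli", "ctrl", "all", "show", "config", "detail"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `find_unhealthy(read_report())` on the server and alerting when the list is non-empty would be one way to wire this into cron or a monitoring check. Against the output posted in the question it flags `Logical Drive: 1` with the status `Ready for Rebuild`. How the alert is delivered is left open here, and the management agents remain the more complete option.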

ewwhite
  • Thanks for the information, I will give it a go and update. – copyandpaster Oct 29 '15 at 16:58
  • So I have added the SPP repo and installed the hpssacli tool. It's showing me that the array is ready for rebuild, which I think is a side effect of another issue, so it isn't important now. However, do you know where I can find out how to send alerts when the RAID has failed? – copyandpaster Oct 29 '15 at 17:05
  • Please post the full output of `hpssacli ctrl all show config detail` into your question. – ewwhite Oct 29 '15 at 17:27
  • I have now added that into the question. – copyandpaster Oct 30 '15 at 06:57