28

I have a Dell server running CentOS 6 with a PERC H710 RAID controller configured as RAID 5, and I want to monitor the failure/working status of the hard disks behind the RAID controller.

I would then like to use a bash script to check the disk status and send alert emails if something goes wrong.

The LSI MegaRAID SAS command tool (About LSI MegaRAID SAS Linux Tools) for CentOS/Red Hat/Linux does NOT support PERC H710 and smartctl does NOT support it either.

According to Dell's website, CentOS is not supported on this server (PowerVault NX3200), and I couldn't find any Linux program to download for monitoring the hard disks.

[root@server ~]# lspci | grep RAID
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)


[root@server ~]# smartctl -a /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-431.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               DELL
Product:              PERC H710
Revision:             3.13
User Capacity:        299,439,751,168 bytes [299 GB]
Logical block size:   512 bytes
Logical Unit id:      ....
Serial number:        ....
Device type:          disk
Local Time is:        Tue Apr 15 16:38:30 2014 SGT
Device does not support SMART

Error Counter logging not supported
Device does not support Self Test logging

Does anyone know how to monitor the status of hard disks behind hardware RAID on a Dell PERC H710 with CentOS 6?

ewwhite
Xianlin

7 Answers

31

You can see the SMART status of the disks with the smartctl command and its -d argument. For example, to see the first disk in the array:

# smartctl -a /dev/sda -d sat+megaraid,00
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.6.2.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST91000640NS
Serial Number:    ........
LU WWN Device Id: . ...... .........
Firmware Version: AA08
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P     showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Jul 10 11:21:52 2014 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
...
...
#

This is on Scientific Linux 6 (another RHEL6 based OS) with smartmontools-5.43-1.el6.x86_64.
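For the scripted check the question asks about, the same command can be looped over the device IDs behind the controller. What follows is only a minimal sketch, not a tested monitoring tool: the `/dev/sda` node, the `sat+megaraid` type, the ID range 0-7 and the `root` recipient are assumptions to adjust, and `mail` assumes a mailx-style client is installed. It looks for the `SMART overall-health self-assessment test result:` line shown above, plus the `SMART Health Status:` line that SAS drives usually print instead. IDs with no disk behind them come back as UNKNOWN, so set the range to match your array.

```shell
#!/bin/sh
# Sketch only: poll SMART health for disks behind a MegaRAID/PERC controller.

# Read smartctl output on stdin and print the health verdict
# (PASSED, OK, FAILED!, ...), or UNKNOWN if no verdict line is found.
smart_verdict() {
    awk -F': *' '
        /^SMART overall-health self-assessment test result:/ { v = $2 }
        /^SMART Health Status:/                               { v = $2 }
        END {
            sub(/[ \t\r]+$/, "", v)
            print (v == "" ? "UNKNOWN" : v)
        }
    '
}

# check_disks DEVICE TYPE FIRST LAST
# Query device IDs FIRST..LAST and print one line for every disk whose
# verdict is neither PASSED nor OK.
check_disks() {
    dev=$1; dtype=$2; id=$3; last=$4
    while [ "$id" -le "$last" ]; do
        verdict=$(smartctl -H -d "$dtype,$id" "$dev" 2>/dev/null | smart_verdict)
        case $verdict in
            PASSED|OK) ;;
            *) echo "$dtype,$id on $dev: $verdict" ;;
        esac
        id=$((id + 1))
    done
}

# Live check, e.g. from cron: check-smart.sh run
# Assumed values: /dev/sda, sat+megaraid, IDs 0-7, recipient root.
if [ "${1:-}" = "run" ] && command -v smartctl >/dev/null 2>&1; then
    report=$(check_disks /dev/sda sat+megaraid 0 7)
    if [ -n "$report" ]; then
        printf '%s\n' "$report" | mail -s "SMART alert on $(hostname)" root
    fi
fi
```

For SAS disks, or newer smartmontools releases, passing `megaraid` instead of `sat+megaraid` as the type may be what works.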

Jonathan Barber
  • `-d megaraid,0` was enough in `smartctl 6.6` on the command line. In the `DEVICESCAN` string in `/etc/smartd.conf` it needed `-d removable` – Stuart Cardall Jul 22 '19 at 14:26
27

S.M.A.R.T. is not the final word in disk or storage monitoring!! It's a component, but modern RAID controllers use it along with other methods to determine drive and array health.

I'm assuming this is a PERC controller in a Dell PowerEdge server.

The normal Linux-friendly approach to health monitoring of Dell hardware is to install the Dell OMSA agents for Linux via Yum - http://linux.dell.com/wiki/index.php/Repository/OMSA#Yum_setup

yum install srvadmin-all will install the full suite of agents. Once installed, you can use the omreport command to get information about your array.

Examples:

$ omreport storage vdisk

$ omreport storage pdisk controller=0

$ omreport storage vdisk controller=0 vdisk=1
slm
ewwhite
  • 8
    this will install extra components such as web server/ssl on my Linux machines but it seems that I have no other choice! I hate to add unnecessary packages into my server. – Xianlin Apr 16 '14 at 15:40
  • 2
    Beware of potential memory leak from one of those OMSA programs. It happened to me slowly over the course of 3-4 weeks then boom, no more memory at all for linux. – bksunday Jan 19 '15 at 19:37
  • 1
    Yes, the leak is in dsm_sa_snmpd (so I run 'killall -9 dsm_sa_snmpd', solved). – markusN Jan 21 '15 at 10:54
  • 8
    The PERC 7xx and 8xx controllers are just LSI MegaRAID controllers, and the LSI MegaCLI tool will work just fine if you don't want to taint your system with Dell libraries and whatever other services and/or kernel modules they are dropping these days. There are plenty of MegaCLI cheat sheets, nagios monitoring scripts and [performance tuning](https://calomel.org/megacli_lsi_commands.html) tips out there for the LSI binary. That is just my personal preference and opinion of course. I am a minimalist. – Aaron Feb 16 '16 at 04:23
  • @Xianlin, this is not entirely true. Yes, it will install a lot of garbage but see my answer. I didn't want to add unnecessary packages so I figured out only the ones I needed for storage. – Mike S Apr 05 '16 at 20:06
  • Thanks to @Aaron for avoiding all the bloat included in OMSA! Check out this question also: http://serverfault.com/questions/681056/xenserver-6-5-lsi-megaraid – Ryan Griggs May 19 '16 at 18:48
11

The accepted answer recommends the audacity that is yum install srvadmin-all. Blecch. Here's how to make it slightly less blecch-y (but still blecch-y nonetheless; you can get much leaner on HP's platform. But I digress...) By this I mean, only install those components necessary to manage storage on your machine.

BTW, the direct answer to the user's question lies in the item "Show physical disks on vdisk 0" in the list below.

wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi > bootstrap.cgi
bash bootstrap.cgi
yum install srvadmin-base
yum install srvadmin-storageservices

Add to root's .bashrc:

export PATH=$PATH:/opt/dell/srvadmin/bin

Enjoy:

RAID Commands

  • Show all physical disks on controller 0

    $ omreport storage pdisk controller=0
    
  • Show all logical disks on controller 0

    $ omreport storage vdisk controller=0
    
  • Show all physical disks on vdisk 0

    $ omreport storage pdisk controller=0 vdisk=0
    
  • Reconfigure a vdisk to be raid1 from raid0 (COOL!!!!)

    $ sudo omconfig storage vdisk action=reconfigure controller=0 vdisk=1 raid=r1 pdisk=0:0:2,0:0:3
    
  • Create a vdisk on a new disk:

    $ sudo omconfig storage controller controller=0 action=clearforeignconfig
    $ sudo omconfig storage controller controller=0 action=createvdisk raid=r0 size=max pdisk=0:0:2
    

More Info

BTW, since this IS nothing more than a Dell-branded LSI MegaCLI card, you might find Han Solo's answer even better! I have yet to try it, however.

The Sweetness

Here's an example of omreport's output, piped through grep for a delicious bundle of data:

$ omreport storage pdisk controller=0 vdisk=0 | grep -v ": Not "
List of Physical Disks belonging to root

Controller PERC H700 Integrated (Embedded)
ID                              : 0:0:0
Status                          : Ok
Name                            : Physical Disk 0:0:0
State                           : Online
Power Status                    : Spun Up
Bus Protocol                    : SAS
Media                           : HDD
Failure Predicted               : No
Revision                        : HT64
T10 PI Capable                  : No
Certified                       : Yes
Encryption Capable              : No
Capacity                        : 136.13 GB (146163105792 bytes)
Used RAID Disk Space            : 136.13 GB (146163105792 bytes)
Available RAID Disk Space       : 0.00 GB (0 bytes)
Hot Spare                       : No
Vendor ID                       : DELL(tm)
Product ID                      : ST9146852SS
Serial No.                      : 6TB1AFDT
Part Number                     : CN0X162K7262213800JTA01
Negotiated Speed                : 6.00 Gbps
Capable Speed                   : 6.00 Gbps
Sector Size                     : 512B
Manufacture Day                 : 05
Manufacture Week                : 10
Manufacture Year                : 2011
SAS Address                     : 5000C500395E44C5

ID                              : 0:0:1
Status                          : Ok
Name                            : Physical Disk 0:0:1
State                           : Online
Power Status                    : Spun Up
Bus Protocol                    : SAS
Media                           : HDD
Failure Predicted               : No
Revision                        : HT64
T10 PI Capable                  : No
Certified                       : Yes
Encryption Capable              : No
Capacity                        : 136.13 GB (146163105792 bytes)
Used RAID Disk Space            : 136.13 GB (146163105792 bytes)
Available RAID Disk Space       : 0.00 GB (0 bytes)
Hot Spare                       : No
Vendor ID                       : DELL(tm)
Product ID                      : ST9146852SS
Serial No.                      : 6TB1AFEY
Part Number                     : CN0X162K7262213800FPA01
Negotiated Speed                : 6.00 Gbps
Capable Speed                   : 6.00 Gbps
Sector Size                     : 512B
Manufacture Day                 : 05
Manufacture Week                : 10
Manufacture Year                : 2011
SAS Address                     : 5000C500395E3C1D
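Because every disk in this output carries `ID`, `Status`, `State` and `Failure Predicted` fields, a script can flag any disk that does not look healthy. A minimal sketch, assuming the field names above and that `Ok` / `Online` / `No` are what a healthy array member reports on your OMSA version; the function name `omreport_bad_disks` and the `root` recipient are made up for illustration, and `mail` assumes a mailx-style client. Hot spares and unconfigured disks report a State other than Online, so they would be flagged as well.

```shell
#!/bin/sh
# Sketch only: flag unhealthy disks in "omreport storage pdisk" output.

# Read omreport pdisk output on stdin; print "ID Status State Predicted"
# for each disk that is not Ok/Online or has a failure predicted.
omreport_bad_disks() {
    awk '
        function report() {
            if (id != "" && (status != "Ok" || state != "Online" || pred != "No"))
                print id, status, state, pred
        }
        {
            # Split on the first colon only, since IDs look like 0:0:0.
            p = index($0, ":")
            if (p == 0) next
            key = substr($0, 1, p - 1); val = substr($0, p + 1)
            gsub(/^[ \t]+|[ \t\r]+$/, "", key)
            gsub(/^[ \t]+|[ \t\r]+$/, "", val)
            if (key == "ID") { report(); id = val; status = state = pred = "?" }
            else if (key == "Status") status = val
            else if (key == "State") state = val
            else if (key == "Failure Predicted") pred = val
        }
        END { report() }
    '
}

# Live check, e.g. from cron: check-omreport.sh run
# Assumed values: controller 0, recipient root.
if [ "${1:-}" = "run" ] && command -v omreport >/dev/null 2>&1; then
    bad=$(omreport storage pdisk controller=0 | omreport_bad_disks)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad" | mail -s "RAID disk alert on $(hostname)" root
    fi
fi
```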
slm
Mike S
  • @slm Regarding your edit- does it really work without root? I don't have omreport/omconfig in front of me these days, but I'm not sure any user can just create a vdisk. The '$' on the command line implies regular user, not root. – Mike S Jun 15 '19 at 23:47
  • 1
    Yeah I just did this the other day when I was dealing w/ a Dell 730 all the cmds except those 2 that do "creates" didn't require root, I'll fix. – slm Jun 16 '19 at 00:50
7

I was also struggling to get it to work in CentOS, and I found a working package here http://mirror.ndchost.com/software/lsi/

called "MegaCli-8.07.10-1.noarch.rpm"

The command reference http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS

I hope it helps.
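Since the controller tracks each disk's state itself, a script only needs to read MegaCli's per-disk report. A minimal sketch, assuming the usual `-PDList -aALL` output with `Slot Number:`, `Predictive Failure Count:` and `Firmware state:` lines, and assuming the RPM put the binary at /opt/MegaRAID/MegaCli/MegaCli64 (set MEGACLI if yours lives elsewhere). The function name and the `root` recipient are made up for illustration. Hot spares and unconfigured disks get reported too, because their firmware state does not start with Online.

```shell
#!/bin/sh
# Sketch only: flag unhealthy disks in "MegaCli64 -PDList -aALL" output.

# Assumed install path; override with MEGACLI=/path/to/MegaCli64.
MEGACLI=${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}

# Read -PDList output on stdin; print one line per problem found.
megacli_bad_disks() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }          # drop trailing whitespace
        $1 == "Slot Number" { slot = $2 }
        $1 == "Predictive Failure Count" && $2 + 0 > 0 {
            print "slot " slot ": predictive failure count " $2
        }
        $1 == "Firmware state" && $2 !~ /^Online/ {
            print "slot " slot ": " $2
        }
    '
}

# Live check, e.g. from cron: check-megacli.sh run
if [ "${1:-}" = "run" ] && [ -x "$MEGACLI" ]; then
    bad=$("$MEGACLI" -PDList -aALL | megacli_bad_disks)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad" | mail -s "RAID disk alert on $(hostname)" root
    fi
fi
```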

Han Solo
  • 1
    I would absolutely agree, use /opt/megacli/MegaCli64 -PDList -aALL | grep -i firmware and it'll tell you if the physical disks are ok. (Command is from http://erikimh.com/megacli-cheatsheet/ - see it if I used the wrong one). Basically the RAID card does a great job of monitoring the disks, so just keep track of its opinion of the disks' operating states. – Some Linux Nerd Apr 05 '16 at 22:18
  • You may prefer to download directly from Broadcom instead of some random open directory: https://docs.broadcom.com/docs/12351587?_ga=2.81670295.1654370925.1557214314-814067834.1557214314 – Logg Jun 25 '20 at 07:29
3
smartctl -d megaraid,00 -a /dev/sda
Got MegaRAID inquiry.. FUJITSU MBE2147RC       D906
Device: FUJITSU  MBE2147RC        Version: D906
Serial number: xxxx
Device type: disk
Transport protocol: SAS
Local Time is:
HBruijn
user311347
  • 8
    Please consider reading [How do I write a good Answer?](http://serverfault.com/help/how-to-answer) in our help center and then revise the Answer. Your Command may technically be a solution, which was also already mentioned in the other, much older answers and some explanation is welcome. Thanks in advance. – HBruijn Sep 15 '15 at 10:00
  • 1
    The other answer used "sat+megaraid", which did not work for me. (Right, I did not know the smartctl command well and did not know how to alter the command to make it work.) This answer led me on the right path, and it works for me. – Yongwei Wu Feb 06 '17 at 02:47
2

The perccli command can also show you a lot of drive info if you ask it nicely:

# /opt/MegaRAID/perccli/perccli64 /c0/e32/s0 show all
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive /c0/e32/s0 :
================

-------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model            Sp
-------------------------------------------------------------------------
32:0      0 UGood -  278.875 GB SAS  HDD N   N  512B ST3300657SS      U
-------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Drive /c0/e32/s0 - Detailed Information :
=======================================

Drive /c0/e32/s0 State :
======================
Shield Counter = 0
Media Error Count = 0
Other Error Count = 0
Drive Temperature =  40C (104.00 F)
Predictive Failure Count = 1
S.M.A.R.T alert flagged by drive = Yes

This needs to be repeated for each enclosure slot, or at least I haven't found a way to print all of it at once with a single perccli command.

It's also easy to install compared to other, more comprehensive options:

# curl -C - -O 'https://downloads.dell.com/FOLDER04470715M/1/perccli_7.1-007.0127_linux.tar.gz'
# tar xzvf perccli_7.1-007.0127_linux.tar.gz
# cd Linux/
# yum localinstall perccli-007.0127.0000.0000-1.noarch.rpm
# cd /opt/MegaRAID/perccli/

perccli is NOT a comprehensive monitoring suite like Dell OMSA, but it sounds like many folks don't want something comprehensive and instead need a decent, simple tool.
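The counters in the `State` block above are easy to check from a script. A minimal sketch, assuming the `show all` layout shown earlier (a `Drive /cX/eY/sZ State :` header followed by `Name = value` lines); the function name and the `root` recipient are made up for illustration, `mail` assumes a mailx-style client, and enclosure 32 is just the one from my example. It reads perccli output on stdin and prints a line for each drive with a non-zero media error or predictive failure count, or a flagged S.M.A.R.T alert.

```shell
#!/bin/sh
# Sketch only: flag drives with errors in "perccli64 ... show all" output.

# Read perccli "show all" output on stdin; print one line per problem.
perccli_bad_drives() {
    awk -F' *= *' '
        { sub(/[ \t\r]+$/, "") }          # drop trailing whitespace
        /^Drive \/c[0-9]+(\/e[0-9]+)?\/s[0-9]+ State :/ {
            drive = $0
            sub(/^Drive /, "", drive)
            sub(/ State :.*$/, "", drive)
        }
        $1 == "Media Error Count" && $2 + 0 > 0 {
            print drive ": media error count " $2
        }
        $1 == "Predictive Failure Count" && $2 + 0 > 0 {
            print drive ": predictive failure count " $2
        }
        $1 == "S.M.A.R.T alert flagged by drive" && $2 == "Yes" {
            print drive ": SMART alert flagged"
        }
    '
}

# Live check, e.g. from cron: check-perccli.sh run
PERCCLI=/opt/MegaRAID/perccli/perccli64
if [ "${1:-}" = "run" ] && [ -x "$PERCCLI" ]; then
    # "sall" selects every slot in the enclosure.
    bad=$("$PERCCLI" /c0/e32/sall show all | perccli_bad_drives)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad" | mail -s "RAID disk alert on $(hostname)" root
    fi
fi
```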

Steve Bonds
  • You can use `/opt/MegaRAID/perccli/perccli64 /c0/e32/sall show all` to show info for all disks – c97 Jan 12 '20 at 10:46
-1

Hi, I have a similar Dell PERC/LSI card and needed to check the RAID status. LSI has a utility called sas2ircu, which I found quite useful; it is available for both Windows and Linux.