Ideally with as simple an install as possible and without rebooting the servers. Mostly for DL380 G5s, if it helps.
- Shame they're not G7s, or you could use HP Insight Manager. – Tom O'Connor Dec 19 '11 at 15:18
- Are your servers running Windows or Linux? – Tom O'Connor Dec 19 '11 at 15:20
- What operating systems are you running on these servers? – ewwhite Dec 19 '11 at 15:20
- They're all 2003 or 2008 both vanilla and R2. I played with SIM but it wasn't able to talk to my G5's. – DrZaiusApeLord Dec 19 '11 at 15:36
- SIM should still be compatible with G5 ProLiants. Did you have the agents installed when you tried before? – ewwhite Dec 19 '11 at 16:22
4 Answers
This depends slightly on the operating systems you're running on the servers, but in general, it is possible to obtain alerts from HP ProLiant servers and Smart Array RAID controllers.
The full driver and software support listing for your DL380 G5 systems is available here.
SNMP plus a monitoring solution is the best approach, but you can augment that with some of HP's tools. HP offers HP Systems Insight Manager, which is available for download and also comes with the servers. This is ideal for collections of servers. If you're looking for one-off alerts without building a management or monitoring infrastructure, you can simply install the HP Management Agents (aka ProLiant Support Pack).
For standalone Linux systems, I'll have the agents send traps via email. I'll usually configure the support pack with defaults or a custom bundle, then edit `/opt/hp/hp-snmp-agents/cma.conf` and change the `trapemail` line to point to the recipient address:
########################################################################
# trapemail is used for configuring email command(s) which will be
# executed whenever a SNMP trap is generated.
# Multiple trapemail lines are allowed.
# Note: any command that reads standard input can be used. For example:
# trapemail /usr/bin/logger
# will log trap messages into system log (/var/log/messages).
########################################################################
trapemail /bin/mail -s 'HP Insight Management Agents Trap Alarm' systems@1234.net
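Since `trapemail` accepts any command that reads standard input, a small wrapper could keep a local copy of each trap before it is mailed. The following is only a minimal sketch and not part of the HP agents: the function name `handle_trap` and the log path are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical trapemail handler: reads one trap message from stdin,
# appends it to a log file, and prints it back out so it can be piped on.

# Log file location is an assumption; override it by setting TRAP_LOG.
TRAP_LOG="${TRAP_LOG:-/tmp/hp-trap.log}"

handle_trap() {
    local message
    message=$(cat)    # slurp the trap text from stdin
    # Write a timestamped marker line, then the message itself.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "--- trap received ---" >> "$TRAP_LOG"
    printf '%s\n' "$message" >> "$TRAP_LOG"
    # Pass the message through unchanged for the next command in the pipe.
    printf '%s\n' "$message"
}
```

A script that ends with something like `handle_trap | /bin/mail -s 'HP trap' root` could then be named on a `trapemail` line in place of calling `/bin/mail` directly.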
If you're running Linux and don't want to install the full HP management suite, you can develop a script around the cciss_vol_status utility to query controller/disk status. Also see: Installing HP Agents on OpenFiler
- Any elegant way to test an alert for a RAID array failure, other than pulling a drive out of the slot? I've got a couple `ProLiant DL360 G7` servers, and HP SIM set up for monitoring. – Banjer May 15 '13 at 13:24
- Not that I know of. The Insight agents definitely work. If you can see the array status via the hpacucli utility and you know you're receiving alerts in HP SIM, I think it's fair to assume things will work. – ewwhite May 15 '13 at 13:30
Check out HP Insight Manager
https://www.hpe.com/us/en/product-catalog/detail/pip.489496.html#
I believe it should work with your servers.
I used the lightweight program that @ewwhite mentioned in his answer: `cciss_vol_status`.
If you follow the accompanying INSTALL instructions, the utility is placed in `/usr/local/bin/cciss_vol_status`.
Here is a wrapper script I use to grep the output of cciss_vol_status, and send an email if any array has a status of FAILED.
#!/bin/bash
#
# Check status of RAID volumes on HP Smart Array controllers. Send an email
# alert if any volumes have a FAILED status.
#
status=$(/usr/local/bin/cciss_vol_status /dev/sd*)
# email lock file
lockfile=/tmp/raid.check.hp.smartarray.lock
# minimum time between emails (minutes)
_notification_freq=59
_host=$(hostname)
# To: email
_toemail=root
# create email lock file if it does not exist yet
# (a newly created lock file delays the first alert until it is older than
# ${_notification_freq} minutes)
[ ! -f "${lockfile}" ] && /bin/touch "${lockfile}"
if echo "${status}" | grep -q FAILED
then
    # make sure we haven't sent a notification in the last X minutes:
    # find prints the lock file only if it was modified longer ago than that
    if [ -n "$(find "${lockfile}" -mmin +${_notification_freq})" ]
    then
        echo "${status}" | /bin/mail -s "System Alert! RAID failure on ${_host}" "${_toemail}"
        # update lock file mod time
        /bin/touch "${lockfile}"
    fi
fi
Call the above script in cron. I run the check every two minutes:
*/2 * * * * /usr/local/bin/raid.check.hp.smartarray.sh
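To exercise the detection logic without pulling a drive, the status check can be fed canned text instead of live controller output. This is a sketch under assumptions: the function name `has_failed_volume` is hypothetical, and the sample lines only imitate the general shape of `cciss_vol_status` output, so compare them against what your controller really prints.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when the text on stdin contains
# a FAILED status, mirroring the grep used by the wrapper script.
has_failed_volume() {
    grep -q 'FAILED'
}

# Illustrative sample lines, not captured from a real controller.
healthy_sample='/dev/sda: (Smart Array P400) RAID 1 Volume 0 status: OK.'
failed_sample='/dev/sda: (Smart Array P400) RAID 1 Volume 0 status: FAILED.'

# Feed each sample through the check and report what the wrapper would do.
if printf '%s\n' "$failed_sample" | has_failed_volume; then
    echo "failed sample: alert would fire"
fi
if ! printf '%s\n' "$healthy_sample" | has_failed_volume; then
    echo "healthy sample: no alert"
fi
```

Because the check only looks for the word FAILED, any other non-OK state the utility may report would pass unnoticed. Widening the pattern is a design choice to weigh against alert noise.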
We do use HP Systems Insight Manager to check whether our HP servers are up and running, but nothing beyond that. I found the Linux agent to be overkill for us, since we have other monitoring solutions in place, so the script above serves its specific purpose well.
UPDATE
Just a troubleshooting tip in case you run into this. This script proved helpful this morning when I got an email about a failed array with:
Cache dirty limit reached
The device went read-only and was not visible in `/proc/partitions`. I rebooted the server and saw these messages on boot:
Logical drive(s) disabled due to possible data loss.
Select "F1" to continue with logical drive(s) disabled
Select "F2" to accept data loss and to re-enable logical drive(s)
I selected F2 and the RAID was fine and mounted on boot.
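If you want to script the "is the device still visible" check from this troubleshooting step, something along these lines may help. It is a rough sketch: the function name `device_listed` is hypothetical, and it assumes the usual `/proc/partitions` layout, where the device name is the fourth column.

```shell
#!/bin/bash
# Hypothetical helper: reports whether a block device name appears in a
# partitions table. Defaults to /proc/partitions; the optional second
# argument points it at a saved copy instead.
device_listed() {
    local name="$1" table="${2:-/proc/partitions}"
    # Skip the header and the blank line after it, then compare column 4.
    awk -v dev="$name" 'NR > 2 && $4 == dev { found = 1 } END { exit !found }' "$table"
}
```

For example, `device_listed sda || echo "sda is missing"` would print a warning when `sda` is not listed. Substitute the device name your array actually uses.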