Hardware checks for Dell R820 Servers through Nagios using SNMP

Question

We use Nagios for monitoring. Is there a way to create hardware checks using the SNMP MIBs for R820 servers running ESXi 5.x? Right now we are using this Python plugin:

current python plugin

But we can no longer use it due to security policies within the organization. We are satisfied with the output of the current plugin, so it would be great if we could use a similar agentless check using SNMP. Thanks.

What are you interested in monitoring? Is this part of vSphere with a central vCenter or is this a standalone host? — ewwhite, Aug 14 '13 at 13:58
The servers are part of a vSphere cluster. We want hardware information as close as possible to what one could get using OME. We just don't want to use any monitoring silo other than Nagios. — Danila Ladner, Aug 14 '13 at 14:07

ewwhite · Accepted Answer · 2013-08-14T14:40:38.190

4

Maybe I'm weird, but I prefer to monitor my ESXi hosts in a vSphere cluster through the vCenter SNMP interface (coupled with email for certain events). That covers most of what I need. So it's alerting about events versus polling the hardware through something like Nagios.

Can you clarify which specific items you're most interested in monitoring at the host level?

I think vSphere's traps and email alerts can be as granular as you wish...


edited Aug 14 '13 at 14:40

answered Aug 14 '13 at 14:11

ewwhite


We have that as well for our HP server cluster, which I like. But our Ops team and I are still interested in a deeper level, unless vSphere can give me that as well: information such as the CPU fan on the third socket having stopped spinning, or the overall system board temperature rising above an acceptable level. Something in that direction. And since we use OpsView for all monitoring needs in the organization, it would be desirable not to use another silo plus generate some custom SNMP traps we need. – Danila Ladner Aug 14 '13 at 14:23
@DanilaLadner Those are all covered under the host hardware status alarms. – ewwhite Aug 14 '13 at 14:40
Nice. Can I somehow have checks poll these objects by connecting to the vSphere server from Nagios? Is there an API that allows that? I don't expect you to answer that. Thanks for the help; I might consider this and research it more. Thank you again. – Danila Ladner Aug 14 '13 at 14:44

Keith · Answer 2 · 2013-08-22T16:56:29.940

2

Nope. VMware has chosen to go the CIM route instead of SNMP, so you can't do exactly what you asked about. The only SNMP support they have implemented is trap-sending, which was very buggy last time I tried it (admittedly a few years ago).

Two good options have already been discussed here (check_esxi_hardware.py, OP5's check-esx-plugin).

As you're probably aware, Nagios Exchange is littered with other people's attempts to solve this, but most of them are outdated and will not work with modern VMware products.

Regarding the problem of having root access, the python plugin used to work without root access past the root level of the CIM tree (e.g., not inherited down to the VMs themselves), but that seems to no longer be the case as of 5.1. You could probably create a special role for Nagios to use (that isn't the administrator role), though.

Judging by the comments you made above (about wanting more-detailed hardware status monitoring), you might be better served by some IPMI checking through the service processor (BMC, LOM, iLO, whatever you want to call it) in that case.
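As a rough illustration of the IPMI route mentioned above, here is a minimal Python sketch of a Nagios-style check that reads sensor status from the BMC with `ipmitool sdr elist`. This is not from any plugin discussed in the thread. It assumes `ipmitool` is installed on the Nagios server, that IPMI over LAN is enabled on the service processor, and that the output has the usual five pipe-separated columns (name, ID, status, entity, reading). The function names and the status mapping (`nc` as warning, anything else other than `ok`/`ns` as critical) are illustrative assumptions.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def query_bmc(host, user, password):
    """Run `ipmitool sdr elist` against a BMC and return its raw output.

    host, user and password are placeholders supplied by the caller.
    """
    result = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
         "-P", password, "sdr", "elist"],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return result.stdout


def parse_sdr_elist(text):
    """Return (name, status, reading) tuples from `sdr elist` output.

    Assumes five pipe-separated fields per line; other lines are skipped.
    """
    sensors = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue
        name, _sensor_id, status, _entity, reading = fields
        sensors.append((name, status, reading))
    return sensors


def evaluate(sensors):
    """Map sensor statuses to a Nagios exit code and a one-line message."""
    if not sensors:
        return UNKNOWN, "UNKNOWN - no sensor lines parsed"
    worst = OK
    problems = []
    for name, status, reading in sensors:
        # Assumption: "ok" is healthy and "ns" means no reading available.
        if status in ("ok", "ns"):
            continue
        problems.append(f"{name}={status} ({reading})")
        # Assumption: "nc" (non-critical) is a warning, the rest critical.
        worst = max(worst, WARNING if status == "nc" else CRITICAL)
    if worst == OK:
        return OK, f"OK - {len(sensors)} sensors checked"
    label = "WARNING" if worst == WARNING else "CRITICAL"
    return worst, f"{label} - " + ", ".join(problems)
```

To use it as a plugin, a small wrapper would call `evaluate(parse_sdr_elist(query_bmc(...)))`, print the message, and exit with the returned code. Because this talks to the service processor directly, it needs BMC credentials rather than an ESXi account.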

If you're specifically dealing with Dell hardware, you can add the Dell-specific offline bundle (VIB) to enable OpenManage support in ESXi.

In the future, you might be able to use the excellent check_openmanage plugin for this, but it's not currently possible.

edited Aug 22 '13 at 16:56

answered Aug 14 '13 at 15:34

Keith


Heh, I even cloned the Administrator role under a different name and the check got a "permission denied" response. Also, no groups are visible in the vSphere Client; I had to manually edit /etc/groups for 5.1. Now I'm sitting and reading the "DELL SNMP Reference Guide", which is 600 pages long. Sigh. – Danila Ladner Aug 14 '13 at 15:38
Monitoring from the host doesn't make sense in this manner. Leverage the Dell CIM agents and vCenter's facility. – ewwhite Aug 14 '13 at 16:07
@ewwhite, plenty of people use nagios to monitor the whole network infrastructure, so it makes a lot of sense to include ESX in it instead of having to check yet another console, IMHO. – natxo asenjo Aug 14 '13 at 18:51
You're right. Except when Nagios doesn't work for the intended application... – ewwhite Aug 14 '13 at 18:56
Not everyone $hells out for vCenter, though – Keith Aug 15 '13 at 15:08
in this case nagios *works* with the intended application ;-) – natxo asenjo Aug 16 '13 at 07:08

score 0 · Answer 3 · answered Aug 14 '13 at 14:54

0

We use the check_esx plugin from op5 (http://www.op5.org/community/plugin-inventory/op5-projects/check-esx-plugin) for exactly this purpose. You need to install the VMware Perl SDK.

We use it like this:

check_esx -H $HOSTADDRESS$ -u root -p passwd -l runtime -s health
CHECK_ESX.PL OK - All 449 health checks are Green | Alerts=0;;

The check_esx plugin can monitor a lot of stuff, great work from the op5 guys.
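The output line shown above follows the usual Nagios plugin convention: human-readable status text, then a `|`, then performance data in `label=value;warn;crit` form. A minimal Python sketch of splitting such a line is below. `parse_plugin_output` is a hypothetical helper, not part of check_esx, and it does not handle quoted labels that contain spaces.

```python
def parse_plugin_output(line):
    """Split a Nagios plugin output line into (status_text, perfdata).

    perfdata maps each label to its raw value string; the optional
    warn/crit/min/max fields after the first ';' are ignored.
    """
    text, _, perf = line.partition("|")
    perfdata = {}
    for item in perf.split():
        label, sep, rest = item.partition("=")
        if not sep:
            # Not a label=value pair; skip it.
            continue
        perfdata[label] = rest.split(";")[0]
    return text.strip(), perfdata
```

Applied to the sample output above, this yields the status text `CHECK_ESX.PL OK - All 449 health checks are Green` and a single performance-data entry, `Alerts` with value `0`.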

answered Aug 14 '13 at 14:54

natxo asenjo


Yeah, those are all good and great, but unfortunately they require root access to query CIM info. So this part, "$HOSTADDRESS$ -u root -p passwd", is not going to cut it for us anymore starting September 1. – Danila Ladner Aug 14 '13 at 15:04
Ok, so from September this is not going to work anymore on ESX? Or just at your place because of your new security policies? According to the plugin doc (https://kb.op5.com/display/HOWTOs/Monitoring+VMware+ESX+3.x,+ESXi,+vSphere+4+and+vCenter+Server) you could use a local ESX user with just read-only profile rights. – natxo asenjo Aug 14 '13 at 18:40
Security policies in place. I've tried Read-Only; it doesn't work, at least on 5.1. – Danila Ladner Aug 14 '13 at 18:47
I have just tried it and it works on 5.1. I added a local user on the ESX host, then on the Permissions tab assigned the new user the read-only role, and I can successfully monitor the hardware runtime with check_esx and this user. – natxo asenjo Aug 15 '13 at 11:33
Tested on both a Dell R720 and an R820, both running ESXi 5.1. – natxo asenjo Aug 15 '13 at 11:43
Let me try on mine. I am sure I've tried it. – Danila Ladner Aug 15 '13 at 13:39
Unfortunately, this is what I get: /usr/local/nagios/libexec/check_esxi_hardware.py -H geo-vsprdesx-25.domain.com -U nagios_user -P password -V dell -p MONITORED BY: nagios_server RETURN CODE: 2 (CRITICAL) OUTPUT: : Authentication Error! - Server: – Danila Ladner Aug 15 '13 at 14:45
I see you are not using the check_esx plugin from op5 but the python plugin. No experience there. With the check_esx (perl plugin, you need the vmware perl sdk), it works as I indicated earlier. – natxo asenjo Aug 15 '13 at 19:48

score 0 · Answer 4 · answered Jan 05 '16 at 01:41

The problem with check_esxi_hardware and a read-only or other non-administrator user (not root) is due to a PAM feature, or bug, depending on your point of view, in ESXi 5.1 and later.

Any user that is created and assigned to any role other than the administrator role is set to denied ALL in /etc/security/access.conf. Even if you clone the administrator role and assign the user you create to this clone role it will be set to denied ALL in /etc/security/access.conf.

I have created a user "nagios" on an ESXi 5.5 host locally (not through vCenter) and assigned it to the "Read Only Role" under the permissions tab. By default its permissions in access.conf are "-:nagios:ALL"

If I ssh to the ESXi host and edit /etc/security/access.conf and change the nagios user permissions to "+:nagios:sfcb" or "+:nagios:ALL" then check_esxi_hardware works.

Using "+:nagios:sfcb" restricts the user "nagios" so it can only access the CIM Service.
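The edit described above can be sketched in Python as a pure text transformation. `grant_cim_access` is a hypothetical helper name, and the sample file contents in the usage note are illustrative, not copied from a real host. The sketch only rewrites the text; reading and writing /etc/security/access.conf on the host, and keeping the change across reboots, are left out.

```python
def grant_cim_access(conf_text, user, service="sfcb"):
    """Replace a user's deny-all rule in access.conf text with an allow rule.

    A line exactly matching "-:<user>:ALL" becomes "+:<user>:<service>".
    Every other line is passed through unchanged. If no matching line is
    found, the text comes back unmodified.
    """
    deny_rule = f"-:{user}:ALL"
    allow_rule = f"+:{user}:{service}"
    lines = []
    for line in conf_text.splitlines():
        if line.strip() == deny_rule:
            lines.append(allow_rule)
        else:
            lines.append(line)
    # Re-join with a trailing newline, as is conventional for config files.
    return "\n".join(lines) + "\n"
```

For example, given illustrative contents `+:root:ALL`, `-:nagios:ALL`, `-:ALL:ALL` (one rule per line), calling `grant_cim_access(text, "nagios")` changes only the second line to `+:nagios:sfcb`. Passing `service="ALL"` gives the broader `+:nagios:ALL` variant mentioned above.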

The problem you now encounter is changes to /etc/security/access.conf aren't persistent across reboots.

This is a thread in the VMware communities discussing this problem: https://communities.vmware.com/thread/464552?start=15&tstart=0

This is a very good article discussing the same problem using wbem: https://alpacapowered.wordpress.com/2013/09/27/configuring-and-securing-local-esxi-users-for-hardware-monitoring-via-wbem/

These are two blogs discussing making changes persistent over reboots in ESXi:

www.therefinedgeek.com.au/index.php/2012/02/01/enabling-ssh-access-in-esxi-5-0-for-non-root-users/

www.virtuallyghetto.com/2011/08/how-to-persist-configuration-changes-in.html

I can't make the last two links hyperlinks as this is my first post to serverfault and until you have 10 reputation points you can only put two links in an answer (which is fair).

I haven't decided which solution I will use to make this persistent across reboots. I am still testing.

Thanks
