ESXi Server Health Monitoring

Question

As VMware has stated, now is the time! I have started to read up on and plan for our upgrade from vSphere ESX 4.0 to vSphere ESXi 4.1. While I know vSphere 5 should be out sometime this fall, I am pretty sure this initial planning will apply to that version as well. One of my major concerns is being able to effectively monitor the health status of our hosts. My question has two parts: 1) Should my current setup still work? 2) What are some other suggestions?

My current setup to monitor the health status of our servers, and alert on failures, is a combination of iDRAC6 alerting and WUG (WhatsUp Gold) catching SNMP traps. The iDRAC6 can send email through our SMTP server if something physical, other than a storage event, degrades or fails on the server. The servers are also configured to send SNMP traps to WUG, which does monitor storage events and serves as a secondary notification for other events. To set this up I edit the SNMPD.CONF files via the service console, which of course is going away. It looks like the new method, if I continue down this path, is detailed in this VMware KB. Is anyone using SNMP traps to monitor their hardware, and has anyone done the setup described there?

The second part of my question is: could there be a better way to monitor the health status of my hosts? I know that there are other methods; without being argumentative, which of them might be better? I have been looking at CIM, but I am not sure what sits on the other end and interprets what CIM reports as being wrong. What methods is everyone else using to get this data?

score 4 · Accepted Answer · answered May 12 '11 at 15:52

I use the data coming out of the (i)DRAC, combined with the data that ESXi harvests via CIM, with vCenter configured to alert on faults coming out of the CIM monitoring.

I'm a little unclear on what you're saying about the trustworthiness of the CIM data, but I personally trust it a heck of a lot more than I would trust the SNMP traps being fed to WhatsUp. CIM will catch and throw alerts on something as minor as low voltage on the BIOS battery, as long as your hardware is well supported (as recent Dell equipment is), and vCenter is pretty flexible about choosing what, where, and how often you throw alerts on those events.
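
To give a rough idea of what can sit "on the other end" of the CIM data: a minimal sketch, assuming the health sensors have already been fetched and that each one exposes a `name` and a `healthState.key` of `green`, `yellow`, `red`, or `unknown`. As far as I know that is the shape of the entries under a host's `runtime.healthSystemRuntime.systemHealthInfo.numericSensorInfo` in the vSphere API, but treat that as an assumption. `make_sensor` and the sample sensor names are hypothetical stand-ins so the example runs without a host.

```python
from types import SimpleNamespace


def degraded_sensors(sensors):
    """Return (name, state) for every sensor whose health is not green."""
    return [
        (s.name, s.healthState.key)
        for s in sensors
        if s.healthState.key.lower() != "green"
    ]


def make_sensor(name, state):
    # Hypothetical stand-in for one sensor entry; a real entry would come
    # from the host's hardware health data rather than being built by hand.
    return SimpleNamespace(name=name, healthState=SimpleNamespace(key=state))


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        make_sensor("System Board Ambient Temp", "green"),
        make_sensor("Power Supply 2 Status", "yellow"),
        make_sensor("BIOS Battery", "red"),
    ]
    for name, state in degraded_sensors(sample):
        print(f"{name}: {state}")
```

In practice vCenter does this filtering for you through its alarms; the sketch only shows that the interpretation step is a simple check on a reported state, not something you have to program against CIM yourself.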

Shane Madden

Oh, I meant CIM interpreting what the failed piece of hardware is. I have no experience with CIM. I looked at CIM SMASH and I am worried that I am going to have to take a lot of time to learn to program it. Is there much configuration to the CIM setup? – Chadddada May 12 '11 at 17:05
Oh, gotcha. There's an integrated "Hardware Status" thing that uses CIM to get status from the physical hardware and present/integrate that data in vSphere, from which you can generate alerts in vCenter. They have an overview of the feature [here](http://blogs.vmware.com/kb/2011/03/hardware-monitoring-in-esxi.html). – Shane Madden May 12 '11 at 17:13
Let me check this out. We have not used the alerting offered by vCenter yet. I have looked at the health status in that tab, but I never realized it was getting its data from CIM providers. – Chadddada May 13 '11 at 13:17
Would you happen to have an alarm set for alerting on power redundancy lost? I am trying to configure that and I can't seem to get the alarm correct. I do see the loss of redundancy in the host health status, but I can't seem to generate any chatter off the box to report this new degraded state. Things such as VM state changes I have working, so I know vCenter generating mail is working, but the power state change isn't alerting. – Chadddada May 13 '11 at 14:09
Make sure the scope of your new alert applies to the host in question. An email action on a state change from good to anything else should suffice. – Shane Madden May 13 '11 at 14:15
For my most basic alarm I have it as Host error/Monitor for specific event occurring/Trigger - alert - ? . What should I use as an argument/Value? Also, when I do cause an alarm, e.g. by pulling a redundant power cable, I do see it change in the health status but I don't see any alerts show up in the triggered alarms section. Is that part of my problem, that the health status change isn't showing up there? Also I saw where someone changed group - equals to objectname - starts with in vCenter 4.1; is that your setup also? – Chadddada May 13 '11 at 15:26
I think this will be the correct answer and I will open up another question if I have troubles with the vCenter alarms. – Chadddada May 16 '11 at 14:01
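
For anyone following the alarm discussion in the comments above, the rule "act on a state change from good to anything else" can be sketched in a few lines. This is only an illustration of the logic, not how vCenter implements its alarms; `HealthWatcher`, `should_alert`, and the `"esx01/psu-redundancy"` key are hypothetical names, and treating a never-seen sensor as previously `green` is an assumption made here.

```python
def should_alert(previous, current):
    """True only on a transition from good (green) to any other state."""
    return previous == "green" and current != "green"


class HealthWatcher:
    """Remembers the last state seen for each sensor (hypothetical helper)."""

    def __init__(self):
        self._last = {}

    def observe(self, key, state):
        # Assumption: a sensor that has never been seen counts as "green",
        # so a first reading that is already degraded raises an alert.
        previous = self._last.get(key, "green")
        self._last[key] = state
        return should_alert(previous, state)


if __name__ == "__main__":
    watcher = HealthWatcher()
    for state in ["green", "yellow", "yellow", "green", "red"]:
        if watcher.observe("esx01/psu-redundancy", state):
            print(f"ALERT: psu-redundancy went from green to {state}")
```

Note that with this rule a sensor that stays degraded, or moves from yellow to red, does not fire again until it has returned to green first.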

score 3 · Answer 2 · answered May 12 '11 at 16:41

If your hosts are Dell, I very strongly recommend looking at the Dell Management Plug-in for vCenter. It's a very well made tool, and it enables a lot of hardware-specific alerting within the vCenter framework that you wouldn't get otherwise. It's not a free product, but the price is reasonable, and the results are worth it.

Max Alginin

Yeah, we are supposed to be getting this soon. Does it do active health monitoring/alerting for you, or is it just a health view of your servers that you only look at in vCenter? – Chadddada May 13 '11 at 13:09

score 0 · Answer 3 · edited Apr 13 '17 at 12:14

See: VMWare vSphere and the move from ESX to ESXi

For the monitoring question, I'm assuming most are going with monitoring the systems using purpose-built versions of ESXi for the relevant hardware and by monitoring traps from Virtual Center.

answered May 12 '11 at 15:52

ewwhite
