3

We have an environment where we currently monitor about 50 VM hosts running ESXi.

We easily get alerts for hardware events and for exceeded performance thresholds via vCenter.

We're planning to add some KVM hosts to our infrastructure, but we have no idea how to get similar monitoring and alerting capabilities.

Any suggestions? I've seen RHEV and it looks promising but I'm not sure if management wants to deal with licensing for both hypervisors at the moment. Maybe there are some free utilities that do a good enough job?

  • This is a good question... part of the reason people buy VMware :) I'm interested in the answers. – ewwhite Oct 04 '14 at 23:29
  • 1
    The libvirt API provides most or all of the information you would want to monitor your VMs, though I am not yet aware of anything that uses it extensively enough to be considered a complete solution. – Michael Hampton Oct 04 '14 at 23:53
  • 1
    @MichaelHampton Monitorix, Nagios, Zenoss, Pandora FMS, Munin and probably quite a few others have libvirt checks – dyasny Oct 05 '14 at 00:58
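
For readers wondering what a libvirt-based check looks like, here is a minimal sketch in Python. It assumes the libvirt-python bindings are installed on the KVM host and that the monitoring user can open a read-only connection. The function names (`cpu_percent`, `hot_guests`, `sample_host`), the 90% threshold and the 5-second interval are illustrative choices, not taken from any of the tools named above.

```python
import time


def cpu_percent(cpu_ns_before, cpu_ns_after, interval_s, vcpus):
    """Average CPU use of a guest over the interval, as a percentage of its vCPUs."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    used_ns = cpu_ns_after - cpu_ns_before
    return 100.0 * used_ns / (interval_s * 1e9 * vcpus)


def hot_guests(samples, threshold=90.0):
    """Return the names of guests at or above the threshold.

    samples maps guest name -> (cpu_ns_before, cpu_ns_after, interval_s, vcpus).
    """
    return sorted(name for name, s in samples.items() if cpu_percent(*s) >= threshold)


def sample_host(uri="qemu:///system", interval_s=5.0):
    """Take two cpuTime readings for every running guest on one host."""
    # Imported here so the helpers above can be used without libvirt installed.
    import libvirt

    conn = libvirt.openReadOnly(uri)
    try:
        domains = [d for d in conn.listAllDomains() if d.isActive()]
        # info() returns [state, maxMem, memory, nrVirtCpu, cpuTime]; cpuTime is in ns.
        before = {d.name(): d.info()[4] for d in domains}
        time.sleep(interval_s)
        samples = {}
        for d in domains:
            _state, _max_mem, _mem, vcpus, cpu_ns = d.info()
            samples[d.name()] = (before[d.name()], cpu_ns, interval_s, vcpus)
        return samples
    finally:
        conn.close()
```

Calling `hot_guests(sample_host())` on a host would list the guests currently running hot. A real check would also need to handle guests that shut down between the two readings, which this sketch does not.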

1 Answer

2

Since KVM is part of Linux, any Linux-based monitoring solution will work, from the standard Nagios checks to proprietary hardware health monitoring tools such as Dell OMSA.

RHEV will monitor the hosts' health, keep the VMs working and make sure all hosts can access all the required cluster resources, but it is not meant to monitor hardware-level issues. For example, a degraded RAID array on the host is not something RHEV will look for. As I mentioned, though, RHEV hosts are just Linux, so you can install whatever monitoring you prefer and integrate host health monitoring with the rest of your monitoring systems.
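
As an illustration of "just Linux" host monitoring, here is a minimal sketch of a Nagios-style check for a degraded array. It assumes Linux software RAID (md), whose state is readable from `/proc/mdstat`; hardware RAID controllers would need the vendor's tooling instead. The function names are made up for this example; the exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import re

ARRAY_RE = re.compile(r"^(md\S*)\s*:")   # e.g. "md0 : active raid1 sdb1[1] sda1[0]"
STATUS_RE = re.compile(r"\[([U_]+)\]")   # e.g. "[UU]" healthy, "[U_]" one member missing


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that have at least one missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current and "_" in status.group(1):
            degraded.append(current)
    return degraded


def check(mdstat_text):
    """Return (exit_code, message) in Nagios plugin style."""
    bad = degraded_arrays(mdstat_text)
    if bad:
        return 2, "CRITICAL - degraded md arrays: " + ", ".join(bad)
    return 0, "OK - all md arrays healthy"


def main(path="/proc/mdstat"):
    """Read the mdstat file, print the status line, and return the exit code."""
    with open(path) as f:
        code, message = check(f.read())
    print(message)
    return code
```

Wired up as a plugin, the script would end with `sys.exit(main())` and be run through whatever agent your monitoring system already uses on Linux hosts.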

BTW, if you want to try RHEV, it is free for the first two months, and there's also upstream oVirt you can try.

dyasny
  • And the same for performance thresholds, VM state and such? – ewwhite Oct 04 '14 at 23:38
  • VM states are tracked, and acted upon (for HA and load balancing policies), what do you mean by performance thresholds? – dyasny Oct 04 '14 at 23:50
  • In VMware, I can get alerts when a virtual machine is running hot CPU-wise or having some other issue. – ewwhite Oct 04 '14 at 23:51
  • There's a large set of VM and other entity related events you can set up alerts and notifications for. And the list is of course extensible - statuses are visible through the API so a monitoring solution can check them – dyasny Oct 05 '14 at 00:09
  • 1
    And to add to that, if you also install RHEV-Reports (it's optional, and comes free with RHEV), you get a history database and a reporting engine, so you can watch load, availability and other trends over time, troubleshoot those trends, etc. There are also free ManageIQ and VMTurbo appliances available for RHEV, also designed to deep-dive into load trends and other criteria. Lots of absolutely free and serious BI functionality available there – dyasny Oct 05 '14 at 00:24
  • I'm guessing anything that would expose hardware details like failed disks or a degraded RAID array would likely require some kind of integration with the vendor's hardware. VMware has the ability to access these hardware statuses. Any reason it seems less pertinent in KVM? Maybe just lack of vendor interest? I wonder why. RHEV does seem to be the solution to everything other than hardware-based alerts, though. I'll likely have to look into some vendor-specific utility like OMSA for this functionality. – cheesesticksricepuck Oct 05 '14 at 08:22
  • 1
    It's simple. VMware created a system where hardware monitoring was impossible using standard tools, so they had to pay off that technical debt themselves. KVM doesn't have this problem: you can monitor KVM hosts exactly the same way you would monitor any other host. RHEV is a virtualization management platform; it focuses on virtualization issues, and does it well. Hardware monitoring is done best by vendor tools, and you can use those directly, without resorting to 3rd party implementations – dyasny Oct 05 '14 at 13:06