
I'm looking at setting up a Nagios (or perhaps OpsView) server for monitoring our network.

I have a few peripheral devices whose OID schema doesn't include nodes for some metrics I want to monitor. Currently I monitor those metrics based on status emails that the devices themselves send periodically.

Can Nagios (or Opsview) be set up to report the device status based on the contents of a received email?

Ultimately I'd like to get it down to something like a red/green status. Bonus points if I can get a third (eg yellow) status indicating that the expected status email hasn't been received.

If neither Nagios nor Opsview can do this, I'm open to suggestions for something that can, even if it does just that and I use Nagios for the remaining, more typical network monitoring tasks.

Thanks all.

Edit - As requested, this is a [sanitized] example of an email I would want to parse/act on:

Return-path: <notificationsvc@example.com>
Envelope-to: admin@example2.com
Delivery-date: Fri, 28 Nov 2014 03:15:21 -0600
Received: from [xx.xx.xx.xx] (port=49676 helo=DiskStation)
    by mailserver.example.com with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256)
    (Exim ver x.xx)
    (envelope-from <notificationsvc@example.com>)
    id 123456-000000-1Z
    for admin@example2.com; Fri, 28 Nov 2014 03:15:21 -0600
Date: Fri, 28 Nov 2014 04:15:21 -0500
From: "Fifteen " <notificationsvc@example.com>
To: <admin@example2.com>
Subject: =?UTF-8?B?RmlmdGVlb[snipped]
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Dear user,

Function X on Fifteen has been completed.

Task: Function X to server3
Target Server: server3.example.com (xx.xx.xx.xx)
Action Time: 2014/11/28 04:15

EDIT 2 - To follow up on @DanielAgans's suggestion: it's a bit too homegrown for me at this time. It's too far outside my skill set and comfort zone to tackle right now, but I'd really like a way to monitor these boxes rather than having to watch and process the emails manually. I was really hoping to find some sort of plugin for Nagios or Opsview. My searching and Daniel's comment make me feel as though this might be a dead-end request.

Can anyone confirm that Nagios/Opsview will indeed NOT do what I need?

And of course I'm open to suggestions on how I can do it, albeit with less programming than what Daniel already suggested.

Thanks.

JoelAZ
  • I've planned to do this with a Nagios node. I wasn't able to find a plugin so I'll be using fetchmail http://www.fetchmail.info/ and probably python to parse the mail for the contents I want. – Daniel Agans Nov 28 '14 at 17:54
  • Please add an example email – 030 Nov 28 '14 at 18:32
  • Sounds right up my alley @DanielAgans. Pretty new to the whole thing so I'd be grateful for as much as you can share of the solution. – JoelAZ Nov 28 '14 at 19:49
  • @utrecht - stealing glances on my iPad while at my sister's for turkey day pt 2. Will post up an example late tonight, thx. – JoelAZ Nov 28 '14 at 19:49
  • @utrecht - added example email as requested. Not sure how it affects the request but as requested... – JoelAZ Nov 29 '14 at 02:17
  • To whomever downvoted the question, please offer a reason for the downvote. Thank you. – JoelAZ Nov 29 '14 at 02:17
  • I think it would be better to find a way to write that output to a log file, and then use one of the many check_log variants, instead of involving email in the process. – Keith Nov 30 '14 at 16:49
  • Please elaborate @Keith. Do you mean to have the device write its success/fail output to a log rather than sending via email? If so, this is not possible. The function is part of the device (from the manuf) and the only method avail for reporting its status is via email. – JoelAZ Dec 01 '14 at 03:09

1 Answer


I do this sort of thing all the time, using handwritten parsers and send_nsca. For example purposes, let's stick with a function called "function X" on a machine called server3.

The way I generally do it is to use a recipient address that's specific to the service in question - let's say functionXs3@example.com. You can do this on a general-purpose mailbox, but it means more logic in the parser (which means more chance of errors causing false negatives) and the chance of accidentally deleting email intended for a human. I have my mail server receive the address into a small program, eg for sendmail, in the aliases file:

functionXs3:              "|/usr/local/bin/functionx"

In turn, /usr/local/bin/functionx is a small, lightweight script that looks for the characteristic signs of success/failure, and responds accordingly. I'm assuming that the text Function X on Fifteen has been completed signals success, and its absence signals failure, so the parser can be something like:

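#!/bin/bash
# Reads the piped email on stdin; grep -q keeps the matched line off stdout.
# if/else (not && ... ||) so a failed send_nsca cannot trigger the failure branch.
if grep -q "Function X on Fifteen has been completed"; then
    echo "server3   function X   0   success" | send_nsca -H nagios.example.com
else
    echo "server3   function X   1   failure" | send_nsca -H nagios.example.com
fi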

This is not an appropriate place for a primer on setting up and configuring send_nsca on the client, nor likewise the NSCA listening daemon on the server, but note that the groups of spaces in the echo statements above must be single TABs, and that success and failure are flavour-text, which will appear in NAGIOS' "status information" column, and any appropriately-configured notifications.
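
As a minimal sketch of the TAB requirement, the check result can be built with printf, which makes the separators explicit instead of relying on literal TAB characters surviving a copy and paste. The function name nsca_line and the sample message body are illustrative only; in real use the output would be piped into send_nsca as in the script above.

```shell
#!/bin/bash
# Hypothetical helper: classify an email read on stdin and print one
# TAB-separated passive check result (host, service, return code, text).
nsca_line() {
    if grep -q "Function X on Fifteen has been completed"; then
        printf '%s\t%s\t%s\t%s\n' "server3" "function X" 0 "success"
    else
        printf '%s\t%s\t%s\t%s\n' "server3" "function X" 1 "failure"
    fi
}

# Example: feed a sample message body through the helper.
printf 'Dear user,\n\nFunction X on Fifteen has been completed.\n' | nsca_line
```

Piping a saved copy of a real notification through the helper is a cheap way to check the match string before wiring the script into the mail alias.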

Inside NAGIOS, you define the host and service accordingly:

define host{
        use                     host-template
        host_name               server3
        address                 192.168.34.56
        }

define service{
        use                     passive-service-template
        host_name               server3
        service_description     function X
        max_check_attempts      1
        check_freshness         1
        freshness_threshold     100000
}

Note how the host_name and service_description exactly match the first two fields echoed into send_nsca. The freshness_threshold should be set to a value in seconds somewhat bigger than the normal interval between runs of the function job; in my case, the job should run once a day (86400s), so 100000 leaves a few hours of slack. Writing the service and host templates must be mostly left as an exercise, but as a guide, something like this will be useful:

define service{
        name                            passive-service-template
        use                             service-template
        normal_check_interval           60
        retry_check_interval            60
        active_checks_enabled           0 ; passive checks already enabled
        check_command                   check_dummy!2 "STALE SERVICE"
        register                        0      
        }

define command{
        command_name    check_dummy
        command_line    $USER1$/check_dummy  $ARG1$ 
        }

Note the use of check_dummy as a freshness check. This is what NAGIOS invokes when it hasn't received a passive check result for the service in the last freshness_threshold seconds, and all it does is throw a CRITICAL alert (return code 2) with the status text "STALE SERVICE". That means that if the notification chain fails silently for whatever reason - job didn't run, email wasn't delivered, parsing script was broken, NSCA failed - NAGIOS will grumble at the appropriate recipients so the issue can be investigated.
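
To see roughly what NAGIOS gets back from that freshness check, here is a small stand-in written as a shell function. It is only a sketch that assumes the plugin's usual "STATE: text" output and matching exit code; fake_check_dummy is a made-up name, and the real check_dummy plugin should be used in the actual command definition.

```shell
#!/bin/bash
# Stand-in for the check_dummy plugin (assumption: mimics its usual
# "STATE: text" output and exit code; not the real plugin).
fake_check_dummy() {
    local code="$1" text="$2" state
    case "$code" in
        0) state="OK" ;;
        1) state="WARNING" ;;
        2) state="CRITICAL" ;;
        *) state="UNKNOWN"; code=3 ;;
    esac
    printf '%s: %s\n' "$state" "$text"
    return "$code"
}

# Same arguments as the check_command in the template: 2 "STALE SERVICE"
fake_check_dummy 2 "STALE SERVICE" || echo "exit code: $?"
```

The nonzero exit code is what puts the service into a problem state; the text after the colon is what shows up as the status information.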

This can only be a fleeting overview of an answer. I thought some concrete examples of config might be helpful, but a full cookbook is way outside the scope of an SF answer, so please don't ask for lots more configuration detail.

MadHatter
  • Thank you MadHatter. I believe this could be the answer I'm looking for however processing all you've written will take some time and energy on my part. Don't worry, I'll not be asking you to write me a "for dummies" guide on Nagios. :) There's resources where I can brush up on that. The interesting part to me is Function X. I hope you won't mind if I've any follow up questions on that. Anyway, I'll take some time to process this and if I've no [reasonable] follow up I expect I'll take this as answer. Thanks for detailed and helpful reply. I've got a lot to chew on. – JoelAZ Nov 30 '14 at 10:25
  • @JoelAZ: specific follow-up questions are fine, and I'll try to do my best to answer them satisfactorily. – MadHatter Dec 01 '14 at 11:15