Among the many files that make up our Nagios server's configuration is this service check for load:

define service{
        use                             generic-service
        name                            check-load
        hostgroup_name                  nrpe-hosts,!webnodes,!build-cluster
        notification_options            c,r
        service_description             NRPE - Load
        check_command                   check_nrpe!check_load
        contacts                        irc
}

And two contacts:

define contact{
        contact_name                    irc
        alias                           ircbot
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,u,r,f
        service_notification_options    w,u,c,r,f
        service_notification_commands   notify-by-epager
        host_notification_commands      host-notify-by-epager
        pager                           irc@example.com
        }

define contact {
       contact_name                             pagerduty
       alias                                    PagerDuty Pseudo-Contact
       service_notification_period              24x7
       host_notification_period                 24x7
       service_notification_options             u,c,r
       host_notification_options                d,r
       service_notification_commands            notify-service-by-pagerduty
       host_notification_commands               notify-host-by-pagerduty
       pager                                    lol-no
}

Edit: also, the service template that the check inherits from:

define service{
        name                            generic-service
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           3
        retry_check_interval            1
        notification_interval           0
        notification_period             24x7
        notification_options            w,c,r
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

Edit2: And a notify command definition, just for the doubters ;) :

# 'notify-by-epager' command definition
define command{
        command_name    notify-by-epager
        command_line    /usr/bin/printf "%b" "Service: $SERVICEDESC$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nInfo: $SERVICEOUTPUT$\nDate: $LONGDATETIME$" | /bin/mail -s "$NOTIFICATIONTYPE$: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTPAGER$
}

Edit3: And a host definition:

define host{
        host_name                       vmprod1
        alias                           vmprod1.example.com
        address                         192.1.1.123
        use                             generic-host
        hostgroups                      nrpe-hosts,vm-hosts,vm-prod,dellraid-hosts
        contact_groups                  example,example-pager
}

This is the only check with the service description "NRPE - Load". By my reading, this should only alert the irc contact, and not the pagerduty contact. Yet I got over 100 "NRPE - Load" alerts last month in PagerDuty.

What am I missing?

jldugger
  • This is most likely an inheritance issue - your 'use generic-service' might well be defining other contacts / contact groups. – Tim Brigham Dec 02 '15 at 21:17
  • Unfortunately, this does not appear to be the case; I've added in the generic-service template as evidence. =/ – jldugger Dec 02 '15 at 22:22
  • What does the "view config" section of the web interface show for one of those services? Also, please add your notification command "notify-by-epager" to the question. – Keith Dec 02 '15 at 23:22
  • I don't seem to have a 'view config' section? But I did add the notify command. – jldugger Dec 02 '15 at 23:59
  • Ah, it's a global view config. – jldugger Dec 03 '15 at 00:07
  • Apparently, services inherit contact groups from host definitions. Researching this now to see if I can undo that or define an empty contact group. – jldugger Dec 03 '15 at 02:04

1 Answer

To repay my debt of gratitude, I'll answer my own question. It turns out that services implicitly inherit from hosts, and thus the service check above ended up with both its explicit contacts setting and a contact_groups value inherited from the host definition.
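
To make that concrete, here is a rough sketch of what the check effectively looked like on vmprod1 once the host's contact_groups were pulled in. It is an illustration only, not output from Nagios, and it assumes the pagerduty contact is a member of one of the host's contact groups (most likely example-pager), which the definitions in the question do not show:

```
define service{
        host_name                       vmprod1
        service_description             NRPE - Load
        check_command                   check_nrpe!check_load
        notification_options            c,r
        contacts                        irc                     ; set explicitly on the service
        contact_groups                  example,example-pager   ; implied from the host definition
}
```

With both directives in effect, notifications go to irc and to every member of the two contact groups, which would explain the PagerDuty alerts.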

A simple fix to the service check will do:

define service{
        use                             generic-service
        name                            check-load
        hostgroup_name                  nrpe-hosts,!webnodes,!build-cluster
        notification_options            c,r
        service_description             NRPE - Load
        check_command                   check_nrpe!check_load
        contacts                        irc
        contact_groups
}
jldugger
  • I think it's more correct to use "contact_groups null", but if that works... go with it ;-) – Keith Dec 04 '15 at 15:25
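
For reference, the variant suggested in the comment above might look like the sketch below. Nagios's object inheritance documentation describes null as the way to cancel an inherited string value. Whether it also suppresses the contact groups implied from the host on a given Nagios version is an assumption here, so it is worth checking with a config verification run (nagios -v) and the "view config" page before relying on it:

```
define service{
        use                             generic-service
        name                            check-load
        hostgroup_name                  nrpe-hosts,!webnodes,!build-cluster
        notification_options            c,r
        service_description             NRPE - Load
        check_command                   check_nrpe!check_load
        contacts                        irc
        contact_groups                  null    ; explicitly cancel any inherited contact groups
}
```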