5

I have a Nagios server and a monitored server. On the monitored server:

[root@Monitored ~]# netstat -an |grep :5666
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN      
[root@Monitored ~]# locate check_kvm
/usr/lib64/nagios/plugins/check_kvm
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm -H localhost
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
NRPE: Unable to read output
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
[root@Monitored ~]# ps -ef |grep nrpe
nagios   21178     1  0 16:11 ?        00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
[root@Monitored ~]#

On the Nagios server:

[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm
NRPE: Unable to read output
[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159
NRPE v2.14
[root@Nagios ~]#

When I check another server in the network using the same command it works:

[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm
hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running
[root@Nagios ~]#

Running the check locally as the nagios user:

[root@Monitored ~]# su - nagios
-bash-4.1$ /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
-bash-4.1$

Running the check remotely from the Nagios server as the nagios user:

-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm
NRPE: Unable to read output
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159
NRPE v2.14
-bash-4.1$

Running the same check_kvm against a different server in the network as the nagios user:

-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm
hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running
-bash-4.1$ 

Permissions:

-rwxr-xr-x. 1 root root 4684 2013-10-14 17:14 nrpe.cfg (aka /etc/nagios/nrpe.cfg)
drwxrwxr-x. 3 nagios nagios 4096 2013-10-15 03:38 plugins (aka /usr/lib64/nagios/plugins)

/etc/sudoers:

[root@Monitored ~]# grep -i requiretty /etc/sudoers
#Defaults    requiretty

iptables/selinux:

[root@Monitored xinetd.d]# service iptables status
iptables: Firewall is not running.
[root@Monitored xinetd.d]# service ip6tables status
ip6tables: Firewall is not running.
[root@Monitored xinetd.d]# grep disable /etc/selinux/config 
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
[root@Monitored xinetd.d]#

The command in /etc/nagios/nrpe.cfg is:

[root@Monitored ~]# grep kvm /etc/nagios/nrpe.cfg 
command[check_kvm]=sudo /usr/lib64/nagios/plugins/check_kvm

and the nagios user has been added to /etc/sudoers:

nagios  ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_kvm
nagios  ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_nrpe

check_kvm is a shell script that looks like this:

#!/bin/sh

LIST=$(virsh list --all | sed '1,2d' | sed '/^$/d'| awk '{print $2":"$3}')

if [ ! "$LIST" ]; then
  EXITVAL=3 #Status 3 = UNKNOWN (orange) 
  echo "Unknown guests"
  exit $EXITVAL
fi

OK=0
WARN=0
CRIT=0
NUM=0

for host in $(echo $LIST)
do
  name=$(echo $host | awk -F: '{print $1}')
  state=$(echo $host | awk -F: '{print $2}')
  NUM=$(expr $NUM + 1)

  case "$state" in
    running|blocked) OK=$(expr $OK + 1) ;;
    paused) WARN=$(expr $WARN + 1) ;;
    shutdown|shut*|crashed) CRIT=$(expr $CRIT + 1) ;;
    *) CRIT=$(expr $CRIT + 1) ;;
  esac
done

if [ "$NUM" -eq "$OK" ]; then
  EXITVAL=0 #Status 0 = OK (green)
fi

if [ "$WARN" -gt 0 ]; then
  EXITVAL=1 #Status 1 = WARNING (yellow)
fi

if [ "$CRIT" -gt 0 ]; then
  EXITVAL=2 #Status 2 = CRITICAL (red)
fi

echo hosts:$NUM OK:$OK WARN:$WARN CRIT:$CRIT - $LIST

exit $EXITVAL

Edit (10/22/13): Following all that, I am now able to get some response from the script:

[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
Unknown guests
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
[root@Monitored ~]# su - nagios
-bash-4.1$ /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
Unknown guests
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14

It seems the problem is somehow related to the check_nrpe command, or to something else in the NRPE installation on this server.

Edit 12/2/13: Other checks on the problematic server work (screenshot not reproduced here).

Itai Ganot
  • What's the line in `nrpe.cfg` for that particular command on the client? What type of plugin is this (bash/perl/etc)? – Nathan C Oct 16 '13 at 14:39
  • I've edited my original question and added the information you asked for – Itai Ganot Oct 16 '13 at 15:30
  • Do other NRPE checks work on that host? And does NRPE log anything when that check fails? – Keith Oct 17 '13 at 15:21
  • Yes, as you can see in the attached photo, other checks do work, using both NRPE and SNMP; only check_kvm doesn't work. I see nothing about the failure in the logs. Since then I've installed a few more physical VM host servers and added them to Nagios, but I didn't encounter the problem there and the check_kvm commands work on them perfectly. – Itai Ganot Dec 02 '13 at 09:33

5 Answers

4

Nice detailed write-up Itai! Have you tried reducing the complexity of the config to see if it works?

For starters, I would change the line in nrpe.cfg to

command[check_kvm]=/usr/lib64/nagios/plugins/check_kvm

and temporarily change the /usr/lib64/nagios/plugins/check_kvm script to be something really simple like:

#!/bin/sh
echo Hi
exit 0

If that works, then you can start ratcheting up the complexity. Perhaps instead of giving the nagios user sudo access to the script, it really needs access to the virsh command and you can leave out the sudo part in the nrpe.cfg command line.
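One way to test the "it really needs access to virsh" idea is to check whether virsh can be found at all under a reduced PATH. The sketch below is only an illustration: `find_in_path` is a hypothetical helper, and the PATH value in the example is an assumption, not something NRPE or sudo is known to set on this host. If virsh is not found, the `LIST=$(virsh list --all ...)` line in the script would come back empty, which would explain the "Unknown guests" output.

```shell
#!/bin/sh
# Hypothetical helper (not part of NRPE or the original script):
# print the first executable named $1 found in the colon-separated
# directory list $2, or return 1 if there is none.
find_in_path() {
  cmd=$1
  search_path=$2
  old_ifs=$IFS
  IFS=:
  for dir in $search_path; do
    if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
      IFS=$old_ifs
      printf '%s\n' "$dir/$cmd"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Example: is virsh reachable with a deliberately small PATH?
# The PATH value here is an assumption; substitute whatever the
# NRPE daemon or sudo really provides on your host.
find_in_path virsh "/sbin:/bin:/usr/sbin:/usr/bin" \
  || echo "virsh not found in the minimal PATH"
```

If the lookup fails under the restricted PATH but succeeds in a login shell, calling virsh by its full path inside check_kvm would be a reasonable next experiment.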

KJH
  • I have tried it and I'm still getting NRPE: Unable to read output. Any more suggestions? – Itai Ganot Oct 17 '13 at 11:09
  • What are the ownership and permissions on `/usr/lib64/nagios/plugins/check_kvm` ? – KJH Oct 17 '13 at 14:18
  • `-rwxr-xr-x 1 nagios nagios 2581 2013-10-17 13:48 /usr/lib64/nagios/plugins/check_kvm` – Itai Ganot Oct 17 '13 at 19:07
  • Did you try changing the script itself to something simple? I don't think the basic "Hi" one I suggested would be 2581 bytes? – KJH Oct 18 '13 at 01:21
  • Yes, I've tried changing the script, but to no avail. What's more, the script works just fine when checked against another server, or if I run it locally; only when I use the `check_nrpe -H localhost -c check_kvm` method does it return `Unknown guests` – Itai Ganot Oct 21 '13 at 12:01
  • But `Unknown guests` is progress! Because that's a legitimate exit value from your check_kvm script. If that's consistent, then the issue is not with Nagios/NRPE but with your KVM install. – KJH Oct 21 '13 at 16:14
  • Hi Itai - join this chat room: http://chat.stackexchange.com/rooms/11147/troubleshooting-nagios-nrpe-issue – KJH Oct 21 '13 at 19:10
  • It seems I missed you in the chat, but I've updated the question. Thank you. – Itai Ganot Oct 22 '13 at 07:24
  • Try turning on debug in NRPE (might need a restart) and capture the output from wherever it logs to. – KJH Oct 22 '13 at 21:36
  • Have you tried running `virsh list --all` as _root_ and as _nagios_ on that system? – KJH Oct 23 '13 at 14:32
  • Debug is already enabled in the `nrpe.cfg` file. I've set the nagios user's shell to `/sbin/nologin`, as it is configured on the rest of the VM host servers in my network. – Itai Ganot Dec 12 '13 at 14:31
1

I saw a problem on a Gentoo server that resembles yours at http://forums.gentoo.org/viewtopic-t-806014-start-0.html

There is a nice method there to debug the issue.

The user in that post had a problem with check_disk and got the exact same error message as you.

He was told to execute the following command:

ssh remote_ip /usr/lib/nagios/plugins/check_disk -w 10 -c 5 -p "/"  2>&1

The 2>&1 redirects stderr to stdout, which might reveal the exact error.

So in your case, replace remote_ip with the IP address of the server you can't run check_nrpe against, and replace the check_disk command with the full command that check_kvm is supposed to execute. If you run it without any parameters, you can just execute:

  ssh <remote_ip> /usr/lib64/nagios/plugins/check_kvm 2>&1

That will hopefully reveal information regarding the problem.
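The same stderr-capturing idea can be packaged as a small wrapper, so that a plugin which prints only to stderr (or prints nothing) still produces a line of output. This is a minimal sketch: `run_check` is a hypothetical name, and pointing the nrpe.cfg command at such a wrapper is an assumption about how you might use it, not an NRPE feature.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command, fold stderr into
# stdout, and make sure at least one line of output is produced.
# The command's own exit status is preserved.
run_check() {
  if output=$("$@" 2>&1); then
    status=0
  else
    status=$?
  fi
  if [ -z "$output" ]; then
    output="(no output; exit status $status)"
  fi
  printf '%s\n' "$output"
  return "$status"
}

# Usage (shown as a comment because the plugin path only exists on
# the monitored host):
#   run_check /usr/lib64/nagios/plugins/check_kvm
```

With a wrapper like this in place, an empty result would show up as an explicit "no output" line together with the exit status, which narrows down where the plugin fails.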

Good luck!

ufk
  • Unfortunately, i get the same outputs: `[root@Nagios-SRV ~]# ssh 1.1.1.159 /usr/lib64/nagios/plugins/check_kvm "/" 2>&1root@1.1.1.159's password: hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running [root@Nagios-SRV ~]# ssh 1.1.1.159 /usr/lib64/nagios/plugins/check_nrpe -H localhost "/" 2>&1 root@1.1.1.159's password: NRPE v2.14 [root@Nagios-SRV ~]# ssh 1.1.1.159 /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm "/" 2>&1 root@1.1.1.159's password: Unknown guests [root@Nagios-SRV ~]# ` – Itai Ganot Oct 23 '13 at 09:23
  • have you tried running other scripts like check_disk ? does this behaviour happens on every script or just this one ? – ufk Oct 23 '13 at 11:07
  • Yes everything else works and so does the `check_kvm` script while checking other remote machines. – Itai Ganot Oct 23 '13 at 21:51
  • Mine says "sorry, you must have a tty to run sudo" :) – Some Linux Nerd Sep 18 '14 at 01:25
1

I had the same issue and managed to solve it by killing the nagios process (on the monitored machine):

ps -ef | grep nagios
kill -9 [NagiosProcessNumber]
/etc/init.d/nagios-nrpe-server start

All went fine after that.

Colt
user428879
0

Check whether SELinux is turned on on the remote server (where the NRPE agent is running):

[root@dl1-ap-ldap1 plugins]# getenforce
Enforcing

If it is, then turn it off, or configure it:

[root@dl1-ap-ldap1 plugins]# setenforce 0

-1

Try commenting out the following line in the /etc/sudoers file:

Defaults    requiretty

After modification, it should be like this:

#Defaults    requiretty
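To check for this without reading the file by eye, something like the following could be used. It is only a sketch: `has_requiretty` is a hypothetical helper, and it looks for the plain `Defaults requiretty` form only, so per-user or negated variants and files under an include directory are not covered.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given sudoers-style file has
# an uncommented "Defaults requiretty" line.
has_requiretty() {
  grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty' "$1"
}

# Only look at the real file when it is readable (normally as root).
if [ -r /etc/sudoers ] && has_requiretty /etc/sudoers; then
  echo "requiretty is active: sudo from NRPE may fail without a tty"
fi
```

Any change to the file itself is best made through visudo, so that a syntax error does not lock sudo out.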
Ladadadada