
One of the servers monitored by Zabbix is not reachable. I have no idea why as this works normally with other servers.

  • The zabbix-agent service on the monitored server is running.
  • We have several servers, all monitored by zabbix. In /etc/zabbix/zabbix_agentd.conf I see no difference between this problematic server and another one that works normally.
  • Both the zabbix server and the monitored server (agent-server) are hosted by Amazon.
  • All zabbix monitored servers are linked to a security group with two inbound rules for port 10050 and 10051 for the zabbix-server IP. So incoming requests from the zabbix-server to the zabbix-agents on these servers should be allowed. They work on several servers, but not on this one.
  • The zabbix-server has a different security group, and no rules set for ports 10050 and 10051, so they should be blocked. Iptables returns no rules.
  • I can open a telnet session from the zabbix-server to the agent. It disconnects automatically, but it connects. So I guess the firewall is not the problem.
  • Server: Amazon Linux (CentOS-like)
  • Installed file: http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-release-2.2-1.el6.noarch.rpm
  • SELinux is disabled on all these agents and on the server.
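
The telnet test in the list above can be repeated in both directions with a few lines of Python. This is only a minimal sketch: `can_connect` is a hypothetical helper name, and a successful TCP connect only shows that the firewall and the listener accept the connection, not that Zabbix polls the right address.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run from the zabbix-server, `can_connect("<agent-ip>", 10050)` covers passive checks; run from the agent, `can_connect("<zabbix-server-ip>", 10051)` covers active checks. Both addresses are placeholders.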

Agent log after restart of zabbix-agent service

 10939:20151127:093938.268 Starting Zabbix Agent [agent-server.test]. Zabbix 2.2.11 (revision 56693).
 10939:20151127:093938.268 using configuration file: /etc/zabbix/zabbix_agentd.conf
 10942:20151127:093938.269 agent #1 started [listener #1]
 10945:20151127:093938.269 agent #4 started [active checks #1]
 10941:20151127:093938.270 agent #0 started [collector]
 10944:20151127:093938.270 agent #3 started [listener #3]
 10943:20151127:093938.271 agent #2 started [listener #2]
 10945:20151127:141742.930 active check configuration update from [zabbix-server-ip:10051] started to fail 
 (cannot connect to [[zabbix-server-ip]:10051]: [4] Interrupted system call)

When I telnet to the agent-server, then enter agent.version, it returns: ZBXD2.2.11
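
The reply looks like `ZBXD2.2.11` in telnet because the agent wraps its answer in a small binary header whose non-printable bytes the terminal does not show. The sketch below assumes the usual Zabbix framing (the 4-byte signature `ZBXD`, one flags byte `0x01`, and an 8-byte little-endian length, followed by the data); `parse_zbxd` is a hypothetical helper name.

```python
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature plus flags byte (assumed framing)


def parse_zbxd(raw: bytes) -> str:
    """Extract the payload from a raw agent reply."""
    if not raw.startswith(ZBX_HEADER):
        # No header found: treat the whole reply as plain text.
        return raw.decode("utf-8", errors="replace").strip()
    # Bytes 5..12 hold the payload length as a little-endian 64-bit integer.
    (length,) = struct.unpack("<Q", raw[5:13])
    return raw[13:13 + length].decode("utf-8", errors="replace")


# A reply built by hand for illustration, not captured from a real agent.
sample = ZBX_HEADER + struct.pack("<Q", 6) + b"2.2.11"
print(parse_zbxd(sample))  # 2.2.11
```

Dropping the non-printable bytes from `sample` leaves exactly the string telnet displayed, so the reply above is a normal answer from a healthy agent.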

Contents of /etc/zabbix/zabbix_server.conf (server):

ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
DBName=zabbix
DBUser=zabbix
DBPassword=******
DBSocket=/var/lib/mysql/mysql.sock
SNMPTrapperFile=/var/log/snmptt/snmptt.log
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts

Contents of /etc/zabbix/zabbix_agentd.conf (agent)

PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1
Server=zabbix-server-ip
ListenPort=10050
StartAgents=3
# ServerActive=zabbix-server-ip # commented out
Hostname=server.test
Timeout=3
AllowRoot=1
Include=/etc/zabbix/zabbix_agentd.d/

Netstat on zabbix server

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10051               0.0.0.0:*                   LISTEN      7624/zabbix_server  
tcp        0      0 :::10051                    :::*                        LISTEN      7624/zabbix_server

Netstat on problematic agent

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN      3248/zabbix_agentd  
tcp        0      0 :::10050                    :::*                        LISTEN      3248/zabbix_agentd 

Netstat on working agent

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN      24242/zabbix_agentd 
tcp        0      0 :::10050                    :::*                        LISTEN      24242/zabbix_agentd

Active vs passive agent

  • I've opened port 10051 on the server for the problematic agent IP.
  • Telnet shows that works, from agent to server.
  • I've activated the ServerActive option with the zabbix-server-ip as value. The error message is gone from the log after restarting the agent.
  • The problem is still there...

Next try:

  • I did the same for a working agent; telnet from agent to server works.
  • ServerActive is set to the zabbix-server-ip, and the agent is restarted.
  • StartAgents is set to 0, to force using the active agent.
  • Zabbix reports that this server is unreachable...
  • Then I reset to passive.

All in all, active mode may have been configured in the agent config on several servers, but it has never worked. All reported data comes from passive agents.

Agent Interfaces

  • Opening via Monitoring > Latest data, selecting host=all, I click the server name, and choose Host Inventory
  • The working agent displays its own IP address.
  • The problematic agent displays the zabbix-server-ip.

I don't know why this happens, but it seems strange.

What can cause this connection problem? How can I reconnect the server with the agent?

Solution

It turns out that the IP address set in the host configuration (via the web interface) was that of the zabbix-server itself. This should of course be the address of the agent-server.

SPRBRN
  • To check from zabbix-server do `telnet client_dns_or_ip 10050` and then ask for `agent.version`, and compare the working and non-working machines; it probably won't connect at all to the broken one. – Dmitry Verkhoturov Nov 27 '15 at 16:26
  • And check the output of `sudo service zabbix-agent restart` and the zabbix-agent logs. – Dmitry Verkhoturov Nov 27 '15 at 16:28
  • I've added the agent log to the question. When I telnet to the agent-server, then enter `agent.version`, it returns: ZBXD2.2.11. So I get a reply. – SPRBRN Nov 27 '15 at 16:39
  • Something is strange, then. Your zabbix server does the same thing when it connects by DNS/IP on port 10050 to the zabbix agent, so recheck your interfaces. – Dmitry Verkhoturov Nov 28 '15 at 01:28
  • https://www.zabbix.org/wiki/Troubleshooting#Zabbix_says_.22Agent_unreachable.22.2C_but_the_host_is_up._How_do_I_debug_that.3F is a classic reference for debugging such problems. – asaveljevs Nov 30 '15 at 12:51
  • What's the OS on the server? Where did you download the package from? The full configuration file would be useful too. – stoned Nov 30 '15 at 13:48
  • Thanks @stoned! I should have included that right away. See the update. – SPRBRN Nov 30 '15 at 14:03
  • @SPRBRN, I think the agent type is wrong. It should be passive. Can you check it, please? Disable or remove the `ServerActive` option in the agent config file and set `Server` to your zabbix server address. – Diamond Nov 30 '15 at 15:20
  • What's the output of `sudo netstat -lpn`? – Jonathan S. Fisher Nov 30 '15 at 15:42
  • Another server has the same setting, and it works there. Disabling the ServerActive line does not make a difference. – SPRBRN Nov 30 '15 at 15:54
  • @exabrial - see my updated answer. I've included the output for a working agent on another server, the only difference being the PID. – SPRBRN Nov 30 '15 at 16:00
  • @SPRBRN, if your agents are configured for active checks then you need port 10051 to be open at the zabbix server end so the agent can connect to it. And it seems the agent is failing to connect to that port on the server. – Diamond Nov 30 '15 at 16:24
  • Servers are hosted with Amazon AWS. They use security groups, in effect the firewall. The server itself has iptables, with 10050 and 10051 open for all addresses. The agent has no rules set. This server and another zabbix-agent-server, which works, are in the same group. Telnet from either agent to the server doesn't give a response. – SPRBRN Nov 30 '15 at 16:35
  • @SPRBRN, have you done a telnet on port `10051` (not port 10050) from both the clients? I'm very curious to see the output. – Diamond Nov 30 '15 at 18:11
  • @SPRBRN, see my updated answer pls. – Diamond Nov 30 '15 at 18:15
  • Problem is solved! If you want some points for the effort, copy my answer (below the question) and post it as a new answer. First post will get the accepted answer. – SPRBRN Dec 01 '15 at 15:01

3 Answers


What are the current SELinux and iptables settings on the agent box? Can you telnet from the agent to the server on port 10051?

You can check the connectivity between the boxes using tcpdump on the agent: `tcpdump -i your_interface tcp port 10050`. With this you can see the incoming and outgoing packets.

cuongnv23
  • I cannot telnet from agent to server. No response. I have more servers running zabbix agents, working well, and they cannot telnet to the server either, so I guess this is not the issue. Tcpdump may give a clue. I get output on a good agent-server, but none on this problematic server. – SPRBRN Nov 30 '15 at 15:53
  • But your log said `10945:20151127:141742.930 active check configuration update from [12.34.56.78:10051] started to fail (cannot connect to [[12.34.56.78]:10051]: [4] Interrupted system call)`, which means your agent tries to connect to the server. What kind of deployment do you have, active or passive? – cuongnv23 Nov 30 '15 at 15:57
  • I think active, but I'm not sure. Another server has `ServerActive` set with the zabbix-server-ip as value, and that agent sends info. It might be configured wrongly as well, no idea. Is this the only setting for this? I have set the `Server` setting as well, with the same IP address. – SPRBRN Nov 30 '15 at 16:19
  • `netstat -tnpl` please. Check if the port has been bound. Then also `ausearch -ts today -sv no`, which will tell you of SELinux failures. There was a problem with some SELinux rules at some point in CentOS which blocked the agent from binding to the port. In case you find errors there, try setting SELinux to permissive mode with `setenforce 0` and restart the service. Does it work? – stoned Dec 01 '15 at 12:47
  • SELinux is disabled on all servers. Netstat shows that the agents have port 10050 bound, the server has 10051 bound. I've updated the question, see the part about active vs passive agents, and about Agent interfaces. – SPRBRN Dec 01 '15 at 13:03

I think you need to understand the active and passive modes of connection in Zabbix to resolve the problem. From the Zabbix documentation:

Passive and active checks

Zabbix agents can perform passive and active checks.

In a passive check the agent responds to a data request. Zabbix server (or proxy) asks for data, for example, CPU load, and Zabbix agent sends back the result.

Active checks require more complex processing. The agent must first retrieve a list of items from Zabbix server for independent processing. Then it will periodically send new values to the server.
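
As an illustration of the two modes, the sketch below reads a `zabbix_agentd.conf` and reports which of them the agent would use. It assumes the documented 2.2 behaviour: passive checks need `Server` plus a non-zero `StartAgents` (default 3), and active checks need `ServerActive`. The function name `agent_modes` is hypothetical, and only lines that start with `#` are treated as comments.

```python
def agent_modes(conf_text: str) -> dict:
    """Report which check modes a zabbix_agentd.conf enables."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()

    # StartAgents defaults to 3; 0 disables the passive listeners.
    start_agents = int(settings.get("StartAgents", "3"))
    return {
        "passive": start_agents > 0 and bool(settings.get("Server")),
        "active": bool(settings.get("ServerActive")),
    }
```

Applied to the agent config in the question, where `ServerActive` is commented out and `StartAgents=3`, this reports passive mode only.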

For active mode to work, you need to have port 10051 open at the Zabbix server, so that the agents on the clients can connect to it. Judging from the error you are getting, this is the problem:

10945:20151127:141742.930 active check configuration update from [zabbix-server-ip:10051] started to fail (cannot connect to [[zabbix-server-ip]:10051]: [4] Interrupted system call)

The tests you have done cover the connection from the Zabbix server to the client, and that seems to work without any problem. But that's not enough for active mode to function. The connection from the client agent to the server on port 10051 is not working in your case, and you need to focus on that.

The information you have provided is misleading:

The zabbix-server has a different security group, and no rules set for ports 10050 and 10051, so they should be blocked. Iptables returns no rules.

The statement above about the ports cannot be true, as you are using active mode. The server must have port 10051 open for the clients to connect, or you have to use passive mode.

So, please check the necessary firewall rules in between and make sure the client/agent can reach the server on this port. I am sure the other agent (on the other working server) can reach the Zabbix server on port 10051.

Diamond
  • I've updated the question. See the part about active vs passive agents, and about Agent interfaces. – SPRBRN Dec 01 '15 at 12:51
  • The monitoring process has two parts, agent and server; you need to configure both and they must match. If you are not using "Active agent auto-registration", then you have to create/edit the host in Zabbix server accordingly. See the Zabbix doc on how to add a host: https://www.zabbix.com/documentation/2.0/manual/config/hosts/host – Diamond Dec 01 '15 at 13:52
  • For example, here is one with wrong host configuration at server: http://serverfault.com/questions/475590/configure-active-zabbix-agent – Diamond Dec 01 '15 at 13:55
  • Problem solved! It's a stupid misconfiguration. In the zabbix host configuration for the server (web admin interface), the Agent Interfaces had one IP address, and that was the address of the zabbix-server, not the ip address of the agent. Setting that to the agent IP address fixed this problem. Although you didn't give the solution, you get the bonus points for the effort and keeping me going. – SPRBRN Dec 01 '15 at 14:27
  • Well, I couldn't, because your post was about the agent not running. I pointed out the cause, and also that the rest could only depend on the server side. Glad that your problem is finally resolved, happy monitoring! – Diamond Dec 01 '15 at 14:38

It turns out that the IP address set in the host configuration (via the web interface) was that of the zabbix-server itself. This should of course be the address of the agent-server.
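
To catch this kind of mistake across many hosts, something like the sketch below could be run over the host list. It assumes the data has the shape returned by the Zabbix API `host.get` call with `selectInterfaces` (interface fields `type`, `useip`, `ip`, with `type` 1 meaning an agent interface); the function name, host names and IP addresses are made up for illustration.

```python
def suspicious_agent_interfaces(hosts: list, server_ip: str) -> list:
    """Return names of hosts whose agent interface points at the Zabbix server."""
    suspects = []
    for host in hosts:
        for iface in host.get("interfaces", []):
            is_agent = str(iface.get("type")) == "1"  # 1 = agent interface
            uses_ip = str(iface.get("useip")) == "1"  # connect by IP, not DNS
            if is_agent and uses_ip and iface.get("ip") == server_ip:
                suspects.append(host.get("host"))
                break
    return suspects


# Hypothetical sample data using documentation-range IP addresses.
SAMPLE_HOSTS = [
    {"host": "good-agent",
     "interfaces": [{"type": "1", "useip": "1", "ip": "203.0.113.20"}]},
    {"host": "bad-agent",
     "interfaces": [{"type": "1", "useip": "1", "ip": "203.0.113.10"}]},
]
print(suspicious_agent_interfaces(SAMPLE_HOSTS, "203.0.113.10"))  # ['bad-agent']
```

A host entry for the Zabbix server itself may legitimately show up in the result, so the list is a set of candidates to review rather than a list of errors.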

SPRBRN