One of the servers monitored by Zabbix is not reachable. I have no idea why, as monitoring works normally for our other servers.
- The zabbix-agent service on the monitored server is running.
- We have several servers, all monitored by Zabbix. In /etc/zabbix/zabbix_agentd.conf I see no difference between this problematic server and another one that works normally.
- Both the Zabbix server and the monitored server (agent-server) are hosted on Amazon.
- All Zabbix-monitored servers are linked to a security group with two inbound rules allowing ports 10050 and 10051 from the zabbix-server IP. So incoming requests from the zabbix-server to the agents on these servers should be allowed. This works for several servers, but not for this one.
- The zabbix-server has a different security group, with no rules for ports 10050 and 10051, so those ports should be blocked on it. There are no iptables rules either.
- I can open a telnet session from the zabbix-server to the agent. It disconnects automatically, but it does connect, so I guess the firewall is not the problem.
- Server: Amazon Linux (CentOS-like)
- Installed file: http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-release-2.2-1.el6.noarch.rpm
- SELinux is disabled on all these agents and on the server.
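The telnet test above is essentially a plain TCP connect, which is easy to script. This is a minimal Python sketch; port_open is a hypothetical helper, and the 3-second default simply mirrors Timeout=3 from the agent config. A successful connect only shows that the port is reachable through the security groups, not that the agent will accept requests from the server.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like a telnet test."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks and DNS failures
        return False
```

Run it from the zabbix-server against the agent's address on port 10050, and from the agent against the zabbix-server on port 10051.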
Agent log after restarting the zabbix-agent service:
10939:20151127:093938.268 Starting Zabbix Agent [agent-server.test]. Zabbix 2.2.11 (revision 56693).
10939:20151127:093938.268 using configuration file: /etc/zabbix/zabbix_agentd.conf
10942:20151127:093938.269 agent #1 started [listener #1]
10945:20151127:093938.269 agent #4 started [active checks #1]
10941:20151127:093938.270 agent #0 started [collector]
10944:20151127:093938.270 agent #3 started [listener #3]
10943:20151127:093938.271 agent #2 started [listener #2]
10945:20151127:141742.930 active check configuration update from [zabbix-server-ip:10051] started to fail (cannot connect to [[zabbix-server-ip]:10051]: [4] Interrupted system call)
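Each agent log line above starts with a pid:date:time.milliseconds prefix. As a minimal sketch, assuming that layout holds for every line (inferred from this excerpt, not from Zabbix documentation), the following Python splits the lines and picks out the active-check failures; parse_line and active_check_failures are hypothetical helper names.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout, inferred from the excerpt: "<pid>:<YYYYMMDD>:<HHMMSS>.<ms> <message>"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")


class LogEntry(NamedTuple):
    pid: int
    time: datetime
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one agent log line into pid, timestamp and message; None if it doesn't match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    pid, day, hms, millis, message = m.groups()
    ts = datetime.strptime(day + hms, "%Y%m%d%H%M%S")
    return LogEntry(int(pid), ts.replace(microsecond=int(millis) * 1000), message)


def active_check_failures(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield entries saying that active-check configuration updates started to fail."""
    for line in lines:
        entry = parse_line(line)
        if (entry
                and "active check configuration update" in entry.message
                and "started to fail" in entry.message):
            yield entry
```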
When I telnet to the agent-server and enter agent.version, it returns: ZBXD2.2.11
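The telnet output makes sense given how a Zabbix 2.2 agent frames its reply: the signature ZBXD, a version byte 0x01, an 8-byte little-endian payload length, and then the value. The version and length bytes are mostly unprintable, which is why telnet shows ZBXD2.2.11. Below is a minimal Python sketch of the same passive exchange, assuming the 2.2 agent accepts a plain-text key ending in a newline (as the telnet session suggests) and closes the connection after one reply (matching the automatic disconnect seen earlier); query_agent and parse_zbxd_response are hypothetical names.

```python
import socket
import struct

HEADER = b"ZBXD\x01"  # protocol signature followed by the version byte


def parse_zbxd_response(raw: bytes) -> str:
    """Decode 'ZBXD' + 0x01 + 8-byte little-endian length + payload."""
    if not raw.startswith(HEADER):
        # Not framed: treat the reply as plain text
        return raw.decode(errors="replace").strip()
    (length,) = struct.unpack("<Q", raw[5:13])
    return raw[13:13 + length].decode(errors="replace")


def query_agent(host: str, key: str, port: int = 10050, timeout: float = 3.0) -> str:
    """Ask a passive agent for one item key, the way telnet does above."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode() + b"\n")  # plain-text request, as typed in telnet
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the agent closes the connection after its reply
                break
            chunks.append(chunk)
    return parse_zbxd_response(b"".join(chunks))
```

Run on the zabbix-server with the agent's real address, query_agent(address, "agent.version") should yield the bare version string if the passive path works.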
Contents of /etc/zabbix/zabbix_server.conf (server):
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
DBName=zabbix
DBUser=zabbix
DBPassword=******
DBSocket=/var/lib/mysql/mysql.sock
SNMPTrapperFile=/var/log/snmptt/snmptt.log
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
Contents of /etc/zabbix/zabbix_agentd.conf (agent):
PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1
Server=zabbix-server-ip
ListenPort=10050
StartAgents=3
# ServerActive=zabbix-server-ip # commented out
Hostname=server.test
Timeout=3
AllowRoot=1
Include=/etc/zabbix/zabbix_agentd.d/
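Comparing agent configs by eye is error-prone, so here is a minimal Python sketch that parses two zabbix_agentd.conf files and reports only the keys that differ. It assumes simple Key=value lines with whole-line # comments and does not follow Include directives; parse_conf and diff_conf are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple


def parse_conf(text: str) -> Dict[str, List[str]]:
    """Parse 'Key=value' lines, skipping blanks and '#' comment lines.

    Values are kept as lists because some keys (e.g. Include) may repeat.
    """
    conf: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf.setdefault(key.strip(), []).append(value.strip())
    return conf


def diff_conf(a: Dict[str, List[str]],
              b: Dict[str, List[str]]) -> Dict[str, Tuple[Optional[List[str]], Optional[List[str]]]]:
    """Return {key: (values_in_a, values_in_b)} for every key whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Hostname is expected to differ between hosts; if nothing else shows up, the agent side is configured identically and the cause lies elsewhere.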
Netstat on zabbix server
$ sudo netstat -lpn | grep zabbix
tcp 0 0 0.0.0.0:10051 0.0.0.0:* LISTEN 7624/zabbix_server
tcp 0 0 :::10051 :::* LISTEN 7624/zabbix_server
Netstat on problematic agent
$ sudo netstat -lpn | grep zabbix
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 3248/zabbix_agentd
tcp 0 0 :::10050 :::* LISTEN 3248/zabbix_agentd
Netstat on working agent
$ sudo netstat -lpn | grep zabbix
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 24242/zabbix_agentd
tcp 0 0 :::10050 :::* LISTEN 24242/zabbix_agentd
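The netstat outputs can also be checked programmatically. This Python sketch assumes the column layout shown above (proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program) and maps each program to the TCP ports it listens on; listening_ports is a hypothetical name. Applied to the outputs above, it shows that only zabbix_agentd listens on 10050, and that no Zabbix process on the zabbix-server host listens on 10050.

```python
from typing import Dict, Set


def listening_ports(netstat_output: str) -> Dict[str, Set[int]]:
    """Map program name -> set of TCP ports it listens on, from 'netstat -lpn' lines."""
    result: Dict[str, Set[int]] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        port = int(fields[3].rsplit(":", 1)[1])  # works for 0.0.0.0:N and :::N
        program = fields[6].split("/", 1)[-1]
        result.setdefault(program, set()).add(port)
    return result
```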
Active vs passive agent
- I've opened port 10051 on the server for the problematic agent IP.
- Telnet from the agent to the server confirms that this works.
- I've activated the ServerActive option with the zabbix-server-ip as its value. After restarting the agent, the error message is gone from the log.
- The problem is still there...
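With active checks the direction is reversed: the agent connects to the ServerActive address on port 10051 and asks for its list of items. To the best of my knowledge, a Zabbix 2.2 agent sends a JSON body of the form {"request":"active checks","host":"<Hostname>"} wrapped in the same ZBXD header used for passive replies. The Python below only builds that frame and does not contact a real server; frame and active_checks_request are hypothetical helpers, and the host value has to match the host name configured in the frontend for the server to return any checks.

```python
import json
import struct


def frame(payload: bytes) -> bytes:
    """Wrap a payload as 'ZBXD' + 0x01 + 8-byte little-endian length + payload."""
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload


def active_checks_request(hostname: str) -> bytes:
    """Build the request an active agent sends to its ServerActive address."""
    body = json.dumps({"request": "active checks", "host": hostname}).encode()
    return frame(body)
```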
Next try:
- I did the same for a working agent; I can telnet from the agent to the server.
- ServerActive is set to the zabbix-server-ip and the agent is restarted.
- StartAgents is set to 0 to force active-only mode.
- Zabbix reports that this server is unreachable...
- Then I reset to passive.
All in all, active mode may have been set in the agent config on several servers, but it has never worked; all reports come from passive agents.
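The StartAgents=0 experiment can be summarized with a simplified rule set: StartAgents=0 disables the listener and therefore passive checks, while active checks depend on ServerActive being set. Since the agent availability shown in the frontend comes from the server's passive polling, an active-only agent whose items are passive "Zabbix agent" items would be consistent with the unreachable status reported above. The Python sketch below encodes only these rules (it ignores ports in ServerActive, Include files and hostname resolution); agent_modes is a hypothetical name.

```python
from typing import Dict


def agent_modes(conf: Dict[str, str]) -> Dict[str, bool]:
    """Report which check modes a zabbix_agentd.conf enables (simplified rules)."""
    # Passive: listeners are started (StartAgents > 0, default 3) and Server is set
    start_agents = int(conf.get("StartAgents", "3"))
    passive = start_agents > 0 and bool(conf.get("Server"))
    # Active: the agent itself connects out to the ServerActive address(es)
    active = bool(conf.get("ServerActive"))
    return {"passive": passive, "active": active}
```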
Agent Interfaces
- Via Monitoring > Latest data, with host set to all, I click the server name and choose Host Inventory.
- The working agent displays its own IP address.
- The problematic agent displays the zabbix-server-ip.
I don't know why this happens, but it seems strange.
What can cause this connection problem? How can I reconnect the server with the agent?
Solution
It turns out that the IP address set in the host configuration (via the web interface) was that of the zabbix-server itself, so the server was polling its own address on port 10050, where no agent listens. This should of course be the address of the agent-server.
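A mismatch like this can be caught through the Zabbix API. The Python sketch below builds a JSON-RPC 2.0 request for the hostinterface.get method (posted to the frontend's api_jsonrpc.php) and then flags agent interfaces (type "1") whose ip equals one of the zabbix-server's own addresses. It assumes the interfaces connect by IP rather than DNS (useip=1); the helper names, host IDs, token and IP addresses used with it are hypothetical.

```python
import json
from typing import Iterable, List


def hostinterface_get_request(host_id: str, auth_token: str) -> str:
    """JSON-RPC 2.0 body for the Zabbix API method hostinterface.get."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "hostinterface.get",
        "params": {"output": "extend", "hostids": host_id},
        "auth": auth_token,
        "id": 1,
    })


def misconfigured_agent_interfaces(interfaces: Iterable[dict],
                                   server_ips: Iterable[str]) -> List[dict]:
    """Return agent interfaces (type "1") pointing at one of the server's own IPs."""
    own = set(server_ips)
    return [i for i in interfaces if str(i.get("type")) == "1" and i.get("ip") in own]
```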