
Our Zimbra server has unexplained slowdowns every couple of days, and only rebooting the server resolves them. From the end user's perspective, if they are using webmail and send a message, the request eventually times out. At the system terminal, logging in, switching users, and restarting the Zimbra services are all slow. Switching users with 'su -' can take up to 2 minutes.

Restarting all the Zimbra services and the DNS services does not resolve the problem; only a full reboot does. After a reboot, logging in, switching users, and restarting services are fast again.

We use dnsmasq for split DNS, which our environment needs because of NAT, but DNS queries return results immediately. We use an external LDAP database for authentication, but no other servers using it show any problems, and it is not under heavy load either. Everything else is a default install and configuration.

There are no obvious errors in the system logs. Server load and disk I/O are the same whether or not the problem is occurring.

Originally this happened once a week, usually on Monday or Tuesday. This week it happened on Monday and Thursday.

My version is:

zimbra@servername ~ $ zmcontrol -v
Release 7.2.1_GA_2790.RHEL6_64_20120815212147 UNKNOWN_64 FOSS edition.

Has anyone encountered or solved a problem like this?

garg
  • What is your logging setup like? Are you logging to a remote host, say, via TCP with rsyslog? Are actions that generate logs, such as logging in or using su, the only slow actions? – hwilbanks Dec 17 '12 at 16:47
  • How much ram does this machine have? In my experience even a small Zimbra installation needs between 4GB and 8GB to run well. – 3dinfluence Dec 18 '12 at 04:26
  • Yes, rsyslog is in use, and logs are sent to a remote log host. The only noticeable slowdowns are with su and logging in; restarting the Zimbra services is also slower than usual. – garg Dec 18 '12 at 19:30
  • 3dinfluence > It has 6GB of RAM and it's not a very heavily used mail server. – garg Dec 18 '12 at 19:31
  • What happens when you restart rsyslog while it is slow? I ran into a case where, occasionally, when the remote log host rotated its logs, rsyslog on the Zimbra server would get backed up and couldn't clear itself out. Actions that created log entries had to wait for the logger to time out. If that's what you're experiencing, restarting rsyslog was a temporary fix for me. Be advised that, depending on your rsyslog configuration, the Zimbra host will probably send a lot of logs to your remote syslog host all at once. – hwilbanks Dec 18 '12 at 20:55
  • Thanks for that suggestion! I'll try that as soon as it slows down again. That is something I hadn't tried. – garg Dec 20 '12 at 13:40
  • hwilbanks > I think that worked! I experienced a slowdown, and service rsyslog restart cleared it right up! Please post that as your answer and I'll accept it! – garg Dec 20 '12 at 15:04
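To check whether logging itself is what blocks (the question asked in the first comment), one option is to time a single syslog write while the server is slow. This is a minimal sketch under stated assumptions: `probe_syslog_latency` is a hypothetical helper name, and it assumes `logger(1)` and coreutils `timeout(1)` are installed, as they normally are on RHEL 6. If rsyslog is backed up, the write from `logger` should stall just like the log writes made by `su` and login.

```shell
#!/bin/sh
# Hypothetical helper: time one syslog write made through logger(1).
# If rsyslog is backed up, this write should stall, so a large elapsed
# time points at the logger rather than at DNS or LDAP.
# timeout(1) caps the wait so the probe itself cannot hang forever.
probe_syslog_latency() {
  limit="${1:-30}"                      # seconds to wait before giving up
  start=$(date +%s)
  timeout "$limit" logger -t syslog-probe "rsyslog latency probe" 2>/dev/null
  end=$(date +%s)
  echo $((end - start))                 # elapsed time in whole seconds
}
```

Run `probe_syslog_latency 30` during a slowdown and again after `service rsyslog restart`; a multi-second result that drops to near zero after the restart would be consistent with the rsyslog explanation. `date +%s` only resolves whole seconds, which is enough for the multi-second stalls described here.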

I've found that rsyslog, when forwarding logs via TCP to a remote host, will sometimes get hung up when it can't forward to the remote host. Even when the remote host comes back up, rsyslog remains hung and as a result slows down everything else on the system that tries to log. Restarting rsyslog does the trick when it happens, but restarting it regularly via a cron job never seemed to work for me. The best solution I found is to not have the remote host go down so much. :)

However, there are tweaks that can be made to rsyslog so that it queues rather than locking up. You might still experience the issue, and in that case no logs will be forwarded until rsyslog is restarted, but it will not affect the system as a whole.

Comment out your current forwarding rule, and drop this at the end of your rsyslog.conf:

$WorkDirectory /var/spool/rsyslog # where to place spool files
$MainMsgQueueFileName mainqueue # unique name prefix for spool files
$MainMsgQueueMaxDiskSpace 2g   # 2gb space limit (use as much as possible)
$MainMsgQueueSaveOnShutdown on # save messages to disk on shutdown
$MainMsgQueueType LinkedList   # run asynchronously
$ActionResumeRetryCount -1     # infinite retries if host is down
# replace the line below with your own forwarding rule
*.* @@1.2.3.4:514

Make sure /var/spool/rsyslog exists, because rsyslog will not create it.
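Creating that directory can be scripted. This is a minimal sketch, assuming the default path from the $WorkDirectory line in the config above; `ensure_spool_dir` is a hypothetical helper name, and the 700 mode is an assumption chosen because spooled messages may contain sensitive log data.

```shell
#!/bin/sh
# Hypothetical helper: create the rsyslog work (spool) directory, since
# rsyslog will not create it. Defaults to the $WorkDirectory path used in
# the config above; pass another path as the first argument to override.
# Mode 700 is an assumption: spooled log messages may be sensitive.
ensure_spool_dir() {
  dir="${1:-/var/spool/rsyslog}"
  mkdir -p "$dir" || return 1
  chmod 700 "$dir"
}
```

Run it as root, then restart rsyslog (`service rsyslog restart` on RHEL 6) so the new queue settings take effect.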

hwilbanks