
I've installed Munin and Monit on one of my servers running CentOS 5. Everything is working well, logging and reporting info, except when the httpd process is restarted. I have Monit set to restart httpd if it hits 2.5 GB of memory usage. If/when this happens, httpd restarts just fine, but Monit won't pick up the new process.

I'll get a notice telling me that the httpd service does not exist, then another telling me httpd failed to start, and then a final one saying the httpd service timed out and won't be monitored anymore.

I'm not sure why I'm getting these reports, because the httpd service IS getting restarted just fine. I've checked the logs and there are no issues on the restart.
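For reference, the Monit check is along these lines (the pidfile path and init script locations are assumptions based on a stock CentOS 5 Apache install, not my exact config):

    check process httpd with pidfile /var/run/httpd.pid
        start program = "/etc/init.d/httpd start"
        stop program  = "/etc/init.d/httpd stop"
        # restart Apache once its total memory use crosses 2.5 GB
        if totalmem > 2500 MB then restart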

Alex Jillard

3 Answers


It may be a race condition. Monit's restart spawns "httpd stop" and then "httpd start"; that is not an atomic operation, so other operations can be interleaved between "httpd stop" and "httpd start".

The PID file may also only be removed at the very end of the stop, in which case Monit reads a stale PID file and will not pick up the new httpd process.

pusit

Perhaps have monit run a script that restarts httpd, waits a few seconds, and then restarts monit as well.

It may be that monit is somehow locked onto the particular process IDs of the killed httpd processes; restarting monit would let it detect the new processes correctly.
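A minimal sketch of such a wrapper, assuming the stock CentOS init scripts and a hypothetical script path (the 10-second pause is an arbitrary guess):

    #!/bin/sh
    # hypothetical /usr/local/bin/bounce-httpd-and-monit.sh
    # Restart Apache, give it a moment to settle and write its new pidfile,
    # then restart monit so it latches onto the new process.
    /etc/init.d/httpd restart
    sleep 10
    /etc/init.d/monit restart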

I'm not sure how much free memory your system has when it hits the 2.5 GB usage point, but if that amount gets too low (perhaps during the restart?), Linux will start killing processes to avoid a total crash. I'm guessing that the OOM killer might be killing something essential to monit's functionality.

If this is the case, lowering your restart threshold from 2.5 GB to 2.0 GB, or increasing the amount of memory in the box, would be a better solution.

Brent
  • That's a good idea...I'll have to give that a try. The server has 8 GB of RAM total, so there's plenty to go around. – Alex Jillard Nov 24 '09 at 14:00
  • This answer wasn't exactly correct, but it got me thinking about the right stuff. It seems the actual problem was that monit didn't have permissions to restart apache...even though it did restart it. Sometimes the PID wasn't getting generated, which explains why monit never picked it up. I changed the group for the check to apache, and now it seems to have the permissions it needs. – Alex Jillard Nov 25 '09 at 16:52
  • Looks like I spoke a little too soon. Changing the group to apache only worked a couple times for whatever reason. Changing the action from restart to 'exec /etc/init.d/httpd restart' has worked though. – Alex Jillard Nov 25 '09 at 18:27
  • Depending on how busy your apache processes are, sometimes apache can take a long time to die, and the restart will time out. You might have more success with: httpd stop; loop until there are no more apache processes or an optional time limit is reached; if using the time limit, kill any remaining apache processes; then httpd start (roughly as sketched below). – Brent Nov 26 '09 at 17:06
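Putting the last two comments together, a rough sketch (the wrapper path, the 60-second limit, and the use of pgrep/pkill are assumptions, not tested config): the Monit action becomes an exec of a wrapper, and the wrapper does the stop/wait/kill/start dance.

    # in monitrc, instead of "then restart":
    if totalmem > 2500 MB then exec "/usr/local/bin/apache-hard-restart.sh"

    #!/bin/sh
    # hypothetical /usr/local/bin/apache-hard-restart.sh
    /etc/init.d/httpd stop
    # wait up to 60 seconds for all Apache processes to exit
    # (-x matches the process name "httpd" exactly, so this script isn't counted)
    i=0
    while pgrep -x httpd > /dev/null && [ $i -lt 60 ]; do
        sleep 1
        i=$((i + 1))
    done
    # past the time limit, kill whatever is still hanging around
    pgrep -x httpd > /dev/null && pkill -9 -x httpd
    /etc/init.d/httpd start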

This would be because Monit is balls. It has always had problems detecting the actual state of a service. I'm not sure why, but I gave up on monit some time ago and switched to alternate means of doing the things that monit tried to do, with great success and more than a little happiness.

womble
  • What would you suggest? What have you been using? My current setup is Nagios, monit and munin. – Alex Jillard Nov 23 '09 at 14:22
  • daemontools for process management, and Nagios (with event handlers) for the bigger things like memory consumption cleanup. – womble Nov 23 '09 at 14:42