I have a server on AWS that has been stable for years. It's running Apache 2.2.22 on Ubuntu 12.04.5 LTS. I regularly run security updates at the top of the month.

Recently Apache has started to stop serving requests at around midnight, and sometimes later in the early morning (2-4 AM Central). For us this is a period of low use, but we still have a steady stream of requests. Our daytime activity is much higher, and so far it has never gone down during the day. It always comes back up after about 10 minutes and is always back to normal in the morning. Because the issue occurs when traffic is low, I don't suspect MaxClients, as several other threads suggest.

I see nothing in any of the apache error logs (we have around 10 virtual hosts) nor dmesg, nor syslog. In fact, I read every log in /var/log this morning. At the time of the issue, I see nothing happening.

I have been able to catch it when it's down because of a site monitor notification. During that time, I can verify that apache is not serving requests. I can shell in to the server though. I had suspected network issues but it's odd that one port is fine and another isn't. I have been able to confirm that apache won't serve requests from localhost at that server. So, it seems to not be hardware.

Restarting the apache service does bring the sites back up but only for a few minutes. I was tailing logs live when it went down and still nothing. Not a thing shows in any log at the time of the issue. Rebooting always seems to fix it (for at least another 24 hours and sometimes more).

Amazon support sees nothing wrong with the server. I suspect that either Apache is failing and the service restart isn't clean, or something is going on at the kernel/network level of the OS. And, again, this always happens when CPU load is low. Memory looks okay. None of the "usual suspects" are happening. It just silently stops working.

Given the lack of information I have from the server, I have no idea what else to look for. I am leaning towards rebuilding the server next week but would really like to figure this out.

Maybe there's something else I should look at during the problem. Also, if there are any logs I can "turn up" to get more info, let me know. Currently I have no logs I can post that are helpful. This whole thing seems crazy to me because I'm used to failures getting logged somehow, especially if it's apache or the kernel (we've been long-time friends).

Phil

1 Answer

I'd suggest that the next time you start Apache, you also start an strace against it, so that after it stops responding you can see which system calls it was making last. You can run the following command right after starting it to attach to the master process, all of its children, and any new ones that get forked.

pidlist=''; for pid in $(pgrep httpd); do pidlist="$pidlist -p $pid"; done; strace -tt -f $pidlist 2>&1 | tee /root/apache_strace.out

I don't know whether the Apache process on your distro is called httpd or something else (like apache or apache2); if it's not httpd, swap the correct name into the command above.
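Once it has hung, the trace file can get large, so here is a rough sketch of one way to pull out the last traced line for each process. The function name last_calls is just something I made up, and it assumes every line in the file starts with a "[pid N]" prefix, which is how strace -f usually labels lines when it follows several processes and writes to stderr. Check a few lines of your own file first, since the format may differ.

```shell
# Hypothetical helper: print the last traced line seen for each PID
# in an strace -f capture. Assumes lines begin with "[pid N]".
last_calls() {
    awk '$1 == "[pid" {
        pid = $2
        sub(/\]$/, "", pid)   # strip the closing bracket from the PID field
        last[pid] = $0        # later lines overwrite earlier ones
    }
    END { for (pid in last) print last[pid] }' "$1" | sort
}
```

You would run it as: last_calls /root/apache_strace.out. If most of the workers end on the same unfinished call, that may be a hint about where they are blocking, though it is only a starting point.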

sa289