
I have a machine running Ubuntu Server that experienced severe problems today, for no apparent reason. Two of the services it runs are apache2 and ssh, and while the server was slow I couldn't access either.

I've been checking the logs in /var/log/. This hasn't shed any light on the matter, but then again, I'm not sure what I'm looking for...

How can I diagnose the problem, so that I can take measures to prevent it from happening again in the future?

The full story/details:

  • Today, during a class, I gave out an exercise (a sort of exam) to around 35 computer-science students. They were supposed to access two instances of Trac that I had previously installed on my server. Each student had their own login credentials.
  • The server is actually a VMware virtual machine running Ubuntu 11.10, and it lives on the same network from which the students were accessing it.
  • When the exam started, the students entered the address they were given into their web browsers. Three of them actually managed to see Trac's first page, but after that the server became totally unresponsive (the browsers just kept waiting until they timed out).
  • I also tried accessing the server's console, both via SSH and via the VMware vSphere Client, but in both cases the console was completely unresponsive.
  • I wasn't sure what else to try, so I reset the virtual machine. It booted, but nothing changed: all of the services mentioned above remained unresponsive.
  • I rebooted it again, but nothing changed.
  • At this point I sent everyone home, as we no longer had enough time for the exam. When about half of them had shut down their laptops, the server started responding again. I don't think this was a coincidence, but I still can't explain exactly what the server's problem was or how to prevent it.

Update

The hardware assigned to this particular VM is:

  • 1 CPU
  • 512MB Memory
  • 60GB hard disk space (currently 80% free)
  • I'm guessing httpd was running in prefork mode, which is very memory-heavy, and you started swapping to disk. What kind of CPU & RAM resources did you allocate to the VM? – Jason Litka Mar 27 '13 at 19:30
  • @JasonLitka I've updated the question with those details. The memory isn't much, but this is basically all the server is running, so I expected it to be enough. Do you have any suggestions (any specific log I can check?) on how to confirm whether the unresponsiveness was due to thrashing, as you suggest? – Filipe Correia Mar 28 '13 at 16:36
  • I was able to partially reproduce the issue. I opened 10 file downloads from this server in my web browser, and that was enough for some of the connections to remain pending (i.e., the browser just kept waiting for a response from the server). When one of the downloads finished, one of the pending connections would finally get a response and start downloading. By *partially* reproducing it, I mean that I could still SSH into the server, so it wasn't as severe as yesterday. – Filipe Correia Mar 28 '13 at 19:53
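One way to confirm the thrashing hypothesis while it is happening is to watch memory and swap directly with the stock procps tools. A minimal sketch (nothing here is specific to this server; the column names come from standard ``vmstat`` output):

```
# Snapshot of RAM and swap usage, in MB
free -m

# Memory/swap activity every 5 seconds; sustained non-zero values in the
# si (swap-in) and so (swap-out) columns while the server is unresponsive
# point to thrashing
vmstat 5

# Processes sorted by resident memory, to see what is eating the RAM
ps aux --sort=-rss | head -n 15
```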

1 Answer


Based on your update, 512MB of RAM is NOWHERE near enough for 35 simultaneous students if you've done no tuning beyond the defaults (and it may not be enough even when tuned). If you have the ability to do so, raise the memory to 2GB. That should provide enough headroom that the box doesn't swap so heavily that it stops responding.

Beyond that, you can start by disabling keep-alives in your httpd.conf, if they are turned on, so that connections aren't held open. Next, disable any httpd modules that you are not using, to minimize memory usage per process. Third, assuming you weren't able to increase the memory, change MaxClients in your httpd.conf to 8 (if you were able to increase the server's RAM, try RAM/64 as a value). Fourth, though this may seem counter-intuitive since you are short on memory, install APC for PHP opcode caching. You'll sacrifice a small amount of RAM for the cache, but connections may be served a bit quicker, freeing them up for other people.
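A minimal sketch of the relevant directives, assuming the prefork MPM of the Apache 2.2 that ships with Ubuntu 11.10 (on Ubuntu these settings usually live in /etc/apache2/apache2.conf; the numbers are illustrative, not tuned values):

```
# Don't hold connections open between requests
KeepAlive Off

<IfModule mpm_prefork_module>
    StartServers          2
    MinSpareServers       2
    MaxSpareServers       4
    # With 512MB of RAM and ~20-30MB per prefork+mod_php process, keep the
    # worker cap low; with 2GB, try RAM(MB)/64, i.e. roughly 32.
    MaxClients            8
    # Recycle workers periodically so leaked memory is returned
    MaxRequestsPerChild   500
</IfModule>
```

After editing, ``apache2ctl configtest`` followed by ``sudo service apache2 restart`` applies the change.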

In the long run, look into switching httpd to worker/event mode and running PHP as FastCGI, or switching to a lighter-weight web server like nginx.
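As an illustration of the lighter-weight route, a rough nginx sketch that hands PHP off to a separate PHP-FPM pool instead of embedding an interpreter in every web-server process (the socket path and document root are placeholders, not values from this server):

```
server {
    listen 80;
    root /var/www;
    index index.php index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    # PHP runs in a fixed-size php-fpm pool, so the web-server workers stay small
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fpm.sock;   # placeholder socket path
    }
}
```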

Jason Litka
  • Thanks @JasonLitka. I guess I was being a little too optimistic as to how far 512MB would take me... I was able to increase the server's memory to 2GB, which I hope will give me some room to maneuver. I've tried running a couple of benchmarks using ``ab`` (Apache's benchmarking tool) and performance has definitely improved. – Filipe Correia Mar 29 '13 at 22:22
  • Something still bugs me though. Your intuition told you right away that this was a memory issue. But, returning to my initial question, how could I diagnose this myself? E.g., is there some kind of logging I can activate to confirm it was really a memory issue if the "symptoms" return? – Filipe Correia Mar 29 '13 at 22:22
  • 1
  • It's a simple matter of math and experience (as in, more than a decade of it). Out of the box, httpd runs in prefork mode, which will burn ~20MB/conn. Adding mod_php tosses in another 20-30MB/conn. MySQL (or another DBMS) will eat even more RAM. If you are OOM-ing hard enough to also run out of swap, you might find messages in /var/log/messages, but short of that, a system becoming unresponsive is a classic symptom of heavily hitting swap. This is part of the reason for running alternative web servers, or httpd in worker/event mode (memory per conn can drop dramatically, at the expense of complexity). – Jason Litka Apr 01 '13 at 00:46
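To have something concrete to look at the next time the symptoms return, a hedged sketch of the logging side, assuming stock Ubuntu packages (on Ubuntu the kernel messages land in kern.log/syslog rather than /var/log/messages; the ``ab`` invocation is just illustrative load against a placeholder URL):

```
# Did the OOM killer fire during the incident?
grep -i "out of memory" /var/log/kern.log /var/log/syslog

# Install sysstat so sar records CPU, memory, paging and swap history;
# set ENABLED="true" in /etc/default/sysstat, then restart the service.
sudo apt-get install sysstat
sudo service sysstat restart

# After the next incident, look back at what happened:
sar -r    # memory utilization over the day
sar -W    # pages swapped in/out per second
sar -B    # paging activity

# Reproduce the load (placeholder URL) while watching the figures above
ab -n 500 -c 35 http://server/trac/project/
```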