3

I requested a hard reboot because neither SSH nor HTTP was responding. Ping worked normally.

Which logs should I check to understand what the problem was?

Thanks! (LAMP stack on Debian 6)

Edit: my memory and swap:

Mem:   4040068k total,  1114920k used,  2925148k free,   109212k buffers
Swap:  1051384k total,        0k used,  1051384k free,   283820k cached

4 GB RAM

(and more than 1 TB of HDD)

The cause dates from 2 days ago:

Swap usage rose by more than 60% in less than 10 hours.

My control panel reports these as the top 5 processes by memory usage:

If every apache2 process is 190MB, that is a problem, because when I run top I see 262 sleeping processes, and most of them are apache2!

My apache mpm_prefork settings are:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    ServerLimit      1500
    MaxClients            1500
    MaxRequestsPerChild   2000
</IfModule>

KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 4
dynamic
  • 3
    Change your apache configuration to not start more processes than you have RAM for. – mattdm Mar 08 '11 at 15:41
  • @mattdm: I posted my apache2 config, any suggestions? Maybe I should lower **MaxRequestsPerChild** to avoid apache2 memory leaks? – dynamic Mar 08 '11 at 15:56
  • 1
    Do the math here. If you have 1500 servers, even if they're a svelte 10MB each, that's 15GB of RAM. If 190MB is a typical size for an Apache process with your workload (PHP?), and you have 4GB of RAM, ServerLimit should be no more than _21_. – mattdm Mar 08 '11 at 16:17
  • 3
    Also, I suspect that you don't have a memory leak, but rather a PHP script which is able to consume a lot of RAM. Apache processes will expand to the maximum memory used by PHP, and _not go back down_. So setting MaxRequestsPerChild ridiculously low will mitigate this, but the solution is to either a) rein in your PHP app's memory consumption or b) switch to a different server architecture (using fastcgi for PHP). – mattdm Mar 08 '11 at 16:22
  • @mattdm: Aren't resources freed by PHP when the scripts are done? – dynamic Mar 08 '11 at 16:27
  • @yes123: In short, no. See http://virtualthreads.blogspot.com/2006/02/understanding-memory-usage-on-linux.html. But, even if it were freed, you're setting yourself up for a crash when load gets high. – mattdm Mar 08 '11 at 16:30
  • @mattdm: I started logging memory usage with memory_get_peak_usage(); at the moment every script is using less than 3-4MB. But anyway I don't think this is the problem, because on my previous server I never had this problem (except problems with apache2 settings) – dynamic Mar 08 '11 at 16:40
  • Related Question: http://serverfault.com/questions/244750 – voretaq7 Mar 08 '11 at 17:10
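
The arithmetic in mattdm's comment can be sketched in a few lines of Python. This is only an illustration: the function name and the `reserved_mb` parameter (RAM set aside for the OS, MySQL and so on) are hypothetical, and 190MB per child is the figure reported by the control panel, not a measured average.

```python
def max_prefork_children(total_ram_mb, per_child_mb, reserved_mb=0):
    """Upper bound on prefork children that fit in RAM without swapping.

    reserved_mb is a hypothetical allowance for everything that is not
    Apache (kernel, MySQL, caches); it is not part of the original thread.
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    return max(usable_mb // per_child_mb, 0)


# 4GB of RAM and 190MB per apache2 process, as in the comment above.
limit = max_prefork_children(4 * 1024, 190)

# Worst case for the posted config: 1500 children at only 10MB each.
worst_case_mb = 1500 * 10

print(limit, worst_case_mb)
```

Whatever number comes out would be a ceiling for ServerLimit and MaxClients, not a target; reserving memory for the database lowers it further.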

1 Answer

5

Which logs should I check to understand what the problem was?

All of them. ping working just means enough of the IP stack is up to process ICMP Echo requests (that's not a huge portion of the system compared to what's required for SSH and web servers). You could have had what I call a "partial panic" (Kernel blew up, but the IP code kept running), run out of RAM, or your SSH/HTTPd processes could have fallen over for unspecified reasons.

/var/log/messages is probably a good starting point, as is the log for your web server (presumably Apache). If nothing else it will give you an idea of when the system last worked and how long it was in the brain-dead state before it got rebooted...
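
As a rough starting point for that log check, here is a minimal Python sketch that filters log lines for signs of memory exhaustion. The pattern list is an assumption on my part (the "Free swap = 0kB" line is the one the asker later found; the other two are common kernel OOM messages), and the helper names are hypothetical.

```python
import re

# Assumed indicators of memory exhaustion in kernel log output.
MEMORY_PATTERNS = [
    re.compile(r"Free swap\s*=\s*0kB"),
    re.compile(r"Out of memory", re.IGNORECASE),
    re.compile(r"oom-killer", re.IGNORECASE),
]


def find_memory_events(lines):
    """Return the log lines that match any memory-exhaustion pattern."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in MEMORY_PATTERNS)
    ]


def scan_file(path="/var/log/messages"):
    """Scan one log file; the default path is the one suggested above."""
    with open(path, errors="replace") as handle:
        return find_memory_events(handle)
```

Calling `scan_file()` as root and printing the result should show roughly when the box started running out of memory; the timestamps of the first and last hits bracket the brain-dead period.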


Update based on comment

Sounds like something has a memory leak.
When you ran out of swap, userland blew up, but the kernel (being wired in RAM) could keep running and answering ping requests.

For a permanent resolution you should monitor your swap utilization carefully and when you notice it trending dangerously upward (>33% used is my threshold) hunt down the process with the most swap used: That's probably your culprit.
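
One way to hunt for the biggest swap user is sketched below in Python. It assumes a kernel that exposes a `VmSwap` line in `/proc/<pid>/status` (added in Linux 2.6.34, so likely newer than the stock Debian 6 kernel); on older kernels the function simply returns an empty list. All function names are hypothetical.

```python
import os
import re


def parse_vmswap_kb(status_text):
    """Return VmSwap in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text."""
    match = re.search(r"^Name:\s+(.*)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else "?"


def top_swappers(proc_root="/proc", limit=5):
    """List (swap_kb, pid, name) for the processes using the most swap."""
    results = []
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # process exited or is not readable
        swap_kb = parse_vmswap_kb(text)
        if swap_kb:
            results.append((swap_kb, int(entry), parse_name(text)))
    return sorted(results, reverse=True)[:limit]
```

Running `top_swappers()` from cron every few minutes and logging the output would give a record of which process was growing when swap started trending upward.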

voretaq7
  • Thanks for the reply. In that log I found `Mar 8 15:40:20 ns354729 kernel: Free swap = 0kB`. Before the outage I could see (in my hosting company's control panel) that CPU, RAM and swap were totally out of control, with 5% of RAM free and only 1-2% of swap. – dynamic Mar 08 '11 at 15:20
  • But now the problem is: how can I find out what caused all this? – dynamic Mar 08 '11 at 15:23
  • 1
    @yes123 - See the update to the answer :-) You could also have an undersized box (too little RAM/too little swap for what you're trying to do), but a memory leak is what I'd investigate first. – voretaq7 Mar 08 '11 at 15:25
  • @voretaq: I added some information. As you can see, my swap usage grew by 70% in a few hours – dynamic Mar 08 '11 at 15:34
  • @yes123 that's definitely a memory leak (or a bunch of processes being kicked off and chewing through your swap). To find the cause you have to watch it happening. Also note that "top memory users" can be a somewhat misleading measurement (you want "top swappers" -- If you run top it's the guys with the biggest SWAP column -- reorder `top`'s output to see that). – voretaq7 Mar 08 '11 at 15:41
  • Moving on to the other question – dynamic Mar 08 '11 at 17:20