5

My Apache constantly serves about 300 requests/sec (2 megabytes/s) with a server load of 0.05.

The problem is that my service architecture produces huge traffic spikes at specific moments (for example, 300-500 people are redirected to a page with JavaScript within several seconds).

After such a short traffic spike, Apache becomes unresponsive (connection reset after about 30 seconds in Firefox) without logging anything. Apache stays frozen until I restart apache2.

While frozen, it cannot serve even a simple HTML file that involves no PHP or SQL connection (although the apache2 processes still exist).

I tried different prefork settings, from 50 to almost 1000 idle workers and a max clients limit of 10000, but nothing helps.

Another symptom, apart from nothing being logged, is that moments before the freeze the Apache status module shows almost every process waiting for a connection (this is the last output before it became unresponsive):

__R_R_______R__RR______R___R________________RR_______R______R___
_________R__________R_________________________R________CR___R___
___________R__________________________C__WR__R________________R_

But under normal, less loaded operation it shows:

C___R___K_C___C___C_____KK______R___C_C_R______C__K___C________K
____C__KR_RR__C___K___KK_C__R__K__C_CK__RC___CR___R__K__C__R____
___KR____C_____R______R______K__R_______KC__C_K__R____C_______R_

syslog also shows nothing. My machine has 64GB RAM and the load never exceeds 0.1.

Piotr Müller
  • i wonder if it could be something unexpected like bad memory – user16081-JoeT Feb 01 '13 at 23:22
  • I think this is too regular for bad memory - nothing else breaks on this machine, and this freeze never appears in other conditions. – Piotr Müller Feb 04 '13 at 10:14
  • Have you checked file descriptors? Try using something like `sar` to monitor your machine and, I would advise you to test your memory, as suggested by @user16081 – fboaventura Feb 04 '13 at 10:42
  • I've seen somewhat similar behaviour on an Apache 2 instance acting as a frontend for a tomcat application. When the backend-app was unresponsive the httpd would hang while it waited for AJP connections to finish. While PHP bears little relation to this it might be worth considering that the Webapp is "hanging" here; not Apache directly. Maybe more debugging in the webapp is required? It might also be that the DB is the bottleneck and the webapp is waiting on the DB. As mentioned above - also check syslog to see if any system resources are running out. – Friedrich 'Fred' Clausen Feb 04 '13 at 11:34
  • (1) When the problem occurs, can you check `top` and memory usage? (2) Is there any error in `/var/log/apache2/error.log`, or is that one empty too? – John Siu Feb 05 '13 at 15:09
  • The error log is empty, and the access log shows ordinary queries like always. About PHP or another extension hanging: remember that the whole server freezes and cannot serve even a simple .html page. It looks like it accepts the connection but does not respond, or resets it after 30 sec. It's not a "connection refused" error – Piotr Müller Feb 06 '13 at 13:36
  • I had this with several Apache installs and after a long time debugging I solved it by switching to nginx... Didn't like the solution but you gotta do what you gotta do :/ – Antoine Benkemoun Feb 06 '13 at 14:00

5 Answers

3

I think that when your connections spike to more than 450 per second, you may be running out of ephemeral ports in Linux.

Check out this previously answered question

Small abstract from the answer:


sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.tcp_fin_timeout

The ephemeral port range defines the maximum number of outbound sockets a host can create from a particular IP address. The fin_timeout defines the minimum time these sockets will stay in TIME_WAIT state (unusable after being used once). Usual system defaults are:

net.ipv4.ip_local_port_range = 32768 61000
net.ipv4.tcp_fin_timeout = 60 

This basically means your system cannot guarantee more than (61000 - 32768) / 60 = 470 new sockets per second. If you are not happy with that, you could begin by increasing the port_range. Setting the range to 15000 61000 is pretty common these days. You could further increase the availability by decreasing the fin_timeout. If you do both, you should see over 1500 outbound connections per second more readily.
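The arithmetic above can be checked with a small shell sketch. The function name `ports_per_second` is made up for illustration, and the timeout of 30 in the second call is an assumed value chosen only to show the effect of lowering fin_timeout; it is an estimate in the spirit of the quoted answer, not a measurement.

```shell
#!/bin/sh
# Rough estimate of sustainable new outbound sockets per second:
# (size of the ephemeral port range) / (seconds a port stays unusable).
# ports_per_second is a hypothetical helper name, not a system tool.
ports_per_second() {
    low=$1
    high=$2
    timeout=$3
    echo $(( (high - low) / timeout ))
}

# Defaults quoted in the answer: range 32768-61000, timeout 60
ports_per_second 32768 61000 60    # prints 470

# Wider range 15000-61000 with an assumed timeout of 30
ports_per_second 15000 61000 30    # prints 1533

# To apply such values at runtime (as root; values are examples only):
#   sysctl -w net.ipv4.ip_local_port_range="15000 61000"
#   sysctl -w net.ipv4.tcp_fin_timeout=30
```

Changes made with `sysctl -w` do not survive a reboot; they would also have to go into `/etc/sysctl.conf` to persist.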

Martino Dino
1

Can you attach to the running unresponsive process and see what happens? Might be easier if you run prefork.

Attaching to the process using strace:

strace -p <pid> -o /tmp/somefile

You might want to play with -s

-s strsize Specify the maximum string size to print (the default is 32). Note that filenames are not considered strings and are always printed in full.
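With prefork there are many worker processes, so it may help to attach to several at once. The sketch below builds the repeated `-p` arguments that strace accepts; `strace_pid_args` is a hypothetical helper name, and the process name `apache2`, the string size of 256 and the output path are assumptions to adjust.

```shell
#!/bin/sh
# Turn a list of PIDs into "-p PID -p PID ..." for strace.
# strace_pid_args is an illustrative helper, not part of strace.
strace_pid_args() {
    args=""
    for pid in "$@"; do
        args="$args -p $pid"
    done
    # Strip the leading space left by the loop
    echo "${args# }"
}

# Example usage (as root, while Apache is frozen):
#   strace -f -tt -s 256 -o /tmp/apache-trace.txt \
#       $(strace_pid_args $(pgrep apache2))
# -f follows child processes, -tt adds timestamps,
# -s 256 prints longer strings than the default of 32.
```

If every worker turns out to be blocked in the same system call, that call is a good starting point for further digging.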

3molo
1

I agree with 3molo: strace can give you a hint of what is going on, e.g. whether there are system calls that are hanging. The one thing I haven't found strace helpful with is slow I/O issues. Running

sudo iotop

and

sudo top

can give a bit of insight into what sort of I/O activity is taking place. Slow I/O has caused similar behavior for me in the past, such as having to read many very small files from a slow NAS. If top reports a high 'wait' and iotop shows a high percentage of bandwidth in use, you may need a different storage solution.
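If watching top interactively during a short spike is impractical, the same 'wait' figure can be approximated from two samples of the aggregate `cpu` line in `/proc/stat`. This is a minimal sketch assuming the Linux `/proc/stat` field order (user, nice, system, idle, iowait, irq, softirq, steal); `iowait_percent_between` is a made-up name.

```shell
#!/bin/sh
# Percentage of CPU time spent in iowait between two "cpu ..." lines
# taken from /proc/stat. Fields 2-9 are user..steal; field 6 is iowait.
# iowait_percent_between is an illustrative helper name.
iowait_percent_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) total += $i - first[i]
            if (total > 0)
                printf "%d\n", (($6 - first[6]) * 100) / total
            else
                print 0
        }'
}

# Example usage with a 5 second sampling window:
#   before=$(grep '^cpu ' /proc/stat)
#   sleep 5
#   after=$(grep '^cpu ' /proc/stat)
#   iowait_percent_between "$before" "$after"
```

A value that stays near zero during a freeze would suggest that slow storage is not the cause.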

Stephan
0

You need to start with two things.

1) Set LogLevel to debug in the Apache configuration. Whenever the problematic behavior occurs, take a look at both the access logs and the error logs.

Warning: This might fill up your disk quickly, so switch back from debug to the original value once you have sufficient information.

2) While I agree with the strace option suggested here, I would recommend running gdb on the running process. If you want more help with debugging a running process, I'd recommend you see this.

Nehal Dattani
0

Sounds a lot like a file descriptor limit. You need to su to the user that apache runs as and then run this:

ulimit -n

The default setting on a lot of distros seems to be 1024. If so, try cranking that way up. You can change it in /etc/security/limits.conf on Debian-based distros. If the user Apache runs as is apache, then you could add this:

apache soft nofile 65535
apache hard nofile 65535

You'll need to reboot to apply this change.
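To see which limit a running Apache worker actually has, rather than what a fresh login shell reports, the value can be read from `/proc/<pid>/limits`. This is a minimal sketch assuming Linux; `soft_nofile_limit` is a hypothetical helper name and `<pid>` stands for a real apache2 process ID.

```shell
#!/bin/sh
# Extract the soft "Max open files" limit from /proc/<pid>/limits
# content supplied on stdin. Columns are: name (3 words), soft, hard, unit.
# soft_nofile_limit is an illustrative helper name.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Example usage (replace <pid> with an apache2 worker PID):
#   soft_nofile_limit < /proc/<pid>/limits
#
# Number of descriptors that worker currently holds open:
#   ls /proc/<pid>/fd | wc -l
```

If the open descriptor count sits close to the soft limit just before a freeze, that would support the file descriptor theory.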

chrskly