4

So I got the below earlier today.

Around that time the logs show a ramp in processes (600), associated memory (1.2 GB) and CPU load average (80) until the server gave out.

The server had to be hard reset by the host as there was no SSH or Plesk panel access.

FastCGI is configured as below and is set up for one high-use site. As I understand it, FcgidMaxProcesses 20 should protect against what happened, but it has not.

I've read many forums with differing answers and references to many different fcgi directives, but have found nothing conclusive. Has anyone got some definitive answers on how to stop this sort of process ramping and the subsequent server failure?

If you need more info let me know.

Cheers Andy

 /var/log/apache2/error_log
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17651 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17650 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17649 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17644 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17643 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17638 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17633 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17627 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:47 2012] [warn] mod_fcgid: process 17622 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17674 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17673 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17672 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17667 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17666 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17665 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17664 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17659 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17658 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17657 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17656 graceful kill fail, sending SIGKILL
[Thu May 17 07:40:51 2012] [warn] mod_fcgid: process 17651 graceful kill fail, sending SIGKILL

https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VRmFLWEZfR2VBb2M
https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VWTcwZEhoV2Fqejg
https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VUUtVWWFINHZjZ0U
https://docs.google.com/a/thesugarrefinery.com/open?id=0B_XbpWChge0VZEVMclh6ZUdaOUE

<IfModule mod_fcgid.c>

<IfModule !mod_fastcgi.c>
    AddHandler fcgid-script fcg fcgi fpl
</IfModule>
  FcgidIPCDir /var/lib/apache2/fcgid/sock
  FcgidProcessTableFile /var/lib/apache2/fcgid/shm

  FcgidIdleTimeout 40
  FcgidProcessLifeTime 30
  FcgidMaxProcesses 20
  FcgidMaxProcessesPerClass 20
  FcgidMinProcessesPerClass 0
  FcgidConnectTimeout 30
  FcgidIOTimeout 120
  FcgidInitialEnv RAILS_ENV production
  FcgidIdleScanInterval 10
  FcgidMaxRequestLen 1073741824
</IfModule>
growse
Andy Main
  • Is there any heavy IO load on the server? Or something else that could result in unresponsive fcgi processes? You can try tuning the FcgidErrorScanInterval parameter so that processes are killed faster. – DukeLion May 17 '12 at 12:51
  • No heavy IO load that I can account for. Backups ran as normal at 3:30am, and user load was non-existent, as it is all generated by office staff who weren't in yet. – Andy Main May 17 '12 at 13:06

2 Answers

1

There was a bug in Debian (at least) that rendered the limit useless with virtual hosts. It is fixed now.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=615814

Imp
  • Thanks for that. Unfortunately this isn't it, as FcgidMaxProcessesPerClass is not defined in any VirtualHost blocks. – Andy Main May 23 '12 at 09:07
0

This is often caused by a setuid CGI script that hangs: it exceeds the FcgidIOTimeout, and Apache tries to kill it but cannot because of the change in UID, resulting in the error.

You may want to increase FcgidIOTimeout or FcgidProcessLifeTime to allow the process more time to complete.
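
As a rough sketch of that suggestion, the two directives could be raised inside the existing mod_fcgid block. The values below are illustrative assumptions, not tested recommendations; they should be chosen from how long the slowest legitimate request actually takes.

```apache
<IfModule mod_fcgid.c>
  # Assumed value: wait up to 5 minutes on reads from or writes to the
  # FastCGI application before giving up (the question's config uses 120).
  FcgidIOTimeout 300

  # Assumed value: let idle processes live for up to an hour before they
  # are terminated (the question's config uses 30).
  FcgidProcessLifeTime 3600
</IfModule>
```

Longer timeouts only give a slow script more room; they do not fix a script that never finishes.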

Another workaround is to make the Apache server run under the same UID that the setuid script is changing to. This allows it to kill the process, though it may not be advisable for security reasons. Similarly, running Apache as root is also a workaround, but not a very secure one. If you do this, note that your fcgi sock directory (under /var/lib/apache2/fcgid/sock or similar) and process table file need to be writable by the Apache process owner.
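
For illustration only, the same-UID workaround might look like the fragment below in the main server configuration. The account name `siteuser` is hypothetical and stands in for whatever UID the script switches to; on Debian-style layouts these values are typically set through /etc/apache2/envvars rather than written directly.

```apache
# Hypothetical account: replace "siteuser" with the user and group the
# setuid script runs as, so Apache is permitted to signal its processes.
User siteuser
Group siteuser
```

The security caveat above still applies: everything Apache serves would then run with that account's privileges.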

The root cause, though, is the CGI script itself taking too long. The cause of that depends on the CGI code, which I have no visibility of.

Steve Shipway