Synopsis
I have observed the very same behavior with Apache; it seems that this problem is not specific to lighttpd.
In my case, the symptoms were exactly the same: the Apache access logs were peppered with intermittent 500 response codes, with no corresponding entries in PHP's error log (even though PHP error reporting was configured to be maximally verbose).
I described the issue extensively on the Apache mailing list (search the list archives for the subject "Intermittent 500 responses in access.log without corresponding entries in error.log").
Root Cause
1100110's answer hints at the root cause, but I'll provide additional documentation, straight from Apache, as well as suggestions for eliminating the problem.
Here is the official word from Apache on this matter:
https://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html :
Special PHP considerations
By default, PHP FastCGI processes exit after handling 500 requests,
and they may exit after this module has already connected to the
application and sent the next request. When that occurs, an error will
be logged and 500 Internal Server Error will be returned to the
client. This PHP behavior can be disabled by setting
PHP_FCGI_MAX_REQUESTS to 0, but that can be a problem if the PHP
application leaks resources. Alternatively, PHP_FCGI_MAX_REQUESTS can
be set to a much higher value than the default to reduce the frequency
of this problem. FcgidMaxRequestsPerProcess can be set to a value less
than or equal to PHP_FCGI_MAX_REQUESTS to resolve the problem.
PHP child process management (PHP_FCGI_CHILDREN) should always be
disabled with mod_fcgid, which will only route one request at a time
to application processes it has spawned; thus, any child processes
created by PHP will not be used effectively. (Additionally, the PHP
child processes may not be terminated properly.) By default, and with
the environment variable setting PHP_FCGI_CHILDREN=0, PHP child
process management is disabled.
The popular APC opcode cache for PHP cannot share a cache between PHP
FastCGI processes unless PHP manages the child processes. Thus, the
effectiveness of the cache is limited with mod_fcgid; concurrent PHP
requests will use different opcode caches.
There we have it.
Possible Solutions
Option 1
One solution is to set PHP_FCGI_MAX_REQUESTS to zero, but taking this measure introduces the potential for memory leaks to grow out of control.
The various bits of documentation that I have consulted do not make it clear whether PHP via Fast-CGI suffers from inherent memory leaks (which would explain the built-in "process recycling" behavior) or whether the risk is limited to poorly-written, "runaway" scripts.
In any case, there is risk inherent to setting PHP_FCGI_MAX_REQUESTS to zero, especially in a shared hosting environment.
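For illustration, this option is typically implemented in the FastCGI wrapper script that spawns php-cgi. A minimal sketch follows; the php-cgi path is an assumption and will vary by distribution:

```shell
#!/bin/sh
# Hypothetical FastCGI wrapper script for Option 1.
# PHP_FCGI_MAX_REQUESTS=0 disables PHP's built-in process recycling,
# so the process never exits mid-connection -- but any memory leaked
# by scripts will accumulate for the life of the process.
export PHP_FCGI_MAX_REQUESTS=0
# Keep PHP's own child-process management disabled, per the mod_fcgid docs.
export PHP_FCGI_CHILDREN=0
exec /usr/bin/php-cgi
```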
Option 2
A second solution, as described in the excerpt above, is to set FcgidMaxRequestsPerProcess to a value less than or equal to PHP_FCGI_MAX_REQUESTS. The documentation omits an important point, however: the value must also be greater than zero, because in this context zero means "unlimited" (i.e., it disables the check).

Given that the default value for FcgidMaxRequestsPerProcess is zero, and the default value for PHP_FCGI_MAX_REQUESTS is 500, any administrator who has not overridden these values will experience the intermittent 500 response codes. For this reason, I fail to understand why FcgidMaxRequestsPerProcess and PHP_FCGI_MAX_REQUESTS do not share the same default value. Perhaps it is because configuring the two directives this way yields the same net result as setting PHP_FCGI_MAX_REQUESTS to zero; the documentation is ambiguous in this regard.
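By way of illustration, a mod_fcgid configuration implementing this option might look something like the following. The value 500 matches PHP's default; FcgidInitialEnv is the mod_fcgid directive for setting environment variables in the processes it spawns:

```apache
# Hypothetical httpd.conf fragment for Option 2.
# Recycle each process from the Apache side *before* PHP kills it,
# by keeping FcgidMaxRequestsPerProcess <= PHP_FCGI_MAX_REQUESTS
# (and greater than zero, since zero disables the check).
FcgidInitialEnv PHP_FCGI_MAX_REQUESTS 500
FcgidMaxRequestsPerProcess 500
```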
Option 3
A third solution is to abandon Fast-CGI altogether, in favor of a comparable alternative, such as suPHP or plain-old CGI + SuExec. I have performed some basic, raw performance benchmarking across the various PHP modes, and my findings are as follows:
- Mod-PHP 77.7
- CGI 69.0
- suPHP 67.0
- Fast-CGI 55.7
Mod-PHP is the highest-performing, with a score of 77.7. The scores are arbitrary and serve only to demonstrate the relative variance in page-load-times across PHP modes.
If we assume that these benchmarks are fairly representative, then there seem to be very few reasons to cling to Fast-CGI, given this one (fairly serious) flaw in its implementation. The only substantial reason that comes to mind is op-code caching. My understanding is that PHP cannot utilize op-code caching via CGI or suPHP mode (because processes do not persist across requests).
While Fast-CGI does not take advantage of op-code caching (e.g., via APC) out-of-the-box, clever users have devised a method for rendering APC effective with Fast-CGI (via per-user caches): http://www.brandonturner.net/blog/2009/07/fastcgi_with_php_opcode_cache/ . There are several drawbacks, however:
- The memory (RAM) requirements are considerable, as there is a dedicated cache for each user. (For perspective, consider that in Mod-PHP mode, all users share a single cache.)
- Apache must use the older module, mod_fastcgi, instead of the newer equivalent, mod_fcgid. (For details, see the article cited in the paragraph above.)
- The configuration is rather complex.
As a related corollary, you said the following in your question:
First I am using APC so PHP is in control of it's own processes, not FastCGI.
Unless you're using mod_fastcgi (and not mod_fcgid), and unless you've followed steps similar to those cited a few paragraphs above, APC is consuming resources without effect. As such, you may wish to disable APC.
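If you do conclude that APC is consuming memory to no benefit in your setup, disabling it is a one-line change; assuming APC is loaded as a shared extension, something like this in php.ini should suffice:

```ini
; Disable APC entirely; the cache is wasted effort when each
; mod_fcgid-spawned PHP process maintains its own private copy.
apc.enabled = 0
```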
Summary of Solution
Take one of the following three measures:
- Set the PHP_FCGI_MAX_REQUESTS environment variable to zero. (Introduces potential for memory leaks in PHP scripts to grow out of control.)
- Set FcgidMaxRequestsPerProcess to a value less than or equal to PHP_FCGI_MAX_REQUESTS, but greater than zero.
- Abandon Fast-CGI in favor of a comparable alternative, such as suPHP or plain-old CGI + SuExec.