
Sorry if this seems rambling. I have been trying to troubleshoot and solve this for the past 20 hours straight, while babysitting the server and restarting php5-fpm every time it goes down.

I imagine there is an underlying problem with a plugin (this is a high-traffic WordPress site) or something else that is hanging a thread, possibly causing a dependency lock that ties up the other threads. My immediate concern is finding a way to make emergency_restart_threshold automatically restart php5-fpm when it's needed. Then I can sleep, and find the underlying PHP issue afterwards. The problem has happened 21 times in the last 24 hours, sometimes as much as 3.5 hours apart, other times only a couple of minutes.

I have experimented with various timing values, but so far nothing seems to trigger the needed SIGSEGV or SIGBUS signal. It looks like once the issue starts, all the PHP requests hang as if they are all waiting on something, and the pool eventually fills up to the pm.max_children limit of 150 with incoming requests before the various timeouts kill them off. Except when this issue happens, only 2-5 children are usually needed to satisfy current traffic.
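For what it's worth, the config comment for emergency_restart_threshold says it only counts children that exit with SIGSEGV or SIGBUS, while the children in the log excerpts further down exit on signal 15 (SIGTERM). Below is a minimal Python sketch of that counting rule, assuming the log line format shown in those excerpts and English month abbreviations. The names signal_exits and threshold_reached are hypothetical helpers, and this is not FPM's actual implementation:

```python
import re
from datetime import datetime, timedelta

# Matches php5-fpm log lines such as:
# [11-Mar-2015 10:15:55] WARNING: [pool www] child 19384 exited on signal 15 (SIGTERM) after ...
EXIT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] child (?P<pid>\d+) "
    r"exited on signal (?P<signo>\d+) \((?P<signame>\w+)"
)


def signal_exits(lines):
    """Yield (timestamp, signal_name) for every child that was killed by a signal."""
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            # Assumption: timestamps look like "11-Mar-2015 10:15:55".
            ts = datetime.strptime(m.group("ts"), "%d-%b-%Y %H:%M:%S")
            yield ts, m.group("signame")


def threshold_reached(lines, signals, threshold, interval):
    """Return True if `threshold` exits on one of `signals` fall within `interval`."""
    window = []
    for ts, name in signal_exits(lines):
        if name not in signals:
            continue
        window.append(ts)
        # Keep only the exits that are still inside the sliding interval.
        window = [t for t in window if ts - t <= interval]
        if len(window) >= threshold:
            return True
    return False
```

Fed the SIGTERM lines from the excerpts, threshold_reached(lines, {"SIGSEGV", "SIGBUS"}, 3, timedelta(minutes=2)) stays False, whereas passing {"SIGTERM"} returns True. An external watchdog could use the second form as a stopgap trigger for restarting the service.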

Server Specs

The server is an Amazon EC2 c4.xlarge (7.3 GB RAM, 4 vCPUs) running Ubuntu 14.04.

I also have New Relic installed and resources seem very good. Over half the memory is available, and CPU usage is usually well under 50% except for after PHP5-FPM hangs.

Various Config files as I currently have them configured

/etc/apache2/apache2.conf

Mutex file:${APACHE_LOCK_DIR} default
PidFile ${APACHE_PID_FILE}

Timeout 40
KeepAlive On
MaxKeepAliveRequests 250
KeepAliveTimeout 6

User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}

HostnameLookups Off
ErrorLog ${APACHE_LOG_DIR}/error.log
LogLevel error

IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf

Include ports.conf

<Directory />
    Options +FollowSymLinks -SymLinksIfOwnerMatch
    AllowOverride None
    Require all denied
</Directory>


<Directory /usr/share>
    AllowOverride None
    Require all granted
</Directory>

<Directory /var/www/>
    Options +FollowSymLinks -SymLinksIfOwnerMatch
    AllowOverride All 
    Require all granted
</Directory>

AccessFileName .htaccess

<FilesMatch "^\.ht">
    Require all denied
</FilesMatch>

LogFormat "%{User-agent}i" agent

IncludeOptional conf-enabled/*.conf

IncludeOptional sites-enabled/*.conf
SSLProtocol ALL -SSLv2
SSLCipherSuite ECDHE-RSA-AES256-SHA384:AES256-SHA256:AES256-SHA256:RC4:HIGH:MEDIUM:+TLSv1:+TLSv1.1:+TLSv1.2:!MD5:!ADH:!aNULL:!eNULL:!NULL:!DH:!ADH:!EDH:!AESGCM

<IfModule fastcgi_module>

  Action fastcgi-php5-fpm /fastcgi.php5-fpm virtual
  Alias /fastcgi.php5-fpm /var/www/cgi-bin/fastcgi.php5-fpm  -socket /var/run/fastcgi/USERNAME.socket -appConnTimeout 10 -idle-timeout 250 -pass-header Authorization

  FastCgiExternalServer /var/www/cgi-bin/fastcgi.php5-fpm -socket /var/run/php5-fpm.sock -idle-timeout 15 -appConnTimeout 0 -pass-header Authorization -pass-header Range

  AddHandler fastcgi-php5-fpm php phar

   <Directory /var/www/cgi-bin>
      AllowOverride none
      Options FollowSymLinks
      <IfModule authz_core_module>
               Require env REDIRECT_STATUS
          Options +ExecCGI
      </IfModule>
  </Directory>
</IfModule>

<IfModule mpm_event_module>

  Timeout 300

  StartServers 3
  ThreadLimit 50
  ThreadsPerChild 50
  MaxConnectionsPerChild 10000
  MinSpareThreads 50
  MaxSpareThreads 250
  ServerLimit 600
  MaxRequestWorkers 600

  KeepAlive on
  MaxKeepAliveRequests 100
  KeepAliveTimeout 5
</IfModule>

/etc/php5/fpm/php-fpm.conf

[global]
pid = /var/run/php5-fpm.pid

error_log = log/php5-fpm.log
syslog.facility = daemon
syslog.ident = php-fpm
log_level = debug
;log_level = notice


; If this number of child processes exit with SIGSEGV or SIGBUS within the time
; interval set by emergency_restart_interval then FPM will restart. A value
; of '0' means 'Off'.
; Default Value: 0
emergency_restart_threshold = 3

; Interval of time used by emergency_restart_interval to determine when 
; a graceful restart will be initiated.  This can be useful to work around
; accidental corruptions in an accelerator's shared memory.
; Available Units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
emergency_restart_interval = 2m

; Time limit for child processes to wait for a reaction on signals from master.
; Available units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
process_control_timeout = 5s

include=/etc/php5/fpm/pool.d/*.conf

/etc/php5/fpm/pool.d/www.conf

[www]

user = www-data
group = www-data

listen = /var/run/php5-fpm.sock

; Set listen(2) backlog.
; Default Value: 65535 (-1 on FreeBSD and OpenBSD)
;listen.backlog = 65535

; Set permissions for unix socket, if one is used. In Linux, read/write
; permissions must be set in order to allow connections from a web server. Many
; BSD-derived systems allow connections regardless of permissions. 
; Default Values: user and group are set as the running user
;                 mode is set to 0660
listen.owner = www-data
listen.group = www-data
;listen.mode = 0660

pm = dynamic
pm.max_children = 150
pm.start_servers = 10
pm.min_spare_servers = 10
pm.max_spare_servers = 30

pm.process_idle_timeout = 10s
pm.max_requests = 40


pm.status_path = /public_html/bvt-fpmstatus.php

access.log = /var/log/$pool.access.log
access.format = %T %R \"%m %r%Q%q\" - %s %u  %f %{mili}d %{kilo}M %C%% %t"

; The log file for slow requests
; Default Value: not set
; Note: slowlog is mandatory if request_slowlog_timeout is set
;slowlog = /var/log/$pool.log.slow
;request_slowlog_timeout = 25s

; The timeout for serving a single request after which the worker process will
; be killed. This option should be used when the 'max_execution_time' ini option
; does not stop script execution for some reason. A value of '0' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
request_terminate_timeout = 31s

; Chroot to this directory at the start. This value must be defined as an
; absolute path. When this value is not set, chroot is not used.
; Note: you can prefix with '$prefix' to chroot to the pool prefix or one
; of its subdirectories. If the pool prefix is not set, the global prefix
; will be used instead.
; Note: chrooting is a great security feature and should be used whenever 
;       possible. However, all PHP paths will be relative to the chroot
;       (error_log, sessions.save_path, ...).
; Default Value: not set
;chroot = 

; Chdir to this directory at the start.
; Note: relative path can be used.
; Default Value: current directory or / when chroot
chdir = /

; Redirect worker stdout and stderr into main error log. If not set, stdout and
; stderr will be redirected to /dev/null according to FastCGI specs.
; Note: on highloaded environement, this can cause some delay in the page
; process time (several ms).
; Default Value: no
php_admin_value[display_errors] = stderr
catch_workers_output = yes ; Add to Apache log.

; Limits the extensions of the main script FPM will allow to parse. This can
; prevent configuration mistakes on the web server side. You should only limit
; FPM to .php extensions to prevent malicious users to use other extensions to
; exectute php code.
; Note: set an empty value to allow all extensions.
; Default Value: .php
security.limit_extensions = .php .php3 .php4 .php5 .phar

; Pass environment variables like LD_LIBRARY_PATH. All $VARIABLEs are taken from
; the current environment.
; Default Value: clean env
;env[HOSTNAME] = $HOSTNAME
;env[PATH] = /usr/local/bin:/usr/bin:/bin
;env[TMP] = /tmp
;env[TMPDIR] = /tmp
;env[TEMP] = /tmp

php_admin_flag[log_errors] = on
;php_admin_flag[display_errors] = stderr            ; Send displayed errors to error stream instead of screen
;php_admin_value[error_reporting] = 2147483647      ; Log all (WARNING: very verbose)

;php_admin_value[memory_limit] = 32M
php_admin_value[upload_max_filesize] = 32M
php_admin_value[post_max_size] = 32M
php_admin_value[max_execution_time] = 30

Log Excerpts

/var/log/upstart/php5-fpm.log

After PHP_FPM restart @ 09:54 normal operation:

[11-Mar-2015 09:54:03] NOTICE: fpm is running, pid 17478
[11-Mar-2015 09:54:03] NOTICE: ready to handle connections
[11-Mar-2015 09:54:03] NOTICE: systemd monitor interval set to 10000ms
[11-Mar-2015 09:59:51] NOTICE: [pool www] child 17481 exited with code 0 after 348.302350 seconds from start
[11-Mar-2015 09:59:51] NOTICE: [pool www] child 17978 started
[11-Mar-2015 09:59:54] NOTICE: [pool www] child 17488 exited with code 0 after 350.794698 seconds from start
[11-Mar-2015 09:59:54] NOTICE: [pool www] child 17982 started
[11-Mar-2015 09:59:54] NOTICE: [pool www] child 17485 exited with code 0 after 351.236038 seconds from start
[11-Mar-2015 09:59:54] NOTICE: [pool www] child 17985 started
[11-Mar-2015 10:00:01] NOTICE: [pool www] child 17486 exited with code 0 after 357.770981 seconds from start
[11-Mar-2015 10:00:01] NOTICE: [pool www] child 17995 started
[11-Mar-2015 10:00:01] NOTICE: [pool www] child 17484 exited with code 0 after 358.020653 seconds from start
[11-Mar-2015 10:00:01] NOTICE: [pool www] child 17996 started
[11-Mar-2015 10:00:02] NOTICE: [pool www] child 17483 exited with code 0 after 358.627602 seconds from start
[11-Mar-2015 10:00:02] NOTICE: [pool www] child 18048 started
[11-Mar-2015 10:00:02] NOTICE: [pool www] child 17487 exited with code 0 after 358.753904 seconds from start
[11-Mar-2015 10:00:02] NOTICE: [pool www] child 18049 started
[11-Mar-2015 10:00:03] NOTICE: [pool www] child 17489 exited with code 0 after 360.321755 seconds from start
[11-Mar-2015 10:00:03] NOTICE: [pool www] child 18053 started
[11-Mar-2015 10:00:05] NOTICE: [pool www] child 17490 exited with code 0 after 361.648421 seconds from start
[11-Mar-2015 10:00:05] NOTICE: [pool www] child 18061 started
[11-Mar-2015 10:00:05] NOTICE: [pool www] child 17494 exited with code 0 after 359.677309 seconds from start
[11-Mar-2015 10:00:05] NOTICE: [pool www] child 18062 started
[11-Mar-2015 10:00:11] NOTICE: [pool www] child 17482 exited with code 0 after 368.373606 seconds from start
[11-Mar-2015 10:00:11] NOTICE: [pool www] child 18070 started

And then later we start to see the first sign of trouble (server cpu & memory are still in good shape at this point):

[11-Mar-2015 10:13:02] NOTICE: [pool www] child 19380 started
[11-Mar-2015 10:13:04] NOTICE: [pool www] child 18791 exited with code 0 after 380.518655 seconds from start
[11-Mar-2015 10:13:04] NOTICE: [pool www] child 19381 started
[11-Mar-2015 10:13:04] NOTICE: [pool www] child 18787 exited with code 0 after 382.270838 seconds from start
[11-Mar-2015 10:13:04] NOTICE: [pool www] child 19382 started
[11-Mar-2015 10:13:05] NOTICE: [pool www] child 18794 exited with code 0 after 378.655639 seconds from start
[11-Mar-2015 10:13:05] NOTICE: [pool www] child 19384 started
[11-Mar-2015 10:15:28] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 7 idle, and 20 total children
[11-Mar-2015 10:15:41] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 8 idle, and 32 total children
[11-Mar-2015 10:15:46] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 8 idle, and 38 total children
[11-Mar-2015 10:15:47] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 9 idle, and 40 total children
[11-Mar-2015 10:15:48] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 8 idle, and 41 total children
[11-Mar-2015 10:15:49] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 9 idle, and 43 total children
[11-Mar-2015 10:15:50] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 9 idle, and 44 total children
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19384, script '/var/www/html/index.php' (request: "GET /index.php") execution timed out (31.932041 sec), terminating
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19382, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (32.027672 sec), terminating
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19366, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (31.208359 sec), terminating
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19363, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (32.113767 sec), terminating
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19355, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (31.057660 sec), terminating
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19351, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (32.977536 sec), terminating
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19384 exited on signal 15 (SIGTERM) after 170.885585 seconds from start
[11-Mar-2015 10:15:55] NOTICE: [pool www] child 19760 started
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19366 exited on signal 15 (SIGTERM) after 184.386893 seconds from start
[11-Mar-2015 10:15:55] NOTICE: [pool www] child 19761 started
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19382 exited on signal 15 (SIGTERM) after 171.442721 seconds from start
[11-Mar-2015 10:15:55] NOTICE: [pool www] child 19762 started
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19355 exited on signal 15 (SIGTERM) after 193.595364 seconds from start
[11-Mar-2015 10:15:55] NOTICE: [pool www] child 19764 started
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19351 exited on signal 15 (SIGTERM) after 195.910381 seconds from start
[11-Mar-2015 10:15:55] NOTICE: [pool www] child 19765 started
[11-Mar-2015 10:15:55] WARNING: [pool www] child 19363 exited on signal 15 (SIGTERM) after 187.115830 seconds from start
[11-Mar-2015 10:15:55] NOTICE: [pool www] child 19766 started
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19657, script '/var/www/html/public_html/wp-admin/admin-ajax.php' (request: "POST /public_html/wp-admin/admin-ajax.php") execution timed out (31.374989 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19656, script '/var/www/html/public_html/wp-admin/admin-ajax.php' (request: "POST /public_html/wp-admin/admin-ajax.php") execution timed out (33.896470 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19652, script '/var/www/html/public_html/wp-admin/admin-ajax.php' (request: "POST /public_html/wp-admin/admin-ajax.php") execution timed out (35.011947 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19651, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (35.830626 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19381, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (38.100342 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19371, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (39.258347 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19365, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (39.064079 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19360, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (37.983709 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19356, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (40.136172 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19272, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (38.265691 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19229, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (34.137729 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19091, script '/var/www/html/public_html/wp-admin/admin-ajax.php' (request: "POST /public_html/wp-admin/admin-ajax.php") execution timed out (38.957131 sec), terminating
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19651 exited on signal 15 (SIGTERM) after 40.426680 seconds from start
[11-Mar-2015 10:16:06] NOTICE: [pool www] child 19840 started
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19091 exited on signal 15 (SIGTERM) after 384.915614 seconds from start
[11-Mar-2015 10:16:06] NOTICE: [pool www] child 19841 started
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19381 exited on signal 15 (SIGTERM) after 182.183820 seconds from start
[11-Mar-2015 10:16:06] NOTICE: [pool www] child 19842 started
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19652 exited on signal 15 (SIGTERM) after 39.431626 seconds from start
[11-Mar-2015 10:16:06] NOTICE: [pool www] child 19843 started
[11-Mar-2015 10:16:06] WARNING: [pool www] child 19656 exited on signal 15 (SIGTERM) after 38.430261 seconds from start
[11-Mar-2015 10:16:06] NOTICE: [pool www] child 19844 started

And this continues until I notice the rapid child escalation (and the growing number of open MySQL connections), at which point I restart the php5-fpm service (@ 10:18:54). This time I caught it before it ran out of the 150 max children, but none of the PHP scripts were returning 200 responses anymore. A crash was inevitable.

[11-Mar-2015 10:18:51] WARNING: [pool www] child 20080, script '/var/www/html/index.php' (request: "GET /index.php") execution timed out (39.148257 sec), terminating
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20079, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (40.497106 sec), terminating
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20078, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (40.785344 sec), terminating
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20077, script '/var/www/html/public_html/index.php' (request: "GET /public_html/index.php") execution timed out (41.271773 sec), terminating
[11-Mar-2015 10:18:51] WARNING: [pool www] child 19969, script '/var/www/html/public_html/wp-content/plugins/w3-total-cache/pub/minify.php' (request: "GET /public_html/wp-content/plugins/w3-total-cache/pub/minify.php") execution timed out (33.608193 sec), terminating
[11-Mar-2015 10:18:51] WARNING: [pool www] child 19969 exited on signal 15 (SIGTERM) after 113.661735 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20279 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20081 exited on signal 15 (SIGTERM) after 62.003482 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20280 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20084 exited on signal 15 (SIGTERM) after 62.001519 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20283 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20077 exited on signal 15 (SIGTERM) after 62.009634 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20284 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20078 exited on signal 15 (SIGTERM) after 62.009429 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20285 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20079 exited on signal 15 (SIGTERM) after 62.009275 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20286 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20080 exited on signal 15 (SIGTERM) after 62.007759 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20287 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20097 exited on signal 15 (SIGTERM) after 51.677646 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20288 started
[11-Mar-2015 10:18:51] WARNING: [pool www] child 20100 exited on signal 15 (SIGTERM) after 51.675055 seconds from start
[11-Mar-2015 10:18:51] NOTICE: [pool www] child 20289 started
[11-Mar-2015 10:18:54] NOTICE: Terminating ...
[11-Mar-2015 10:18:54] NOTICE: exiting, bye-bye!
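
The child escalation in the "seems busy" warnings above could also be tracked automatically rather than by eye. This is a rough Python sketch, assuming the warning format shown in the excerpts. The helper names busy_samples and first_alarm and the 25% alarm fraction are hypothetical choices, not anything php5-fpm provides:

```python
import re

# Matches php5-fpm warnings such as:
# [11-Mar-2015 10:15:28] WARNING: [pool www] seems busy (...), spawning 8 children,
# there are 7 idle, and 20 total children
BUSY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] seems busy .*"
    r"spawning (?P<spawning>\d+) children, there are (?P<idle>\d+) idle, "
    r"and (?P<total>\d+) total children"
)


def busy_samples(lines):
    """Return a list of (timestamp_string, total_children) from 'seems busy' warnings."""
    samples = []
    for line in lines:
        m = BUSY_RE.match(line)
        if m:
            samples.append((m.group("ts"), int(m.group("total"))))
    return samples


def first_alarm(lines, max_children, fraction=0.25):
    """Return the first sample whose total reaches fraction * max_children, else None."""
    limit = max_children * fraction
    for ts, total in busy_samples(lines):
        if total >= limit:
            return ts, total
    return None
```

With pm.max_children = 150 and the default fraction, the first alarm in the excerpt above would be the 10:15:46 warning (38 total children), a few minutes before the manual restart at 10:18:54.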

[Wed Mar 11 09:54:03.411897 2015] [fastcgi:error] [pid 13072:tid 140351645890304] [client 222.153.85.53:55464] FastCGI: incomplete headers (0 bytes) received from server "/var/www/cgi-bin/fastcgi.php5-fpm", referer: https://www.thewebsitedomain.com/public_html/some-post/
[Wed Mar 11 09:54:03.418052 2015] [fastcgi:error] [pid 13072:tid 140351486428928] (104)Connection reset by peer: [client 84.202.21.98:52139] FastCGI: comm with server "/var/www/cgi-bin/fastcgi.php5-fpm" aborted: read failed
[Wed Mar 11 09:54:03.418078 2015] [fastcgi:error] [pid 13072:tid 140351486428928] [client 84.202.21.98:52139] FastCGI: incomplete headers (0 bytes) received from server "/var/www/cgi-bin/fastcgi.php5-fpm"
[Wed Mar 11 10:15:19.031557 2015] [:error] [pid 18669:tid 140351536785152] [client 62.108.27.152:56596] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown
[Wed Mar 11 10:15:55.905638 2015] [fastcgi:error] [pid 18728:tid 140351394109184] (104)Connection reset by peer: [client 211.28.60.59:53044] FastCGI: comm with server "/var/www/cgi-bin/fastcgi.php5-fpm" aborted: read failed
[Wed Mar 11 10:15:55.905681 2015] [fastcgi:error] [pid 18728:tid 140351394109184] [client 211.28.60.59:53044] FastCGI: incomplete headers (0 bytes) received from server "/var/www/cgi-bin/fastcgi.php5-fpm"
[Wed Mar 11 10:15:55.906363 2015] [fastcgi:error] [pid 18669:tid 140351494821632] (104)Connection reset by peer: [client 58.174.227.102:61362] FastCGI: comm with server "/var/www/cgi-bin/fastcgi.php5-fpm" aborted: read failed, referer: https://www.thewebsitedomain.com/public_html/some-post/page/3/

/var/log/apache2/error.log (domain-name and urls changed)

There are lots of entries in this, but they all fall into the two categories seen below.

The FastCGI: comm with server "/var/www/cgi-bin/fastcgi.php5-fpm" aborted: read failed entries are generated when I manually restart php-fpm and it forcibly closes all the connections that are still open.

The stderr: Primary script unknown entries are caused by bots trying random URLs and causing 404 errors.

[Wed Mar 11 12:28:53.135352 2015] [:error] [pid 31126:tid 139952113252096] [client 188.40.153.39:46874] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown
[Wed Mar 11 12:28:54.344039 2015] [:error] [pid 437:tid 139952197179136] [client 188.40.153.39:46875] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown
[Wed Mar 11 12:40:25.086767 2015] [:error] [pid 437:tid 139952338970368] [client 49.4.164.66:55601] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown
[Wed Mar 11 12:40:27.536523 2015] [:error] [pid 437:tid 139952079681280] [client 49.4.164.66:55847] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown
[Wed Mar 11 12:45:08.438864 2015] [:error] [pid 437:tid 139952347363072] [client 5.39.218.220:35839] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown
[Wed Mar 11 12:56:02.328823 2015] [core:error] [pid 437:tid 139952330577664] [client 41.13.228.41:26895] AH00126: Invalid URI in request POST ww.thewebsitedomain.com/public_html/wp-admin/admin-ajax.php HTTP/1.1
[Wed Mar 11 13:01:01.058193 2015] [:error] [pid 31126:tid 139952146822912] [client 69.12.90.191:58259] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown, referer: https://www.thewebsitedomain.com/
[Wed Mar 11 13:01:02.346250 2015] [:error] [pid 31126:tid 139952355755776] [client 69.12.90.191:58259] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown, referer: https://www.thewebsitedomain.com/wp-login.php?action=register
[Wed Mar 11 13:10:11.007667 2015] [:error] [pid 4074:tid 139952062895872] [client 23.94.222.15:53539] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown, referer: https://www.thewebsitedomain.com/
[Wed Mar 11 13:10:11.671945 2015] [:error] [pid 4074:tid 139952330577664] [client 23.94.222.15:53539] FastCGI: server "/var/www/cgi-bin/fastcgi.php5-fpm" stderr: Primary script unknown, referer: https://www.thewebsitedomain.com/wp-login.php?action=register
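
To check that nothing else is hiding among those entries, the two categories described above can be tallied with a few lines of Python. This is only a sketch: the category labels and the classify and tally helpers are hypothetical, and grouping "incomplete headers" together with "read failed" is an assumption based on the two appearing side by side in the log:

```python
from collections import Counter


def classify(line):
    """Assign an Apache error.log line to one of the categories seen in this log."""
    if "Primary script unknown" in line:
        # Request for a PHP file that does not exist (typically bots probing URLs).
        return "script_unknown"
    if "aborted: read failed" in line or "incomplete headers" in line:
        # Connection to php-fpm dropped, e.g. when the service is restarted.
        return "connection_reset"
    return "other"


def tally(lines):
    """Count non-empty log lines per category."""
    return Counter(classify(line) for line in lines if line.strip())
```

Anything counted under "other" (such as the AH00126 Invalid URI entry above) would be worth a separate look.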

I have lots of other logs saved, along with detailed notes of my various attempted config changes. Wasn't sure what else may be relevant. Happy to post anything else that would help.

Thanks for any and all suggestions in finding a solution, even if it's just a temporary workaround so I can leave the server unattended long enough to sleep and recharge.

Mark1270287
  • Hi, I have this problem sometimes. My solution was to switch from a UNIX socket to a TCP port, or you may need to increase the net.core.somaxconn value – Skamasle Mar 11 '15 at 12:49
  • Thanks. I will check into those. Ideally I would prefer to stick with sockets if possible, as they have lower overhead than TCP, and most of the config guides I have read recommend moving away from the TCP port method. I will check into net.core.somaxconn as well. – Mark1270287 Mar 11 '15 at 14:18
  • Used sudo sysctl -w net.core.somaxconn=1024 to set value up from 128 (default). Will see if that helps.... fingers crossed. – Mark1270287 Mar 11 '15 at 16:22
  • Crap, still hung up. This feels like "the button" in Lost, except at least they knew when they were going to need to push it. : ( – Mark1270287 Mar 11 '15 at 16:46
