
I am managing a server that hosts a couple dozen websites, all of which had been working fine until last week, when one site seemingly lost the ability to maintain session data. Then another. (I am guessing it is affecting every site on this server and just has not been reported yet.) I changed absolutely nothing in either site's config recently, I have added no software to the server, and I have not changed the general nginx or php-fpm configs. There are no errors in the nginx or php-fpm error logs that correspond to this failure. Restarting php-fpm appears to clear up the problem, at least temporarily, but inevitably it recurs.

How is it possible for php-fpm to fail like this without producing an error message somewhere? I have been googling extensively and have not found anyone else with this problem.

The server is running RHEL 6 with nginx and php-fpm (remi repo). I can't remember if this server is running APC but I don't think it is. All patches are up to date.

I am guessing I have just hit some sort of threshold where the current php-fpm settings are insufficient, though I don't understand why I am getting no errors when that limit is reached. Here are what I suspect are the relevant php-fpm settings...

pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on

Is there an error log somewhere I'm missing where this would be reported? As I mentioned, there is nothing in /var/log/php-fpm/www-error.log, nor in the general nginx error log or the site-specific nginx error logs.

P.S.: I do get other kinds of error messages in all of the logs I mentioned, so the lack of error messages is not a permissions issue.

Here are df outputs (edited to remove identifying physical paths)...

# df -h
Filesystem            Size  Used Avail Use% Mounted on
xxx
                      8.4G  3.8G  4.2G  48% /
xxx                   7.8G     0  7.8G   0% /dev/shm
xxx                   477M   79M  373M  18% /boot
xxx
                      976M  713M  213M  78% /home
xxx
                      976M   30M  896M   4% /tmp
xxx
                      9.8G  4.6G  4.7G  50% /var


# df -i
Filesystem            Inodes  IUsed    IFree IUse% Mounted on
xxx
                      547584  87083   460501   16% /
xxx                  2041821      1  2041820    1% /dev/shm
xxx                   128016     50   127966    1% /boot
xxx
                       65536  19285    46251   30% /home
xxx
                       65536    173    65363    1% /tmp
xxx
                      655360  19441   635919    3% /var

And here is the php-fpm status page while the site is not allowing sessions to be saved...

pool:                 www
process manager:      dynamic
start time:           06/Aug/2015:10:53:06 -0400
start since:          332263
accepted conn:        2899
listen queue:         0
max listen queue:     0
listen queue len:     128
idle processes:       9
active processes:     1
total processes:      10
max active processes: 9
max children reached: 0
slow requests:        0
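Those counters can also be checked mechanically rather than by eye. A small sketch that scans the status text for a saturated pool (the function name is my own, and the curl URL is an assumption — use whatever location nginx exposes `pm.status_path` at):

```shell
# Reads php-fpm status text on stdin and warns if the pool has ever
# hit pm.max_children (the "max children reached" counter).
check_pool_saturation() {
  awk -F': *' '/^max children reached/ {
    if ($2 + 0 > 0) print "pool saturated: hit max_children " $2 " times"
  }'
}

# Typical use (adjust the URL to match your nginx status location):
# curl -s http://127.0.0.1/status | check_pool_saturation
```

In the output above, `max children reached` is 0, so pm.max_children is evidently not the bottleneck here.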
Dave W.
  • What user and group do you run php-fpm as? – Michael Hampton Aug 05 '15 at 21:32
  • user = nginx and group = nginx – Dave W. Aug 05 '15 at 21:37
  • Ah, there's your problem. Run `ls -ld /var/lib/php/session` and look closely at the output. – Michael Hampton Aug 05 '15 at 21:39
  • Output is drwxrwx--- 3 nginx nginx 180224 Aug 5 17:26 /var/lib/php/session...what am I missing? – Dave W. Aug 05 '15 at 21:42
  • 1
    That's a monster sized directory. When is the last time old sessions were cleaned out? – Michael Hampton Aug 05 '15 at 21:42
  • Looks like the oldest session is two days old. There do not appear to be more sessions than usual. The largest appears to be less than 1KB and there are only a few dozen sessions in there, so I'm not sure how it's adding up to almost 200KB. – Dave W. Aug 05 '15 at 21:54
  • Actually, du is reporting 400KB. – Dave W. Aug 05 '15 at 22:00
  • If it were my site I'd blow away the whole directory and recreate it. (This will log everyone out and empty carts, assuming anyone is actually logged in...) You might also think about using some other session store, if you usually have such a large number of sessions... – Michael Hampton Aug 05 '15 at 22:03
  • There are 236 sessions there and maybe a dozen more in two different session save locations (done so session length could be different from the default). That does not seem like a lot of sessions. – Dave W. Aug 05 '15 at 22:07
  • For the _directory_ to have hit 180K in size, you would have had to have somewhere around a million sessions in there at some point. – Michael Hampton Aug 05 '15 at 22:10
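The directory-size point in that exchange can be demonstrated: on ext3/ext4 (RHEL 6's defaults) a directory's own size grows with the number of entries it has held and does not shrink when they are deleted, so the size `ls -ld` reports is a high-water mark. A throwaway sketch in a scratch directory (not your real session path):

```shell
# Report a directory's own size in bytes (the size column ls -ld shows).
dir_size() { stat -c %s "$1"; }

# Demonstration: the directory's size grows as entries are added; on
# ext3/ext4 it does not shrink again after they are removed, so a
# 180 KB directory implies a very large past population of files.
scratch=$(mktemp -d)
before=$(dir_size "$scratch")
for i in $(seq 1 5000); do : > "$scratch/sess_$i"; done
after=$(dir_size "$scratch")
echo "empty: $before bytes; with 5000 entries: $after bytes"
rm -rf "$scratch"
```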

1 Answer


How is it possible that php-fpm can fail like this without producing an error message somewhere?

Because whoever wrote the failing code didn't check for that failure and have the program write an error message. Programs aren't magic; they're written by humans, who don't always anticipate every possible problem.

My intuition is that you've hit a disk storage limit somewhere; disk space, inodes, whatever. The solution is to either run something like tmpreaper over your session store regularly to keep the number of old sessions to a minimum, or else switch to using another (auto-expiring) session store like memcached.
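If tmpreaper isn't to hand, a plain `find` in cron does much the same job. A minimal sketch, assuming file-based sessions under `/var/lib/php/session` (the RHEL default `session.save_path`) and a lifetime matching PHP's stock `session.gc_maxlifetime` of 1440 seconds — adjust both to your actual config:

```shell
#!/bin/sh
# Cron-able cleanup for file-based PHP sessions. SESSION_DIR and the
# 24-minute cutoff are assumptions; match them to session.save_path
# and session.gc_maxlifetime on your server.
SESSION_DIR="${SESSION_DIR:-/var/lib/php/session}"
if [ -d "$SESSION_DIR" ]; then
  # Delete session files untouched for more than 24 minutes.
  find "$SESSION_DIR" -maxdepth 1 -type f -name 'sess_*' -mmin +24 -delete
fi
```

Note this only removes the files; if the directory itself has grown huge, it has to be recreated (with php-fpm stopped) to reclaim the space.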

womble
  • I have added df -h and df -i outputs. I failed to mention previously that during the course of troubleshooting I set the first failing site's session directory to a new location, so it was not even sharing session space with other sites and it was not a directory with a bunch of old sessions. – Dave W. Aug 06 '15 at 12:20