We run a website hosted on two app servers behind a load balancer. Suddenly, one app server stopped responding and hung. The other app server's access log showed 499 statuses, and its load average was high. About 20 minutes later, it started returning 200 again. Then, after the other app server was fully rebooted, it also started working fine.

I don't understand why this suddenly happened. In the error log, I found the following:

2019/11/03 12:43:19 [error] 26445#0: *30538354 FastCGI sent in stderr: "PHP message: PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 47264368 bytes) in /.........../sites/all/modules/contrib/memcache/dmemcache.inc on line 64" while reading response header from upstream, client: ............, server: .........., request: "................", upstream: "fastcgi://unix:/var/run/php-fpm/php-fpm.sock:", host: "...........", referrer: "..........."

What do I need to do to fix this so that it does not happen again?

Marco

2 Answers

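You need to increase the memory limit for each PHP process in your php.ini file. It appears to be set to 256 MB right now (the 268435456 bytes in the error message). Your log shows PHP running under PHP-FPM, so be sure to restart PHP-FPM after making the change; restarting only the web server will not reload php.ini.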

memory_limit = 512M
Bert

More broadly, the allocation is failing inside the memcache module. That suggests that you have a single very large cached object (~47 MB, per the error message) that you're trying to load. On a server running under a 256 MB memory limit, spending roughly 18% of it on a single object is not going to work out well.
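
To make that arithmetic concrete, here is a minimal Python sketch that pulls the two byte counts out of the error text quoted in the question and converts them. The function name `parse_memory_error` is just an illustrative helper, not part of PHP or Drupal.

```python
import re

# Error text copied from the log line in the question.
MESSAGE = (
    "Allowed memory size of 268435456 bytes exhausted "
    "(tried to allocate 47264368 bytes)"
)

MIB = 1024 * 1024


def parse_memory_error(message):
    """Return (limit_bytes, requested_bytes) from a PHP memory-exhausted message."""
    match = re.search(
        r"Allowed memory size of (\d+) bytes exhausted "
        r"\(tried to allocate (\d+) bytes\)",
        message,
    )
    if match is None:
        raise ValueError("not a PHP memory-exhausted message")
    return int(match.group(1)), int(match.group(2))


limit, requested = parse_memory_error(MESSAGE)
print(f"limit:     {limit / MIB:.1f} MiB")      # 256.0 MiB
print(f"requested: {requested / MIB:.1f} MiB")  # 45.1 MiB
print(f"share:     {requested / limit:.1%}")    # 17.6%
```

Note that the failing request is only the last allocation that did not fit. The process had already used most of its 256 MiB by then, so the total footprint of that request was larger than the 45 MiB shown here.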

In Drupal, this manifests in a few forms. You might need to "get over the hump": if that 47 MB is an intermediate object, you'll see simple page loads succeed, then any that depend on the 47 MB object fail until one of them succeeds, after which all loads succeed. Or the object might be cumulative, growing over time, in which case you'll see requests start off fine and then start failing later in the day. Or the object could be specific to particular parts of the site, or even to a particular localization. It's really hard to know, and my point is that the symptoms don't always align and may even seem non-deterministic.

To debug, you could start either by querying memcache directly to see which cached items are approximately that size, or by turning on detailed logging in the memcache module so that it tells you what it was attempting to GET when it failed. See the "Debug Logging" section of this link for details of how to do the latter, in D7 at least. You could also probably infer some additional context from the full stack trace, if you have it.
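
As a rough sketch of the first option, the Python below asks memcached for its item metadata with the `lru_crawler metadump all` command and sorts the result by size. It assumes a memcached version recent enough to support that command (with the LRU crawler enabled) and a server listening on 127.0.0.1:11211; the host, port, helper names, and sample keys are all assumptions for illustration, not values from the question. Keep in mind that memcached caps item size (1 MB by default unless raised with `-I`), so the value may not show up as a single ~47 MB entry.

```python
import socket
from urllib.parse import unquote

ERROR_PREFIXES = (b"ERROR", b"CLIENT_ERROR", b"SERVER_ERROR", b"BUSY")


def parse_metadump(text):
    """Parse `lru_crawler metadump` output into a list of (key, size) tuples."""
    items = []
    for line in text.splitlines():
        if not line.startswith("key="):
            continue  # skip the trailing END marker and blank lines
        fields = dict(part.split("=", 1) for part in line.split(" ") if "=" in part)
        # Keys are URI-encoded in the dump, so decode them for display.
        items.append((unquote(fields["key"]), int(fields["size"])))
    return items


def largest_items(text, top=10):
    """Return the `top` largest (key, size) pairs, biggest first."""
    return sorted(parse_metadump(text), key=lambda item: item[1], reverse=True)[:top]


def fetch_metadump(host="127.0.0.1", port=11211):
    """Fetch the raw metadump from a memcached server (assumed host and port)."""
    data = bytearray()
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(b"lru_crawler metadump all\r\n")
        while not data.endswith(b"END\r\n"):
            chunk = conn.recv(65536)
            if not chunk:
                break
            data += chunk
            if data.startswith(ERROR_PREFIXES):
                raise RuntimeError(data.decode("utf-8", errors="replace").strip())
    return data.decode("utf-8", errors="replace")


# Hypothetical sample output, so the parsing can be tried without a server.
SAMPLE = (
    "key=example-cache_menu-links exp=-1 la=1572784999 cas=11 fetch=yes cls=39 size=912345\n"
    "key=example-cache_bootstrap-variables exp=-1 la=1572784990 cas=12 fetch=yes cls=12 size=2048\n"
    "END\n"
)

for key, size in largest_items(SAMPLE):
    print(f"{size:>10}  {key}")
```

Against a live server you would call `largest_items(fetch_metadump())` instead of using `SAMPLE`. Dumping every key adds load on a busy cache, so it is probably best run outside peak hours.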

Ultimately, as Bert alludes to here, debugging this might be well beyond the scope of what you'd like to do. In that case, increasing the memory_limit will certainly make the problem go away for now. While this seems like an easy way out, keep in mind that you won't know whether the issue will recur until you do the debugging steps outlined above.

BMDan