
We're getting page allocation failures on a Linux server (3.2 kernel). We've been told the problem is directly related to heavy use of memory for cache: this would lead to external fragmentation and, ultimately, to page allocation errors. Here's an excerpt of top's output:

top - 10:45:09 up 3 days, 17:10,  0 users,  load average: 1.00, 0.97, 1.08
Tasks: 313 total,   3 running, 310 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.9%us,  1.2%sy,  0.0%ni, 89.8%id,  0.2%wa,  0.0%hi,  0.9%si,  0.0%st
Mem:   8174056k total,  7948312k used,   225744k free,   278412k buffers
Swap:  2072348k total,      180k used,  2072168k free,  4676676k cached

We've been told that the problem is being mitigated by freeing the caches periodically:

echo 3 > /proc/sys/vm/drop_caches
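
For reference, the value written selects what gets dropped: 1 frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. Below is a minimal Python sketch of the same periodic step. The helper name `drop_caches` and its `path` parameter are made up for illustration, and writing to the real file requires root.

```python
import os

DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(mode=3, path=DROP_CACHES_PATH):
    """Write `mode` to the drop_caches control file (hypothetical helper).

    mode 1: page cache, mode 2: dentries and inodes, mode 3: both.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # Dirty pages cannot be dropped, so flush them to disk first.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
    return mode
```

Calling `drop_caches()` as root is meant to be equivalent to the echo command above, preceded by a sync.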

I've been told to reduce cache usage in order to solve the problem definitively. But I'm reluctant to believe that caching would lead to page allocation errors. I understand that memory allocated for disk caching is basically free memory (according to http://www.linuxatemyram.com) and that high memory usage for caching is actually a good sign. It would lead to external fragmentation, sure, but would the kernel fail to reclaim this space in order to satisfy an order-4 allocation request, for instance? The sample programs available at http://www.linuxatemyram.com show that an application has no problem allocating memory in such a scenario, but would it be any different if the kernel needed to allocate the same amount of memory?
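
To put numbers on this, here is a small Python sketch using the figures from the top output above. It assumes a 4 KiB page size and treats all buffers and cache as reclaimable, which is an upper bound rather than a guarantee.

```python
# Figures copied from the top output above, in KiB.
total_kib = 8174056
free_kib = 225744
buffers_kib = 278412
cached_kib = 4676676

# Memory the kernel could in principle hand out: already free, plus buffers
# and page cache. Assumption: all of the cache is clean and reclaimable.
reclaimable_kib = free_kib + buffers_kib + cached_kib

# An order-n request needs 2**n physically contiguous pages.
PAGE_KIB = 4  # assumed page size (4 KiB)
order4_kib = (2 ** 4) * PAGE_KIB

print(f"free + buffers + cached: {reclaimable_kib} KiB "
      f"({100 * reclaimable_kib / total_kib:.1f}% of RAM)")
print(f"order-4 request: {order4_kib} KiB contiguous")
```

So about 5180832 KiB, roughly 63% of RAM, is either free or cache, while an order-4 request only needs 64 KiB. The catch is that those 64 KiB must be physically contiguous, which the totals alone say nothing about.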

  • I'm not convinced by whoever sent you this assessment. Can you provide the complete output from the page allocation failures (including some lines above and below) and also provide the output of `cat /proc/buddyinfo` – Matthew Ife Jul 18 '14 at 20:18
  • Also can you please add the output of cat /proc/meminfo – Prashant Lakhera Aug 26 '14 at 08:30
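
The /proc/buddyinfo output requested above lists, for each memory zone, how many free blocks exist at each order (order 0 first). A rough Python sketch for checking whether any blocks of order 4 or higher are free could look like this. The function names are made up for illustration, and the sample line is invented, not taken from the server in question.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order, order 0 first]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(c) for c in parts[4:]]
    return zones


def blocks_at_or_above(counts, order):
    """Count free blocks big enough to satisfy a request of the given order."""
    return sum(counts[order:])


# Invented sample line: plenty of small blocks, nothing at order 4 or above.
sample = "Node 0, zone   Normal   1200   310   45   6   0   0   0   0   0   0   0\n"
for (node, zone), counts in parse_buddyinfo(sample).items():
    print(node, zone, "free blocks of order >= 4:", blocks_at_or_above(counts, 4))
```

On the real system, the text would come from reading /proc/buddyinfo instead of the sample string. A zone showing zero blocks at order 4 and above, despite many free low-order blocks, would be consistent with the fragmentation theory.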

1 Answer


Linux uses otherwise unused RAM for disk caching, which is well known, as you said. However, if an application needs a lot of RAM all at once for short periods, there may not be enough free memory available, and the kernel may not be able to reclaim the disk cache fast enough, resulting in the failure you describe.

The key is to resolve the page allocation failures to begin with. Adding more RAM will certainly help if you can, as will adding swap to handle spikes (I typically see swap sized equal to RAM for most applications). This question also details some kernel tuning you can do.

Nathan C