We're getting page allocation failures on a Linux server (kernel 3.2). We've been told the problem is directly caused by the large amount of memory used for the page cache: this supposedly leads to external fragmentation and, ultimately, to page allocation failures. Here's an excerpt of top's output:
top - 10:45:09 up 3 days, 17:10, 0 users, load average: 1.00, 0.97, 1.08
Tasks: 313 total, 3 running, 310 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.9%us, 1.2%sy, 0.0%ni, 89.8%id, 0.2%wa, 0.0%hi, 0.9%si, 0.0%st
Mem: 8174056k total, 7948312k used, 225744k free, 278412k buffers
Swap: 2072348k total, 180k used, 2072168k free, 4676676k cached
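Adding the free, buffers and cached figures from the excerpt above gives a rough upper bound on memory that is either free or held by caches. It is only an upper bound, since not everything counted as "cached" (for example dirty or shared-memory pages) can be dropped immediately:

$$
225744 + 278412 + 4676676 = 5180832\ \text{KiB} \approx 4.94\ \text{GiB},
\qquad \frac{5180832}{8174056} \approx 63\%
$$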
We've been told the problem is currently being mitigated by periodically dropping the caches:
echo 3 > /proc/sys/vm/drop_caches
I've been told to reduce cache usage in order to solve the problem for good. But I'm reluctant to believe that caching would lead to page allocation failures. As I understand it, memory used for disk caching is essentially free memory (according to http://www.linuxatemyram.com), and high cache usage is actually a good sign. It could lead to external fragmentation, sure, but would the kernel really fail to reclaim that memory to satisfy, for instance, an order-4 allocation request (16 physically contiguous pages)? The sample programs at http://www.linuxatemyram.com show that an application has no problem allocating memory in such a scenario, but would it be any different if the kernel needed to allocate the same amount of memory?
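One way to check whether external fragmentation is really the issue is to look at /proc/buddyinfo, which lists, for each memory zone, the number of free blocks of each order (an order-n block is 2^n physically contiguous pages). The following is a minimal sketch, not a diagnostic tool: the helper names `parse_buddyinfo` and `allocations_possible` are hypothetical, and it assumes 4 KiB pages as on x86. It counts how many order-4 requests could be served from currently free blocks alone, before any reclaim or compaction.

```python
import os

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, as on x86/x86_64


def parse_buddyinfo(text):
    """Map (node, zone) -> list of free-block counts for order 0, 1, 2, ..."""
    zones = {}
    for line in text.splitlines():
        if "zone" not in line:
            continue
        head, _, rest = line.partition("zone")
        node = int(head.split(",")[0].split()[1])  # "Node 0," -> 0
        fields = rest.split()
        zones[(node, fields[0])] = [int(c) for c in fields[1:]]
    return zones


def allocations_possible(counts, order):
    """How many order-`order` requests the currently free blocks could serve
    without reclaim or compaction. The buddy allocator can split a free block
    of order k >= order into 2**(k - order) blocks of the requested order."""
    return sum(n << (k - order) for k, n in enumerate(counts) if k >= order)


if __name__ == "__main__":
    order = 4
    size_kib = PAGE_SIZE_KIB * 2 ** order  # order 4 -> 16 pages -> 64 KiB
    if os.path.exists("/proc/buddyinfo"):
        with open("/proc/buddyinfo") as f:
            info = parse_buddyinfo(f.read())
        for (node, zone), counts in sorted(info.items()):
            print(f"node {node} {zone:>8}: "
                  f"{allocations_possible(counts, order)} free "
                  f"order-{order} ({size_kib} KiB) blocks")
```

Comparing its output before and after `echo 3 > /proc/sys/vm/drop_caches` would show whether dropping the caches actually produces more high-order free blocks, or mostly adds order-0 pages.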