1

My application randomly returned OOM errors when trying to allocate 16 MB chunks, while Linux had plenty of memory (20 GB) in use by the disk cache.

Swapping is disabled, and all OS limits seem fine.

After clearing the Linux cache with drop_caches, the errors disappeared.

Any idea what to check, or is this somehow expected behavior?

nonobe
  • This isn't really a Server Fault question - if you'd like I can move it to Stack Overflow for you. If I had to weigh in though I would do so by slamming a copy of POSIX on the desk and shouting "MALLOC() CAN FAIL. EXPECT IT. DEAL WITH IT IN YOUR CODE!" (It doesn't matter WHY - code that calls malloc() should expect failures & cope with them :) You also shouldn't disable swap on unix systems as a general rule... – voretaq7 Sep 14 '12 at 13:23
  • 1. Won't Stack Overflow say that it is a Linux kernel question and move it back to Server Fault? 2. The question is why malloc fails while Linux has a lot of memory. 3. Swapping sometimes causes more problems than an "Out of memory" message. – nonobe Sep 14 '12 at 13:28
  • Stack Overflow would be wrong. It would not be the first time they were wrong about migrating things here just because the word "unix" or "server" was present. This is a question about a POSIX call's behavior - that's not system administration, it's C programming. It's tangentially on topic here at best (not bad enough for me to mod-close it, but probably not going to get good answers...) – voretaq7 Sep 14 '12 at 13:37
  • No, you are wrong. The POSIX call returns NULL if the OS does not have enough memory, but the question is why the OS thinks it does not have enough memory, which is hardly related to programming. It is sad if you don't understand that. – nonobe Sep 18 '12 at 06:26
  • Found an identical problem, still not answered: http://serverfault.com/questions/288319/linux-not-freeing-large-disk-cache-when-memory-demand-goes-up It is identical because Linux calls the OOM killer while a lot of cache is available. In my case the OOM killer is disabled, so NULL is simply returned. – nonobe Sep 18 '12 at 07:22
  • `malloc()` can fail any time the OS can't find a way to give you the memory you asked for (this is ***NOT*** the same as "not enough memory to satisfy the request" - the two are just *usually* (99%+) coincident). If you want to know why the OS can't satisfy your memory request you'll have to jump into a kernel debugger - we can't tell you why without access to the specific system in question. Not being able to swap increases the likelihood of failures because the system can't free RAM by swapping, but if "No Swap" is a requirement for you, you're pretty much back to a debugger... – voretaq7 Sep 18 '12 at 16:20
  • Thank you for the good answer. It is strange that Linux has no user-friendly diagnostics for such an important feature. Using a debugger is problematic because the problem happens randomly under high workload and is difficult to reproduce. It just seems strange that Linux decides to refuse a malloc() request instead of dropping part of the huge disk cache. I believe the answer is known, it is just difficult to find. – nonobe Sep 24 '12 at 13:00

2 Answers

2

There was plenty of memory, but it was probably fragmented, so you couldn't get a contiguous 16 MB chunk. Running drop_caches will have triggered a defragmentation of memory, so afterwards there was enough contiguous memory available to honor your malloc request.

(This question is probably more suited to one of the programming forums.)

Tonny
  • The application requested 16 MB and 20 GB were technically available, so it doesn't look like a fragmentation problem. Also, it involves Linux memory management, so I'm not sure it is a programming question. – nonobe Sep 14 '12 at 11:15
2

malloc() does not allocate physical memory; it allocates virtual memory. malloc() can fail because there is no sufficiently large contiguous free chunk of virtual memory, or because the commit limit was exceeded.
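
To see the virtual/physical distinction on a live system, here is a minimal sketch in Python (not from the original answer). It assumes a Linux /proc filesystem; the helper names `vm_fields_kb` and `read_self_vm` are made up for illustration. It reserves 16 MiB with an anonymous mapping and compares VmSize (virtual size) with VmRSS (resident physical memory) before and after. Typically VmSize grows by roughly the mapped size while VmRSS barely moves, because the pages have not been touched yet.

```python
import mmap
import os


def vm_fields_kb(status_text):
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Vm* lines look like "VmSize:    204800 kB"
        if name.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[name] = int(parts[0])
    return fields


def read_self_vm():
    """Read the Vm* fields of the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return vm_fields_kb(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    before = read_self_vm()
    # Anonymous mapping: address space is reserved, pages are not touched yet.
    region = mmap.mmap(-1, 16 * 1024 * 1024)
    after = read_self_vm()
    print("VmSize grew by", after["VmSize"] - before["VmSize"], "kB")
    print("VmRSS grew by", after["VmRSS"] - before["VmRSS"], "kB")
    region.close()
```

The same two fields are what ps, top and pmap report as virtual size and resident size.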

  1. Check the virtual memory usage of the process using the ps, top or pmap commands. 64-bit architectures (amd64) have an extremely large virtual address space that is basically impossible to exhaust, but a 32-bit process is limited to at most 4 GB of virtual memory.
  2. Check /proc/sys/vm/overcommit_memory and the Committed_AS and CommitLimit rows in /proc/meminfo. If overcommit_memory is 2 (strict accounting), exceeding CommitLimit will cause malloc() to fail.
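
The commit check in step 2 can be scripted. This is only a sketch, assuming a Linux /proc filesystem; the function names `meminfo_values`, `commit_headroom_kb` and `read_text` are hypothetical helpers, not part of any tool. It prints the overcommit mode and how far Committed_AS is below CommitLimit. Keep in mind that the kernel only enforces CommitLimit in strict mode (overcommit_memory set to 2); in modes 0 and 1 the headroom is informational.

```python
import os


def meminfo_values(text):
    """Parse /proc/meminfo text into {name: int}.

    Most values are in kB; a few rows (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def commit_headroom_kb(meminfo_text):
    """Return CommitLimit minus Committed_AS in kB (negative if over the limit)."""
    info = meminfo_values(meminfo_text)
    return info["CommitLimit"] - info["Committed_AS"]


def read_text(path):
    with open(path) as f:
        return f.read()


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    mode = int(read_text("/proc/sys/vm/overcommit_memory"))
    headroom = commit_headroom_kb(read_text("/proc/meminfo"))
    print("overcommit_memory =", mode)
    print("CommitLimit - Committed_AS =", headroom, "kB")
```

Logging these two values at the moment an allocation fails is more useful than checking them afterwards, since the failure in the question only showed up under high load.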
x22
  • Committed_AS was ~10G lower than CommitLimit. I am quite sure it wasn't any limit, because after drop_caches all memory allocations were successful. – nonobe Sep 14 '12 at 13:10
  • Am I correct that when swapping is disabled, virtual memory is almost the same as physical memory? – nonobe Sep 14 '12 at 13:26