I have a problem with my 48 GB RAM server: it randomly kills processes due to Out of Memory, even though only 72% of it is in use (I got this figure from my monitoring interface; swap usage is at 60%).

I have investigated a bit, but could not really find what is going wrong.

I'm currently trying to "read" the syslog to confirm whether my memory usage really was 72%, but I can't find the relevant information. Also, do you see anything that could help me in my investigation?

Syslog: http://pastebin.com/4DczHYqF

Thanks

Nisalon
  • What swappiness have you got? 60% swap usage is not good! Did you try memcheck? Maybe it's a memory error. In the syslog I see loads of Apache processes; is anything else going on there? Nginx is much less resource-greedy, btw ;) – 3h4x Aug 11 '14 at 20:18
  • Please include the contents of `/proc/meminfo` in your question. Information about memory zones from `/var/log/dmesg` would also be useful. – kasperd Aug 11 '14 at 20:19
  • I suggest you analyze the maximum RAM usage given your applications' settings. For MySQL you can use mysqltuner.pl; with default parameters it uses about 600 MB. For a Java app, see the -Xmx flag, and for Apache take the (average) memory usage of one process (with top or ps, for example) and multiply it by the MaxClients parameter. A monitoring tool that records historic usage data (RRD type) is also a must. – LinuxDevOps Aug 11 '14 at 21:00
  • The swap was used 100%; this is from your log: "Free swap = 0kB" – c4f4t0r Aug 11 '14 at 21:39
  • @c4f4t0r That's a good point. And if indeed swap usage was 100% and decreased to 60% after the OOM killer was activated, it means it managed to free resources, which is the intention. But in the log it looks like there are still free physical memory pages. How low will the kernel allow the free physical memory to go before it triggers the OOM killer? – kasperd Aug 12 '14 at 19:04

2 Answers

The answer to your question is that you were using more than 72% of your memory. You can work this out from the OOM report itself.

The part of the report that tells you what the state of memory was is this:

Aug 11 04:20:14 myserver kernel: Node 0 DMA free:15884kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15968kB managed:15884kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Aug 11 04:20:14 myserver kernel: lowmem_reserve[]: 0 3539 48337 48337
Aug 11 04:20:14 myserver kernel: Node 0 DMA32 free:181176kB min:2060kB low:2572kB high:3088kB active_anon:1334312kB inactive_anon:445152kB active_file:221600kB inactive_file:1327176kB unevictable:0kB isolated(anon):0kB isolated(file):384kB present:3644928kB managed:3624548kB mlocked:0kB dirty:1328084kB writeback:0kB mapped:220kB shmem:7852kB slab_reclaimable:75340kB slab_unreclaimable:34920kB kernel_stack:48kB pagetables:4684kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2463022 all_unreclaimable? yes
Aug 11 04:20:14 myserver kernel: lowmem_reserve[]: 0 0 44797 44797
Aug 11 04:20:14 myserver kernel: Node 0 Normal free:25892kB min:26072kB low:32588kB high:39108kB active_anon:31943548kB inactive_anon:1879196kB active_file:5038872kB inactive_file:5769308kB unevictable:0kB isolated(anon):0kB isolated(file):4224kB present:46661632kB managed:45873100kB mlocked:0kB dirty:6184880kB writeback:324kB mapped:41268kB shmem:1577696kB slab_reclaimable:442212kB slab_unreclaimable:314476kB kernel_stack:3296kB pagetables:92756kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:17216562 all_unreclaimable? yes

In Linux, memory is split into zones, which are regions of physical RAM allotted (usually by address range) to serve a particular purpose.

DMA is for very old hardware which could only perform DMA to a small region of memory. This zone does not account for much memory and is rarely, if ever, used. DMA32 is a zone reserved for hardware which can only address 32 bits of memory; it is used on 64-bit systems for that particular class of hardware. It normally covers about 4 GB of memory (it can be less).

The vast majority of memory, however, gets allocated to the 'Normal' zone, and nearly all memory goes in and out of there. It is used when there are no special-purpose constraints on the memory being allocated. When memory in this zone becomes scarce, the kernel typically starts dipping into the other zones to satisfy allocations (although I believe DMA never gets touched).

Based on your logs, the following calculation can be made.

 DMA Present + DMA32 Present + Normal Present = Total available memory
 15968kB     + 3644928kB     + 46661632kB     = 50322528kB
-
 DMA Free    + DMA32 Free    + Normal Free    = Total free memory
 15884kB     + 181176kB      + 25892kB        = 222952kB
=
 Total available memory - Total free memory = Total Used Memory
 50322528kB             - 222952kB          = 50099576kB
=
 (Total used memory / Total available memory) * 100 = Total used percent
 50099576kB         / 50322528kB              * 100 = 99.56%
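
If you want to redo that arithmetic yourself, here is a minimal sketch in Python. The zone figures are hard-coded from the log quoted above, so substitute whatever your own OOM report prints:

    # Rough sketch: work out the used-memory percentage from the per-zone
    # "present" and "free" figures printed in an OOM report.
    # The numbers below are taken from the log quoted above.
    zones = {
        #  zone      (present_kB, free_kB)
        "DMA":      (15968,      15884),
        "DMA32":    (3644928,    181176),
        "Normal":   (46661632,   25892),
    }

    total_present = sum(present for present, free in zones.values())
    total_free    = sum(free for present, free in zones.values())
    total_used    = total_present - total_free

    print("present: %d kB" % total_present)
    print("free:    %d kB" % total_free)
    print("used:    %d kB (%.2f%%)" % (total_used,
                                       100.0 * total_used / total_present))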

Other zones may still have plenty of free memory, but you will still get an OOM kill if the zone the allocation needs has run out.

The OOM killer is one possible means of freeing memory when a zone's min watermark is greater than its free value. If you check your Normal zone above, you can see that this is the case here (free:25892kB is below min:26072kB).
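
If you want to watch those watermarks on a running system rather than waiting for the next OOM report, the same figures are exposed in /proc/zoneinfo (in pages rather than kB). Here is a rough sketch, assuming the usual "Node ..., zone ..." / "pages free" / "min" layout of that file; note that the real trigger conditions are more involved than a simple free-below-min check, since the kernel tries to reclaim memory first:

    # Sketch: compare each zone's free page count against its min watermark,
    # as exposed in /proc/zoneinfo (values are in pages, usually 4 kB each).
    zone = None
    free = None

    with open("/proc/zoneinfo") as f:
        for line in f:
            fields = line.split()
            if line.startswith("Node"):
                # zone header, e.g. "Node 0, zone   Normal"
                zone = "%s %s %s" % (fields[0], fields[1].rstrip(","), fields[3])
            elif fields[:2] == ["pages", "free"]:
                free = int(fields[2])
            elif fields and fields[0] == "min" and free is not None:
                minimum = int(fields[1])
                status = "BELOW min watermark" if free < minimum else "ok"
                print("%-22s free=%-9d min=%-9d %s" % (zone, free, minimum, status))
                free = None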

Your biggest memory consumers are MySQL and Java. Either retune them to reduce their memory usage, or get more memory for them.

Matthew Ife
  • Thanks for this really helpful answer. One last thing: how do I determine the memory used by MySQL? – Nisalon Aug 12 '14 at 07:33
  • memory used by mysql: `wget https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl && perl mysqltuner.pl` or see the calculator at http://www.mysqlcalculator.com/ – LinuxDevOps Aug 12 '14 at 15:23

It is maybe not the overall memory, but rather the memory allowed to a single user, which is defined by the ulimit command.
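
If you want to see which limits a process actually inherits, here is a minimal sketch using Python's resource module; it reports the same values as `ulimit -v` and `ulimit -d` do in a shell:

    # Sketch: print the per-process memory limits of the current process,
    # as set by ulimit / setrlimit.
    import resource

    def pretty(value):
        return "unlimited" if value == resource.RLIM_INFINITY else "%d bytes" % value

    for name, rlim in (("RLIMIT_AS   (virtual memory)", resource.RLIMIT_AS),
                       ("RLIMIT_DATA (data segment)",   resource.RLIMIT_DATA)):
        soft, hard = resource.getrlimit(rlim)
        print("%s  soft=%s  hard=%s" % (name, pretty(soft), pretty(hard)))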

Dom
  • This returns `ENOMEM` and does not invoke the `OOM` killer. Exceeding memory in a cgroup does, but you would be able to tell if that were the case from the kernel stack trace. – Matthew Ife Aug 11 '14 at 21:21