
Let me start with a picture:

[Chart: memory usage over time]

This is the memory usage of our backup Tomcat server. It is just sitting there, processing a simple health check request every couple of seconds and waiting for the main server to crash so it can take over the load. And it still shows this growing memory usage. The main server shows the same growth. Sooner or later, Nagios starts spamming SMS and email alerts about memory and swap use.

Both servers are running CentOS 7, kernel 3.10, Java 1.7 and Tomcat 7.

Even when I stop Tomcat with `systemctl stop tomcat`, the memory stays used.

The only way I have found to free the memory is `sync && echo 3 > /proc/sys/vm/drop_caches`. So the workaround is to put this in a cron job, but I'd like to find a proper solution.
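The cron job is essentially this (the file path and schedule here are illustrative, not the exact entry we use):

# /etc/cron.d/drop-caches - illustrative path and schedule, runs hourly as root
0 * * * * root sync && echo 3 > /proc/sys/vm/drop_caches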

I found a thread about a similar problem that mentions setting `MALLOC_ARENA_MAX` to 4 (some other threads advise setting it to just 1), and another thread saying the `MALLOC_CHECK_` environment variable should help. But it does not; that's what you can see in the right part of the chart.
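For completeness, this is roughly how I set the variable (assuming the stock CentOS tomcat package, whose service reads environment variables from /etc/sysconfig/tomcat):

# /etc/sysconfig/tomcat - assumption: the packaged service sources this file
MALLOC_ARENA_MAX=4
# some threads suggest forcing a single arena instead:
# MALLOC_ARENA_MAX=1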

If I look at used (heap) memory, it stays around 600 MB, and used non-heap memory is at 70 MB.
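Those numbers come from our monitoring; roughly the same can be cross-checked with `jstat` (the PID here is just an example):

# heap, perm gen and GC stats in KB, sampled every 5 s, 10 samples
jstat -gc 12345 5000 10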

Do you have any idea what might be causing this and how to fix it? And I repeat, the memory is not freed after Tomcat is stopped, so I don't believe it's a leak in our app.

# free -m
             total       used       free     shared    buffers     cached
Mem:         64268       4960      59307         64          0        135
-/+ buffers/cache:       4824      59443
Swap:         2047          0       2047

# ps -eo rss | awk '{sum += $1}; END {print sum/1024/1024}'
2.54199

An update from this morning, 9 hours after the cron job freed the memory:

# free -h
             total       used       free     shared    buffers     cached
Mem:           62G        13G        48G        80M         0B        77M
-/+ buffers/cache:        13G        48G
Swap:         2,0G         4K       2,0G

# ps -eo vsize | awk '{sum += $1}; END {print sum/1024/1024}'
25.8389

# ps -eo rss | awk '{sum += $1}; END {print sum/1024/1024}'
1.24232

# ps -eo pmem,rss,vsize,args | grep tomcat
 1.7  1158608 22684408 java -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat/temp -Djava.util.logging.config.file=/usr/share/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start

This is getting really weird. keepalived is running and sends an HTTP request to our app's health check endpoint via curl every few seconds. If I stop keepalived, the memory stops growing. If I send the same curl request in an infinite loop from bash on the same machine, the memory starts growing again, even if I send the request to a URL returning 404. If I run the same loop from a different machine (so not from localhost), the memory is fine.
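The reproduction loop is nothing more than this (the URL and interval are illustrative; the real health check path is different):

while true; do
    curl -s -o /dev/null http://localhost:8080/health
    sleep 2
done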

# slabtop -o
 Active / Total Objects (% used)    : 244359165 / 244369016 (100,0%)
 Active / Total Slabs (% used)      : 5810996 / 5810996 (100,0%)
 Active / Total Caches (% used)     : 70 / 99 (70,7%)
 Active / Total Size (% used)       : 45770306,72K / 45772288,52K (100,0%)
 Minimum / Average / Maximum Object : 0,01K / 0,19K / 8,00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
243660018 243660018  11%    0,19K 5801429       42  46411432K dentry                 
143872 141868  98%    0,03K   1124      128      4496K kmalloc-32             
118150 118150 100%    0,02K    695      170      2780K fsnotify_event_holder  
 87040  87040 100%    0,01K    170      512       680K kmalloc-8              
 80448  79173  98%    0,06K   1257       64      5028K kmalloc-64             
 56832  56832 100%    0,02K    222      256       888K kmalloc-16             
 31926  31926 100%    0,08K    626       51      2504K selinux_inode_security 
 31140  31140 100%    0,11K    865       36      3460K sysfs_dir_cache        
 15795  14253  90%    0,10K    405       39      1620K buffer_head            
 15008  14878  99%    1,00K    469       32     15008K xfs_inode              
 14616  13365  91%    0,19K    348       42      2784K kmalloc-192            
 11961  11714  97%    0,58K    443       27      7088K inode_cache            
 10048   9108  90%    0,06K    157       64       628K anon_vma               
  9664   9480  98%    0,12K    302       32      1208K kmalloc-128            
  9287   7954  85%    0,21K    251       37      2008K vm_area_struct         
  8624   8624 100%    0,07K    154       56       616K Acpi-ParseExt          
  7264   7063  97%    0,25K    227       32      1816K kmalloc-256            
  5908   5908 100%    0,57K    211       28      3376K radix_tree_node        
  5304   5304 100%    0,04K     52      102       208K Acpi-Namespace         
  4620   4620 100%    0,09K    110       42       440K kmalloc-96             
  3744   3586  95%    1,00K    117       32      3744K kmalloc-1024           
  3458   3458 100%    0,30K    133       26      1064K nf_conntrack_ffffffff819a29c0
  3360   3067  91%    0,50K    105       32      1680K kmalloc-512            
  3108   3108 100%    0,38K     74       42      1184K blkdev_requests        
  2975   2975 100%    0,05K     35       85       140K shared_policy_node     
  2520   2368  93%    0,64K    105       24      1680K proc_inode_cache       
  1560   1560 100%    0,81K     40       39      1280K task_xstate            
  1300   1300 100%    0,15K     50       26       200K xfs_ili                
  1272   1272 100%    0,66K     53       24       848K shmem_inode_cache      
  1176   1176 100%    1,12K     42       28      1344K signal_cache           
  1024   1024 100%    2,00K     64       16      2048K kmalloc-2048           
   975    975 100%    0,62K     39       25       624K sock_inode_cache       
   900    900 100%    0,44K     25       36       400K scsi_cmd_cache         
   864    864 100%    0,25K     27       32       216K tw_sock_TCPv6          
   737    644  87%    2,84K     67       11      2144K task_struct            
   720    672  93%    2,00K     45       16      1440K TCPv6                  
   704    704 100%    0,18K     16       44       128K xfs_log_ticket         
   665    665 100%    0,23K     19       35       152K cfq_queue              
   646    646 100%    0,94K     19       34       608K RAW                    
   640    640 100%    0,39K     16       40       256K xfs_efd_item           
   624    624 100%    0,10K     16       39        64K blkdev_ioc             
   624    624 100%    0,20K     16       39       128K xfs_btree_cur          
   578    578 100%    0,12K     17       34        68K fsnotify_event         
   555    555 100%    2,06K     37       15      1184K sighand_cache          
   528    528 100%    0,48K     16       33       256K xfs_da_state           
   512    512 100%    0,06K      8       64        32K kmem_cache_node        
   512    512 100%    1,00K     16       32       512K UDP                    
   465    465 100%    2,06K     31       15       992K idr_layer_cache        
   450    450 100%    0,62K     18       25       288K files_cache            
Pitel
  • Check your Tomcat memory reports - is it heap growing? Chances are it's simply a memory leak created by the developer. Get a few memory dumps at regular intervals, give them to the dev guys and tell them to fix it. – ETL Mar 02 '15 at 14:56
  • Well, I am the dev :( But if it is a memory leak, how is it possible that the memory is not freed after a Tomcat restart? – Pitel Mar 02 '15 at 14:58
  • Sorry, missed that. Are you sure it's Tomcat using the memory? – ETL Mar 02 '15 at 15:41
  • Yes. According to `top`, `java` is using most of the memory (which is fine, it's the main app), but not that much. Even when swap started being used, the `java` process was using just 10% MEM. – Pitel Mar 02 '15 at 15:54
  • Are you having problems with the responsiveness of either server, or is it just that your monitoring tools are alerting on what they think is an undesirable situation? If the latter, then this looks like a duplicate of http://serverfault.com/questions/449296/why-is-linux-reporting-free-memory-strangely - the memory is being used for OS cache, as it should be. That's backed up by the fact that the apparent usage only drops when you clear the cache. – Paul Haldane Mar 02 '15 at 23:59
  • I don't think that's the case either, because as you can see in `free` output, `Mem` and `-/+ buffers/cache` shows similar numbers. And also, when all memory is used, it starts using swap. – Pitel Mar 03 '15 at 07:48
  • Please add the output of the `slabtop` command. – Janne Pikkarainen Mar 04 '15 at 08:43

1 Answer


Based on the slabtop output, I would say that something is creating lots of temporary files somewhere and then deleting them. An additional twist is that some process is still holding file handles to the deleted files, so they are not being freed from the dentry cache. Since the files are marked as deleted, you cannot find them with regular `ls` and similar commands.

It might be curl or the script you are calling it from. Check with `lsof -n | grep deleted` whether there are lots of deleted entries, and track down the culprit from there.
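Something along these lines should give a quick overview of which processes are holding deleted files open (exact columns may differ between lsof versions, so treat it as a sketch):

# count open-but-deleted files per process name
lsof -n | grep deleted | awk '{print $1}' | sort | uniq -c | sort -rn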

Janne Pikkarainen