I have a number of servers, a mix of bare metal and VMs, all running Ubuntu Bionic. The issue has been observed on both types.

I have read similar questions and their answers, and I believe this is a different problem.

I am the infra guy here, and I have run out of ideas. I am unable to find what is using the RAM, and the OOM killer gets invoked once memory gets critically low. The servers do not use swap.

# free -h
             total        used        free      shared  buff/cache   available
Mem:           7.7G        7.2G        205M        9.9M        369M        499M
Swap:            0B          0B          0B

# cat /proc/meminfo
MemTotal:        8126308 kB
MemFree:          183608 kB
MemAvailable:     497976 kB
Buffers:           30536 kB
Cached:           263980 kB
SwapCached:            0 kB
Active:          1470404 kB
Inactive:         111476 kB
Active(anon):    1285424 kB
Inactive(anon):     8952 kB
Active(file):     184980 kB
Inactive(file):   102524 kB
Unevictable:           4 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               160 kB
Writeback:             0 kB
AnonPages:       1287448 kB
Mapped:           142868 kB
Shmem:             10168 kB
Slab:             279184 kB
SReclaimable:      95864 kB
SUnreclaim:       183320 kB
KernelStack:       11648 kB
PageTables:        18184 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4063152 kB
Committed_AS:    4175792 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     7810548 kB
DirectMap2M:      536576 kB
DirectMap1G:           0 kB

# top
top - 16:23:00 up 185 days, 20:54,  1 user,  load average: 0.26, 0.36, 0.43
Tasks: 239 total,   1 running, 144 sleeping,   0 stopped,   1 zombie
%Cpu(s):  1.6 us,  4.0 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
KiB Mem :  8126308 total,   168140 free,  7534988 used,   423180 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   516260 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                           
16846 freerad   20   0  903852 417960   5392 S   0.0  5.1  14:21.93 freeradius                                                                                        
16864 bind      20   0  927648 242732   2512 S   0.7  3.0  42:38.28 named                                                                                             
29219 tunnelmon  20   0 3869952 107172  23700 S   0.0  1.3  25:58.11 tunnelmonitor.py                                                                                     
16859 scrambl+  20   0  110416  58644   2404 S   0.0  0.7   0:02.08 obfsproxy                                                                                         
26697 root      20   0 3933520  43580   2548 S   0.0  0.5  82:43.79 dockerd                                                                                           
19858 timesca+  20   0  305048  33032   8312 S   0.0  0.4   0:58.58 node                                                                                              
16865 stunnel   20   0 4209416  30900   3092 S   1.7  0.4  26:42.16 stunnel4                                                                                          
11899 diamond   20   0   86376  26956   3216 S   0.0  0.3   0:00.00 diamond                                                                                           
16842 diamond   20   0   86376  26828   3088 S   0.0  0.3   2:03.70 diamond                                                                                           
 3244 root      20   0 1865764  25932    572 S   0.0  0.3 581:23.11 containerd                                                                                        
16144 browser+  20   0  795684  25316   2128 S   2.7  0.3  11:23.89 httpproxy                                                                                         
17278 fpsync    20   0   94484  24516   2324 S   0.0  0.3   0:03.22 gunicorn                                                                                          
18243 diamond   20   0   83908  23584   2412 S   0.0  0.3   0:12.34 diamond                                                                                           
18369 diamond   20   0   83908  23484   2384 S   0.0  0.3   0:35.92 diamond                                                                                           
18280 diamond   20   0   83908  23392   2388 S   0.0  0.3   0:25.64 diamond                                                                                           
17558 diamond   20   0   83776  23176   2344 S   0.0  0.3   0:12.33 diamond                                                                                           
18328 diamond   20   0   83908  23148   2140 S   0.0  0.3   0:07.82 diamond                                                                                           
18150 diamond   20   0   83520  23032   2224 S   0.0  0.3   0:14.70 diamond                                                                                           
17523 diamond   20   0   83388  22668   2160 S   0.0  0.3   0:08.29 diamond                                                                                           
18117 diamond   20   0   83520  22620   1896 S   0.0  0.3   0:13.21 diamond                                                                                           
18208 diamond   20   0   83776  22612   1764 S   0.0  0.3   0:01.44 diamond                                                                                           
17405 diamond   20   0   83132  22032   1624 S   0.0  0.3   1:12.88 diamond                                                                                           
17541 diamond   20   0   83528  21604   1000 S   0.0  0.3   0:41.34 diamond                                                                                           
17576 diamond   20   0   83520  21556    900 S   0.0  0.3   0:30.24 diamond                                                                                           
23418 strongs+  20   0   68596  21324   8268 S   3.3  0.3   0:00.10 python3                                                                                           
16638 root      20   0   69176  21112   2404 S   0.0  0.3  11:32.82 supervisord                                                                                       
22541 root      19  -1  102988  20940  12004 S   0.3  0.3   8:22.35 systemd-journal                                                                                   
17484 strongs+  20   0 1632772  17592   4712 S   0.3  0.2  48:52.86 charon                                                                                            
19628 telegraf  20   0 5037948  16768      4 S   0.0  0.2   3:25.14 telegraf                                                                                          
16843 fpsync    20   0   59092  15752   1676 S   0.0  0.2   0:30.13 gunicorn                                                                                          
19977 obfs4pr+  20   0   66352  14932   5496 S   0.0  0.2   0:00.11 python3                                                                                           
16851 openvpn   20   0   46312  14352   3264 S  24.6  0.2 166:22.47 openvpn                                                                                           
16848 openvpn   20   0   44864  13036   3300 S   1.3  0.2 175:15.40 openvpn                                                                                           
17028 diamond   20   0 1090552  12604      0 S   0.0  0.2   1:56.94 diamond                                                                                           
16850 openvpn   20   0   43948  12160   3348 S   5.0  0.1 167:30.49 openvpn                                                                                           
 6754 root      20   0  169264  10304   8764 S   0.0  0.1   0:00.02 sshd                                                                                              
21308 root      20   0  139148   9616    228 S   0.0  0.1   0:13.55 fail2ban-server                                                                                   
16840 openvpn   20   0  180084   9304   3380 S   0.0  0.1   0:07.93 openvpn-api-gra                                                                                   
16847 openvpn   20   0   40764   9208   3572 S   1.0  0.1  58:54.04 openvpn                                                                                           
 7008 root      20   0  161708   8792   7360 S   0.0  0.1   0:00.68 sudo                                                                                              
30989 syslog    20   0  299448   6288   3152 S   0.3  0.1   2:42.85 rsyslogd                                                                                          
19979 obfs4pr+  20   0 1497424   5712      0 S   1.0  0.1  10:05.53 obfs4proxy                                                                                        
    1 root      20   0  226484   5640   2024 S   0.0  0.1 396:39.45 systemd                                                                                           
 6757 tbear     20   0   76756   5556   4360 S   0.0  0.1   0:00.15 systemd                                                                                           
19816 root      20   0 1670468   4244   1548 S   0.0  0.1   0:19.03 docker-proxy                                                                                      
 2662 message+  20   0   53680   4180      0 S   0.0  0.1 114:04.87 dbus-daemon                                                                                       
 7015 root      20   0   21612   4028   3452 S   0.0  0.0   0:00.16 bash                                                                                              
22288 root      20   0   39828   3876   3096 R   0.3  0.0   0:00.22 top      

# uname -r 
4.15.0-66-generic

I also checked slabtop and not much is in use there, as can also be seen from /proc/meminfo. I have tried other things as well, such as using ps_mem to track memory usage. I have run out of ideas and would appreciate any suggestions. Thanks.

Note: A reboot fixes the issue temporarily, but that is not a practical solution.
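To put a number on the gap, the fields of /proc/meminfo that normally explain memory usage can be added up and compared with MemTotal. The following is a minimal sketch, not a definitive accounting: the function name `meminfo_gap` is made up for illustration, and the choice of fields is an approximation, since memory allocated directly by the kernel or by drivers is not itemized in /proc/meminfo at all. With the values posted above it reports roughly 5.8 GiB (6051720 kB) that none of these fields explain.

```shell
#!/bin/sh
# Rough accounting of /proc/meminfo: add up the fields that normally explain
# where memory went and print what is left over.
# Usage: meminfo_gap [FILE]   (FILE defaults to /proc/meminfo)
# NOTE: meminfo_gap is a hypothetical helper name, not an existing tool.
meminfo_gap() {
    awk '
        # Store every "Name:  value kB" line as v[Name] = value.
        { name = $1; sub(/:$/, "", name); v[name] = $2 }
        END {
            # Shmem is already part of Cached, so it is not added again.
            accounted = v["MemFree"] + v["Buffers"] + v["Cached"]
            accounted += v["AnonPages"] + v["Slab"]
            accounted += v["KernelStack"] + v["PageTables"]
            printf "accounted_kB=%d\n", accounted
            printf "unaccounted_kB=%d\n", v["MemTotal"] - accounted
        }
    ' "${1:-/proc/meminfo}"
}

# Report on the live system when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    meminfo_gap
fi
```

A large `unaccounted_kB` value that keeps growing with uptime points away from user-space processes and towards kernel-side allocations, which would be consistent with a reboot clearing the problem.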

1 Answer

Does the situation improve if you run the following commands as root?

#memory before
free -h

#clear pagecache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches

#memory after
free -h

If that helps and you have more free memory afterwards, you may consider running these commands from a daily cron job.
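As an illustration only, a daily job could look like the entry below. It assumes a system-wide cron file (the name `/etc/cron.d/drop-caches` and the 03:00 schedule are arbitrary choices, not taken from the original answer) and that the job runs as root, which writing to `/proc/sys/vm/drop_caches` requires. It only helps if the memory really is held in reclaimable caches; otherwise it just hides the symptom.

```shell
# /etc/cron.d/drop-caches  (hypothetical file name)
# m  h  dom mon dow  user  command
  0  3  *   *   *    root  sync && echo 3 > /proc/sys/vm/drop_caches
```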

  • No. It didn't make anything better. At this point I'm almost certain this is some kernel memory bug. The issue seems to occur mostly on our servers running `4.15.0-66-generic`. I have been upgrading the kernel and keeping an eye on the issue. – Zoro Steve Jun 16 '22 at 19:36
  • Is your host running ESXi? It handles thin provisioning pretty badly and is not good at making memory freed by a VM available to the host again. – petitradisgris Jun 16 '22 at 21:53
  • No. The hosts are running KVM. – Zoro Steve Jun 17 '22 at 21:06