Swap Memory Anomaly

Question

I am currently experiencing a memory issue on my CentOS 7.6 system.

It began with my system swapping even though up to 80GB of RAM should have been available.

    free -m
              total        used        free      shared  buff/cache   available
Mem:         321931      239140        1291       79929       81498        1188
Swap:         30015       29681         334

Before this, swap showed 0 free.

Bear in mind, swappiness is set to 10, so this behaviour should not occur in the first place.

df -h shows a lot of space used by devtmpfs (/dev), which should not be the case, as it is supposed to be temporary, memory-backed storage.

~]# df -h
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/vlmgrp1-OS   79G   51G   28G  65% /
devtmpfs                158G  100G   58G  64% /dev
tmpfs                   158G     0  158G   0% /dev/shm
tmpfs                   158G  4.0G  154G   3% /run
tmpfs                   158G     0  158G   0% /sys/fs/cgroup
/dev/nvme0n1p2         1014M  232M  783M  23% /boot
tmpfs                    32G     0   32G   0% /run/user/0
tmpfs                    32G     0   32G   0% /run/user/993

As you can see, /dev is using 100GB and the shared/buff/cache is holding 80GB physical RAM and not releasing it to the system.

I first attempted to clear the cache by running `sync; echo 1 | sudo tee /proc/sys/vm/drop_caches`, which freed 4GB, but this was taken back within 30 seconds. I then ran `sync; echo 2 | sudo tee /proc/sys/vm/drop_caches` and `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches`, which released nothing further.

`swapoff -a && swapon -a` also yielded no result, and after 5 hours and a heavy load, swap still had 0 free.

 ~]# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status

NO OUTPUT

~]# ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1

 ~]# cat /proc/meminfo
MemTotal:       329657664 kB
MemFree:         1817656 kB
MemAvailable:    1476420 kB
Buffers:           14968 kB
Cached:         81813132 kB
SwapCached:      1098308 kB
Active:         231698396 kB
Inactive:       90111876 kB
Active(anon):   231527424 kB
Inactive(anon): 90073660 kB
Active(file):     170972 kB
Inactive(file):    38216 kB
Unevictable:       25724 kB
Mlocked:           25724 kB
SwapTotal:      30736380 kB
SwapFree:          10668 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:      238909760 kB
Mapped:            57984 kB
Shmem:          81611272 kB
Slab:            1126844 kB
SReclaimable:     454628 kB
SUnreclaim:       672216 kB
KernelStack:       35296 kB
PageTables:       566588 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    195565212 kB
Committed_AS:   446505420 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     2154028 kB
VmallocChunk:   34189572924 kB
HardwareCorrupted:     0 kB
AnonHugePages:  44908544 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      816952 kB
DirectMap2M:    146685952 kB
DirectMap1G:    189792256 kB

~]# grep -R swap /usr/lib/tuned | grep swappiness
/usr/lib/tuned/latency-performance/tuned.conf:# The swappiness parameter controls the tendency of the kernel to move
/usr/lib/tuned/latency-performance/tuned.conf:vm.swappiness=10
/usr/lib/tuned/throughput-performance/tuned.conf:# The swappiness parameter controls the tendency of the kernel to move
/usr/lib/tuned/throughput-performance/tuned.conf:vm.swappiness=10
/usr/lib/tuned/virtual-guest/tuned.conf:vm.swappiness = 10

So it seems the system should not have started swapping to this extent, yet swap is full and my supposedly available 80GB of RAM is inaccessible. I turned my attention back to devtmpfs. What could possibly be using 100GB?

At this juncture, I should mention that this server is virtualized and partitioned. It uses LVM and has quite a few VMs on it. There are 5 main volume groups on it.

~]# vgscan
  Reading volume groups from cache.
  Found volume group "vg1" using metadata type lvm2
  Found volume group "vg2" using metadata type lvm2
  Found volume group "vg3" using metadata type lvm2
  Found volume group "vg" using metadata type lvm2
  Found volume group "nvmessd1" using metadata type lvm2

I went in search of what was using 100GB in /dev and found this:

~]# du -h /dev
0       /dev/system
0       /dev/pve
0       /dev/centos
0       /dev/vg
0       /dev/vg3
100G    /dev/vg2
0       /dev/vg1
0       /dev/nvmessd1
0       /dev/vfio
0       /dev/snd
0       /dev/net
0       /dev/mqueue
0       /dev/hugepages/libvirt/qemu
0       /dev/hugepages/libvirt
0       /dev/hugepages
0       /dev/vlmgrp1
0       /dev/disk/by-label
0       /dev/disk/by-partuuid
0       /dev/disk/by-partlabel
0       /dev/disk/by-uuid
0       /dev/disk/by-path
0       /dev/disk/by-id
0       /dev/disk
0       /dev/block
0       /dev/bsg
0       /dev/dri
0       /dev/char
0       /dev/mapper
0       /dev/pts
0       /dev/shm
0       /dev/input/by-path
0       /dev/input/by-id
0       /dev/input
0       /dev/bus/usb/002
0       /dev/bus/usb/001
0       /dev/bus/usb
0       /dev/bus
0       /dev/raw
0       /dev/cpu/23
0       /dev/cpu/22
0       /dev/cpu/21
0       /dev/cpu/20
0       /dev/cpu/19
0       /dev/cpu/18
0       /dev/cpu/17
0       /dev/cpu/16
0       /dev/cpu/15
0       /dev/cpu/14
0       /dev/cpu/13
0       /dev/cpu/12
0       /dev/cpu/11
0       /dev/cpu/10
0       /dev/cpu/9
0       /dev/cpu/8
0       /dev/cpu/7
0       /dev/cpu/6
0       /dev/cpu/5
0       /dev/cpu/4
0       /dev/cpu/3
0       /dev/cpu/2
0       /dev/cpu/1
0       /dev/cpu/0
0       /dev/cpu
100G    /dev

To me, it looks like /dev/vg2 is actually using the swap memory. How is this possible?

I am not exactly sure what is going on here and have never witnessed such behavior. I would prefer to restore swap and some RAM without a reboot. Is there a way to do this? I am currently at a loss.

Thanks.

EDIT

pvs also shows a strange error, which I can only guess relates to this issue and to vg2 not being in the correct place.

~]# pvs
  Error reading device /dev/centos/root at 0 length 512.
  Error reading device /dev/centos/root at 0 length 4.
  Error reading device /dev/centos/root at 4096 length 4.
  Error reading device /dev/system/var at 0 length 512.
  Error reading device /dev/system/var at 0 length 4.
  Error reading device /dev/system/var at 4096 length 4.
  Error reading device /dev/system/tmp at 0 length 512.
  Error reading device /dev/system/tmp at 0 length 4.
  Error reading device /dev/system/tmp at 4096 length 4.
  Error reading device /dev/system/swap at 0 length 512.
  Error reading device /dev/system/swap at 0 length 4.
  Error reading device /dev/system/swap at 4096 length 4.
  Error reading device /dev/system/backup at 0 length 512.
  Error reading device /dev/system/backup at 0 length 4.
  Error reading device /dev/system/backup at 4096 length 4.
  PV                                                           VG       Fmt  Attr PSize   PFree
  /dev/mapper/vg2-vsv1685--dsakekjloo2ddm0a--eahin7pr71l0fwlc2 vg       lvm2 a--  <99.88g       0
  /dev/nvme0n1p3                                               vg3  lvm2 a--    1.86t  651.28g
  /dev/nvme1n1                                                 nvmessd1 lvm2 a--    1.86t <555.72g
  /dev/sda1                                                    vg1      lvm2 a--   <9.10t    4.04t
  /dev/sdb1                                                    vg2      lvm2 a--   <9.10t    3.74t

As you can see, vg2 is just a volume group residing on the sdb disk in partition number 1 (the whole disk), which is 10TB of storage.

score 4 · Accepted Answer · answered Apr 29 '19 at 18:42

Review the physical volumes with pvs. If you see one under /dev/vg2 you have used the /dev file system in shared memory as a disk. Migrate off this immediately with pvmove if you care about your data surviving the next reboot.
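One way to check for that situation is to look for regular files under /dev, since a healthy devtmpfs normally holds device nodes and symlinks rather than data. The following is a minimal sketch, assuming a POSIX-style find, du and sort; the function name big_regular_files and the 1024 KiB default threshold are made up for illustration.

```shell
# List regular files (not device nodes or symlinks) larger than a threshold,
# with their disk usage in KiB, largest first.
# $1 = directory to scan, $2 = minimum size in KiB (hypothetical default: 1024)
big_regular_files() {
    dir=$1
    min_kib=${2:-1024}
    # -xdev keeps find on the same filesystem, so other mounts below the
    # directory (for example /dev/shm or /dev/hugepages) are not scanned.
    find "$dir" -xdev -type f -size +"${min_kib}"k -exec du -k {} + 2>/dev/null | sort -rn
}
```

Running `big_regular_files /dev` as root should list whatever is behind the 100G that du reports under /dev/vg2, if it is a regular file as suspected.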

To avoid this in the future, only create and extend VGs with disk devices, such as an existing device under /dev/disk/. Further, when working on logical volumes, you don't need the leading /dev, so lvextend vg2/lv3.

/proc/sys/vm/drop_caches is only useful for cold caches in performance benchmarks. Do not bother trying to use it for operations.

> Bear in mind, swappiness is set to 10 so this behaviour should not occur in the first place.

Why would you have paging space if it wasn't meant to be used? Committed_AS is 135% of your total memory, so it is going to page out.

Certainly, the roughly 100 GB is suspect. If you don't intend to configure shared memory (likely for databases), it is incorrectly configured. If you do, huge pages would improve efficiency.
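The 135% figure comes from dividing Committed_AS by MemTotal in the /proc/meminfo output from the question (446505420 kB / 329657664 kB ≈ 1.35). Here is a small sketch that repeats the arithmetic on any meminfo-formatted input; commit_ratio is an illustrative name, not an existing tool.

```shell
# Print Committed_AS as a whole-number percentage of MemTotal.
# Reads meminfo-formatted text from the given file(s), or stdin if none.
commit_ratio() {
    awk '/^MemTotal:/     { total = $2 }
         /^Committed_AS:/ { committed = $2 }
         END { if (total > 0) printf "%.0f\n", committed * 100 / total }' "$@"
}
```

On a live system this would be called as `commit_ratio /proc/meminfo`. A value well above 100 means the processes have been promised more memory than the machine has RAM.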
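To see how much of the "cached" figure is really shared memory, compare Shmem with Cached in /proc/meminfo. Shmem (tmpfs-backed files and shared memory segments, which on typical kernels includes devtmpfs) is counted inside Cached. drop_caches cannot free it: it can only be swapped out, or released by deleting the files or segments. The following is a rough sketch, with cache_breakdown as an illustrative name; treating Cached minus Shmem as the file-backed page cache is an approximation.

```shell
# Split the Cached figure into shared memory and (approximately) file-backed
# page cache, both in MiB. Reads meminfo-formatted text from file(s) or stdin.
cache_breakdown() {
    awk '/^Cached:/ { cached = $2 }
         /^Shmem:/  { shmem = $2 }
         END { printf "shmem_mib=%d file_cache_mib=%d\n", shmem / 1024, (cached - shmem) / 1024 }' "$@"
}
```

With the values posted in the question (Cached 81813132 kB, Shmem 81611272 kB), only about 197 MiB of the cache is file-backed. That is consistent with drop_caches having almost nothing to release.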

I have added the output of pvs above with all its errors also. — GreyStone, Apr 29 '19 at 22:51
You have 6 (?) VGs, and I have no idea about the state of them. `pvremove /dev/mapper/vg2-vsv1685--dsakekjloo2ddm0a--eahin7pr71l0fwlc2` and remove its file if necessary. Then you can review if LVM needs to be cleaned up further. — John Mahowald, Apr 30 '19 at 12:37
