I'm currently running an 8-server Ceph setup consisting of 3 Ceph monitors and 5 Ceph nodes. Performance-wise the cluster runs great, but over time the nodes start swapping the ceph-osd process to disk. When this happens I experience very poor performance, and the node that is swapping is sometimes even seen as down by the cluster. Running swapoff -a followed by swapon -a temporarily fixes the issue, but in time it returns.
As I understand it, it is normal for Ceph to run high on memory due to caching and such, but I would expect that memory to be released rather than the node starting to swap.
We tried the following:
- Doubled the memory; it just takes longer for the problem to appear
- Updated the kernel; no result
- Looked at various settings within Ceph; didn't find a solution there
- Set swappiness to 1; no result, it just takes longer for the problem to appear
- Searched for bugs; all bugs found were for old versions of Ceph
Does anyone have an idea why this occurs and how to remediate it?
As our configuration stands, each server has the following specification:
Operating System: CentOS 7
Memory: 32GB
OSDs: 6x 900 GB
Ceph version: 13.2.5 Mimic
Swappiness set to 1
Current memory when swapping occurs:
# free -m
              total        used        free      shared  buff/cache   available
Mem:          31960       19270         747         574       11943       11634
Swap:          2931        1500        1431
Swap dump:
PID=9 - Swap used: 0 - (rcu_bh )
PID=11077 - Swap used: 4 - (snmpd )
PID=9518 - Swap used: 4 - (master )
PID=7429 - Swap used: 8 - (systemd-logind )
PID=7431 - Swap used: 8 - (irqbalance )
PID=7465 - Swap used: 16 - (chronyd )
PID=7702 - Swap used: 20 - (NetworkManager )
PID=7469 - Swap used: 24 - (crond )
PID=7421 - Swap used: 132 - (dbus-daemon )
PID=1 - Swap used: 140 - (systemd )
PID=3616 - Swap used: 216 - (systemd-udevd )
PID=251189 - Swap used: 252 - (ceph-mds )
PID=7412 - Swap used: 376 - (polkitd )
PID=7485 - Swap used: 412 - (firewalld )
PID=9035 - Swap used: 524 - (tuned )
PID=3604 - Swap used: 1608 - (lvmetad )
PID=251277 - Swap used: 18404 - (ceph-osd )
PID=3580 - Swap used: 31904 - (systemd-journal )
PID=9042 - Swap used: 91528 - (rsyslogd )
PID=251282 - Swap used: 170788 - (ceph-osd )
PID=251279 - Swap used: 188400 - (ceph-osd )
PID=251270 - Swap used: 273096 - (ceph-osd )
PID=251275 - Swap used: 284572 - (ceph-osd )
PID=251273 - Swap used: 333288 - (ceph-osd )
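For anyone who wants to reproduce a dump like the one above on their own nodes: per-process swap usage can be read from the VmSwap field in /proc/<pid>/status, which the kernel reports in kB. The following is a minimal Python sketch of that approach, not necessarily the script that produced the output above. The function names `parse_status` and `swap_by_process` are illustrative.

```python
#!/usr/bin/env python3
"""List per-process swap usage by reading VmSwap from /proc/<pid>/status."""
import os
import re


def parse_status(text):
    """Return (name, swap_kb) from the contents of a /proc/<pid>/status file."""
    name = re.search(r"^Name:\s*(.+)$", text, re.MULTILINE)
    swap = re.search(r"^VmSwap:\s*(\d+)\s*kB", text, re.MULTILINE)
    # Kernel threads have no VmSwap line; report them as 0 kB.
    return (
        name.group(1).strip() if name else "?",
        int(swap.group(1)) if swap else 0,
    )


def swap_by_process(proc_root="/proc"):
    """Return a list of (pid, name, swap_kb), sorted by swap usage ascending."""
    rows = []
    # Fall back to an empty listing when proc_root does not exist (non-Linux).
    entries = os.listdir(proc_root) if os.path.isdir(proc_root) else []
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directory names are PIDs
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, swap_kb = parse_status(fh.read())
        except OSError:
            # The process exited (or is unreadable) between listing and open.
            continue
        rows.append((int(entry), name, swap_kb))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for pid, name, swap_kb in swap_by_process():
        print(f"PID={pid} - Swap used: {swap_kb} - ({name} )")
```

Running it as root gives the most complete picture, since status files of other users' processes may otherwise be unreadable on hardened systems.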
/proc/meminfo:
MemTotal: 32694980 kB
MemFree: 2646652 kB
MemAvailable: 9663396 kB
Buffers: 7138928 kB
Cached: 545828 kB
SwapCached: 23492 kB
Active: 24029440 kB
Inactive: 5137820 kB
Active(anon): 19307904 kB
Inactive(anon): 2687172 kB
Active(file): 4721536 kB
Inactive(file): 2450648 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 3002364 kB
SwapFree: 2220284 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 21459096 kB
Mapped: 31508 kB
Shmem: 512572 kB
Slab: 338332 kB
SReclaimable: 271984 kB
SUnreclaim: 66348 kB
KernelStack: 11200 kB
PageTables: 55932 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 19349852 kB
Committed_AS: 29550388 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 378764 kB
VmallocChunk: 34342174716 kB
HardwareCorrupted: 0 kB
AnonHugePages: 90112 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 248704 kB
DirectMap2M: 5963776 kB
DirectMap1G: 27262976 kB
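To make the /proc/meminfo figures above easier to compare, here is a small Python sketch that parses the file and derives a few totals: swap in use (SwapTotal minus SwapFree), anonymous memory (AnonPages) and file-backed page cache (Active(file) plus Inactive(file)). The sample values are copied from the output above. The function names and the choice of which fields to summarize are assumptions made for illustration only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip blank or malformed lines
        info[key.strip()] = int(fields[0])  # kB for most fields
    return info


def summarize(info):
    """Derive a few totals (in kB) that are relevant when diagnosing swapping."""
    return {
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
        "anon_kb": info["AnonPages"],
        "file_cache_kb": info["Active(file)"] + info["Inactive(file)"],
        "anon_pct_of_ram": 100.0 * info["AnonPages"] / info["MemTotal"],
    }


# Values copied from the /proc/meminfo output in this post.
SAMPLE = """\
MemTotal: 32694980 kB
AnonPages: 21459096 kB
Active(file): 4721536 kB
Inactive(file): 2450648 kB
SwapTotal: 3002364 kB
SwapFree: 2220284 kB
"""

if __name__ == "__main__":
    summary = summarize(parse_meminfo(SAMPLE))
    for key, value in summary.items():
        print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```

On a live node the same functions can be fed the contents of /proc/meminfo instead of the sample string. With the values above, roughly two thirds of RAM is anonymous memory rather than reclaimable file cache.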