I have two servers in a pool running Nginx, PHP5-FPM and Memcached. For some reason, the first server in the pool inexplicably loses about 2GB of RAM, and I can't work out where it's going.
A reboot brings everything back to normal, but within a few hours the RAM is used up again.
At first I thought it was down to Memcached, but eventually I killed every process I reasonably could and the memory still wasn't released. Even switching to single-user mode with init 1 didn't free it.
ipcs -m shows no shared memory segments, and slabtop looks much the same on this server as on the other server in the pool, which is using very little memory.
df shows only about 360K in use on tmpfs.
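The ipcs and df checks above can also be scripted, which makes it easy to log them alongside the memory figures while the RAM disappears over those few hours. This is a minimal sketch that assumes the standard Linux /proc layout: it sums the segment sizes listed in /proc/sysvipc/shm (the same information ipcs -m shows, assumed to have a header row with a "size" column in bytes) and reports used space on every tmpfs mount, computed from statvfs the way df does. The function names are illustrative, not from any existing tool.

```python
# Sketch: repeat the `ipcs -m` and `df` (tmpfs) checks from Python.
# Assumptions: Linux /proc layout; /proc/sysvipc/shm has a header row with a
# "size" column (bytes); tmpfs mounts are listed in /proc/mounts.
import os


def shm_total_bytes(sysvipc_text: str) -> int:
    """Sum the size column of /proc/sysvipc/shm (one row per segment)."""
    lines = sysvipc_text.splitlines()
    if not lines:
        return 0
    size_col = lines[0].split().index("size")
    return sum(int(row.split()[size_col]) for row in lines[1:] if row.strip())


def tmpfs_mount_points(mounts_text: str) -> list:
    """Return mount points whose filesystem type is tmpfs.

    Mount points containing spaces appear octal-escaped (\\040) in
    /proc/mounts; that case is not handled here.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "tmpfs":
            points.append(fields[1])
    return points


def used_kb(path: str) -> int:
    """Used space on the filesystem at `path` (blocks - free blocks), in kB."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize // 1024


if __name__ == "__main__":
    try:
        with open("/proc/sysvipc/shm") as f:
            print(f"SysV shm: {shm_total_bytes(f.read()) // 1024} kB")
    except OSError as exc:
        print(f"shm check skipped: {exc}")
    try:
        with open("/proc/mounts") as f:
            for mp in tmpfs_mount_points(f.read()):
                print(f"tmpfs {mp}: {used_kb(mp)} kB used")
    except OSError as exc:
        print(f"tmpfs check skipped: {exc}")
```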
In case it's relevant, the two servers are nearly identical: both run the same OS at the same update level on the same hypervisor (VMware ESXi 4.1), on different hosts with identical hardware. The differences are:
- The first server has an NFS mount. I tried unmounting it and removing the NFS modules, but RAM usage didn't change.
- The first server listens for both HTTP and HTTPS sites, while the second listens only for HTTP.
Here's the output of free -m ...
             total       used       free     shared    buffers     cached
Mem:          3953       3458        494          0        236        475
-/+ buffers/cache:       2746       1206
Swap:         1023          0       1023
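As a sanity check on that output: the "-/+ buffers/cache" row is simply the Mem: row with buffers and page cache moved from "used" to "free". The sketch below rebuilds both rows from the kB values in the /proc/meminfo listing further down, assuming free -m truncates kB to MiB with integer division. Under that assumption it reproduces every figure in the table, which confirms that roughly 2746 MiB is in use by something other than buffers and cache. The function names are illustrative.

```python
# Sketch: rebuild the `free -m` rows from /proc/meminfo kB values.
# Assumption: free -m truncates kB to MiB with integer division (kB // 1024).

def to_mib(kb: int) -> int:
    """Truncate a kB figure to whole MiB."""
    return kb // 1024


def free_m_rows(mem_total: int, mem_free: int, buffers: int, cached: int) -> dict:
    """Return the Mem: and -/+ buffers/cache figures in MiB, given kB inputs."""
    used = mem_total - mem_free
    return {
        "total": to_mib(mem_total),
        "used": to_mib(used),
        "free": to_mib(mem_free),
        "buffers": to_mib(buffers),
        "cached": to_mib(cached),
        # Used memory once buffers and page cache are counted as reclaimable.
        "used_minus_cache": to_mib(used - buffers - cached),
        "free_plus_cache": to_mib(mem_free + buffers + cached),
    }


if __name__ == "__main__":
    # kB values taken from the /proc/meminfo listing in this post.
    rows = free_m_rows(mem_total=4048392, mem_free=506576,
                       buffers=242252, cached=486796)
    for name, mib in rows.items():
        print(f"{name:>16}: {mib}")
```

With those inputs it prints total 3953, used 3458, free 494, buffers 236, cached 475, used_minus_cache 2746 and free_plus_cache 1206, matching the table.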
Here's /proc/meminfo ...
MemTotal: 4048392 kB
MemFree: 506576 kB
Buffers: 242252 kB
Cached: 486796 kB
SwapCached: 8 kB
Active: 375240 kB
Inactive: 369312 kB
Active(anon): 12320 kB
Inactive(anon): 3596 kB
Active(file): 362920 kB
Inactive(file): 365716 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 1048572 kB
SwapFree: 1048544 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 15544 kB
Mapped: 3084 kB
Shmem: 412 kB
Slab: 94516 kB
SReclaimable: 75104 kB
SUnreclaim: 19412 kB
KernelStack: 632 kB
PageTables: 1012 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3072768 kB
Committed_AS: 20060 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 281340 kB
VmallocChunk: 34359454584 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 59392 kB
DirectMap2M: 4134912 kB
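One way to narrow this down is to add up the /proc/meminfo counters that normally cover RAM and see what is left over. This is a rough heuristic, not an exact accounting: the field list is an assumption, chosen because those counters don't overlap (Shmem and Mapped are already inside Cached, and SReclaimable plus SUnreclaim make up Slab), and memory that kernel code takes directly from the page allocator is not necessarily reported in any of them. The function names are illustrative.

```python
# Sketch: estimate how much RAM /proc/meminfo does not attribute to any of
# the usual consumers. The field list is an assumption and not exhaustive;
# it only covers the large, non-overlapping counters.

ACCOUNTED_FIELDS = (
    "MemFree",      # unused pages
    "Buffers",      # block-device buffers
    "Cached",       # page cache (already includes Shmem/tmpfs and Mapped)
    "SwapCached",   # pages present in both RAM and swap
    "AnonPages",    # anonymous user-space memory
    "Slab",         # kernel slab caches (SReclaimable + SUnreclaim)
    "PageTables",   # page-table pages
    "KernelStack",  # kernel stacks
)


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # drops the "kB" unit if present
    return info


def unaccounted_kb(info: dict) -> int:
    """MemTotal minus the sum of the fields in ACCOUNTED_FIELDS, in kB."""
    return info["MemTotal"] - sum(info.get(f, 0) for f in ACCOUNTED_FIELDS)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except FileNotFoundError:
        print("/proc/meminfo not available (not Linux?)")
    else:
        missing = unaccounted_kb(info)
        print(f"unaccounted: {missing} kB ({missing // 1024} MiB)")
```

With the values above, MemTotal minus those fields comes to 2701056 kB, roughly 2.6 GiB that none of the listed counters claims.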
Here's the process list at the time ...
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 24336 2160 ? Ss Jul22 0:09 /sbin/init
root 2 0.0 0.0 0 0 ? S Jul22 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jul22 0:38 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Jul22 0:00 [kworker/u:0]
root 6 0.0 0.0 0 0 ? S Jul22 0:04 [migration/0]
root 7 0.0 0.0 0 0 ? S Jul22 0:32 [watchdog/0]
root 8 0.0 0.0 0 0 ? S Jul22 0:04 [migration/1]
root 10 0.0 0.0 0 0 ? S Jul22 0:22 [ksoftirqd/1]
root 11 0.0 0.0 0 0 ? S Jul22 0:15 [kworker/0:1]
root 12 0.0 0.0 0 0 ? S Jul22 0:31 [watchdog/1]
root 13 0.0 0.0 0 0 ? S Jul22 0:04 [migration/2]
root 15 0.0 0.0 0 0 ? S Jul22 0:04 [ksoftirqd/2]
root 16 0.0 0.0 0 0 ? S Jul22 0:14 [watchdog/2]
root 17 0.0 0.0 0 0 ? S Jul22 0:04 [migration/3]
root 19 0.0 0.0 0 0 ? S Jul22 0:04 [ksoftirqd/3]
root 20 0.0 0.0 0 0 ? S Jul22 0:11 [watchdog/3]
root 21 0.0 0.0 0 0 ? S< Jul22 0:00 [cpuset]
root 22 0.0 0.0 0 0 ? S< Jul22 0:00 [khelper]
root 23 0.0 0.0 0 0 ? S Jul22 0:00 [kdevtmpfs]
root 24 0.0 0.0 0 0 ? S< Jul22 0:00 [netns]
root 25 0.0 0.0 0 0 ? S Jul22 0:02 [sync_supers]
root 26 0.0 0.0 0 0 ? S Jul22 0:21 [kworker/u:1]
root 27 0.0 0.0 0 0 ? S Jul22 0:00 [bdi-default]
root 28 0.0 0.0 0 0 ? S< Jul22 0:00 [kintegrityd]
root 29 0.0 0.0 0 0 ? S< Jul22 0:00 [kblockd]
root 30 0.0 0.0 0 0 ? S< Jul22 0:00 [ata_sff]
root 31 0.0 0.0 0 0 ? S Jul22 0:00 [khubd]
root 32 0.0 0.0 0 0 ? S< Jul22 0:00 [md]
root 34 0.0 0.0 0 0 ? S Jul22 0:04 [khungtaskd]
root 35 0.0 0.0 0 0 ? S Jul22 0:15 [kswapd0]
root 36 0.0 0.0 0 0 ? SN Jul22 0:00 [ksmd]
root 37 0.0 0.0 0 0 ? SN Jul22 0:00 [khugepaged]
root 38 0.0 0.0 0 0 ? S Jul22 0:00 [fsnotify_mark]
root 39 0.0 0.0 0 0 ? S Jul22 0:00 [ecryptfs-kthrea]
root 40 0.0 0.0 0 0 ? S< Jul22 0:00 [crypto]
root 48 0.0 0.0 0 0 ? S< Jul22 0:00 [kthrotld]
root 50 0.0 0.0 0 0 ? S Jul22 2:59 [kworker/1:1]
root 51 0.0 0.0 0 0 ? S Jul22 0:00 [scsi_eh_0]
root 52 0.0 0.0 0 0 ? S Jul22 0:00 [scsi_eh_1]
root 57 0.0 0.0 0 0 ? S Jul22 0:09 [kworker/3:1]
root 74 0.0 0.0 0 0 ? S< Jul22 0:00 [devfreq_wq]
root 114 0.0 0.0 0 0 ? S Jul22 0:00 [kworker/3:2]
root 128 0.0 0.0 0 0 ? S Jul22 0:00 [kworker/1:2]
root 139 0.0 0.0 0 0 ? S Jul22 0:00 [kworker/0:2]
root 249 0.0 0.0 0 0 ? S< Jul22 0:00 [mpt_poll_0]
root 250 0.0 0.0 0 0 ? S< Jul22 0:00 [mpt/0]
root 259 0.0 0.0 0 0 ? S Jul22 0:00 [scsi_eh_2]
root 273 0.0 0.0 0 0 ? S Jul22 0:20 [jbd2/sda1-8]
root 274 0.0 0.0 0 0 ? S< Jul22 0:00 [ext4-dio-unwrit]
root 377 0.0 0.0 0 0 ? S Jul22 0:26 [jbd2/sdb1-8]
root 378 0.0 0.0 0 0 ? S< Jul22 0:00 [ext4-dio-unwrit]
root 421 0.0 0.0 17232 584 ? S Jul22 0:00 upstart-udev-bridge --daemon
root 438 0.0 0.0 21412 1176 ? Ss Jul22 0:00 /sbin/udevd --daemon
root 446 0.0 0.0 0 0 ? S< Jul22 0:00 [rpciod]
root 448 0.0 0.0 0 0 ? S< Jul22 0:00 [nfsiod]
root 612 0.0 0.0 21408 772 ? S Jul22 0:00 /sbin/udevd --daemon
root 613 0.0 0.0 21728 924 ? S Jul22 0:00 /sbin/udevd --daemon
root 700 0.0 0.0 0 0 ? S< Jul22 0:00 [kpsmoused]
root 849 0.0 0.0 15188 388 ? S Jul22 0:00 upstart-socket-bridge --daemon
root 887 0.0 0.0 0 0 ? S Jul22 0:00 [lockd]
root 919 0.0 0.0 14504 952 tty4 Ss+ Jul22 0:00 /sbin/getty -8 38400 tty4
root 922 0.0 0.0 14504 952 tty5 Ss+ Jul22 0:00 /sbin/getty -8 38400 tty5
root 924 0.0 0.0 14504 944 tty2 Ss+ Jul22 0:00 /sbin/getty -8 38400 tty2
root 925 0.0 0.0 14504 944 tty3 Ss+ Jul22 0:00 /sbin/getty -8 38400 tty3
root 930 0.0 0.0 14504 952 tty6 Ss+ Jul22 0:00 /sbin/getty -8 38400 tty6
root 940 0.0 0.0 0 0 ? S Jul22 0:07 [flush-8:0]
root 1562 0.0 0.0 58792 1740 tty1 Ss Jul22 0:00 /bin/login --
root 12969 0.0 0.0 0 0 ? S 07:18 0:02 [kworker/2:2]
root 30051 0.0 0.0 0 0 ? S 10:13 0:00 [flush-8:16]
root 30909 0.0 0.0 0 0 ? S 10:14 0:00 [kworker/2:1]
johncc 30921 0.2 0.2 26792 9360 tty1 S 10:17 0:00 -bash
root 31089 0.0 0.0 0 0 ? S 10:18 0:00 [kworker/0:0]
root 31099 0.0 0.0 42020 1808 tty1 S 10:19 0:00 sudo -i
root 31100 0.2 0.1 22596 5168 tty1 S 10:19 0:00 -bash
root 31187 0.0 0.0 0 0 ? S 10:19 0:00 [kworker/2:0]
root 31219 0.0 0.0 16880 1252 tty1 R+ 10:22 0:00 ps aux
root 31220 0.0 0.0 53924 536 tty1 R+ 10:22 0:00 curl -F sprunge=<- http://sprunge.us
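To put a number on how little the processes themselves hold, the RSS column can be summed. A minimal sketch follows; it assumes procps-style ps aux output (RSS in kB as the sixth column). Shared pages are counted in every process that maps them, so the total overstates rather than understates user-space usage. The function name is illustrative.

```python
# Sketch: sum the RSS column of `ps aux` output. Shared pages are counted in
# every process that maps them, so the result is an upper bound.
import subprocess


def total_rss_kb(ps_output: str) -> int:
    """Sum the RSS column (kB) of `ps aux` output, skipping non-numeric rows."""
    total = 0
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # COMMAND (11th field) may contain spaces
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        if len(fields) < 11 or not fields[5].isdigit():
            continue  # header line or unexpected format
        total += int(fields[5])
    return total


if __name__ == "__main__":
    try:
        out = subprocess.run(["ps", "aux"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run ps: {exc}")
    else:
        print(f"total RSS: {total_rss_kb(out)} kB")
```

Summed over the listing above, this comes to 30612 kB, about 30 MB, nowhere near the missing 2GB.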
Can anyone suggest what to try next, or how to debug this problem? I'm at a loss!