I have an idle CentOS Linux system, and yet kswapd is using 100% CPU.
All I have running is a single bash session with top running. I have 32 GB of RAM, and yet kswapd has been using 100% CPU constantly for over 4 hours.
AFAICS this is related to neither free RAM nor swap. We have the same problem here; it sometimes hits production machines even though plenty of RAM is free, quite often more than 700 MB, with no dirty buffers to sync and 0 bytes of swap used. It definitely looks like a severe kernel bug caused by some unknown race condition.
We currently run CentOS kernel 2.6.18-194.el5 and will try replacing it with a newer kernel, because we think this might help.
Update:
Red Hat has confirmed that this is a kernel issue in 2.6.18-194.el5.
Solutions:
Minimum: kernel-2.6.18-194.32.1.el5 contains the immediate bugfix
Better: kernel-2.6.18-238.el5 contains additional kswapd-related bugfixes
Best: kernel-2.6.18-348.4.1.el5 is the latest kernel that runs with RHEL 5.5 without change
In the meantime, there is a script that detects the 100% CPU situation quite well. Our monitoring calls it every minute to alert us to the situation. If the situation persists for too long, affected machines lock up completely as more and more unkillable processes use 100% CPU, until the machine becomes completely unmanageable.
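For illustration only, a check of this kind could look roughly like the sketch below. It is not the script mentioned above. The function names, the 90% threshold and the 5-second sampling window are all assumptions. It samples the CPU ticks of each kswapd thread from /proc twice and reports threads that were busy for most of the window.

```shell
#!/bin/bash
# Sketch only: function names, threshold and interval are illustrative
# assumptions, not taken from the script referred to in the text.

# Print utime+stime (in clock ticks) for a PID, read from /proc/PID/stat.
# Fields 14 and 15 are utime and stime; kswapd's name contains no spaces,
# so plain whitespace splitting is safe for these threads.
proc_ticks() {
    awk '{ print $14 + $15 }' "/proc/$1/stat" 2>/dev/null
}

# Integer CPU percentage from two tick samples taken INTERVAL seconds apart.
# usage: cpu_percent BEFORE AFTER INTERVAL HZ
cpu_percent() {
    echo $(( ($2 - $1) * 100 / ($3 * $4) ))
}

# Print one line per kswapd thread at or above THRESHOLD percent CPU.
# usage: check_kswapd [THRESHOLD] [INTERVAL]
check_kswapd() {
    local threshold=${1:-90} interval=${2:-5} hz pid before after pct
    hz=$(getconf CLK_TCK)
    for pid in $(pgrep '^kswapd'); do
        before=$(proc_ticks "$pid") || continue
        sleep "$interval"
        after=$(proc_ticks "$pid") || continue
        [ -n "$before" ] && [ -n "$after" ] || continue
        pct=$(cpu_percent "$before" "$after" "$interval" "$hz")
        if [ "$pct" -ge "$threshold" ]; then
            echo "kswapd pid $pid at ${pct}% CPU"
        fi
    done
}
```

Calling check_kswapd 90 5 from a monitoring job prints nothing when kswapd is quiet. Sampling /proc twice is used here because the %CPU column of ps is averaged over the lifetime of the thread and would hide a recent spike.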
Currently the only known way to solve the problem is to hard-reboot the affected machine manually. /sbin/reboot is not reliable, because the machine hangs during shutdown far too often.
To hard-reboot a machine from any root shell command line without direct access to the console, do:
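echo 10 > /proc/sys/kernel/panic    # reboot automatically 10 s after a kernel panic
echo 1 > /proc/sys/kernel/sysrq     # enable all SysRq functions
echo s > /proc/sysrq-trigger        # sync all mounted filesystems
sleep 5
echo s > /proc/sysrq-trigger        # sync again to catch late writes
sleep 1
echo b > /proc/sysrq-trigger        # reboot immediately, without unmounting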
Keep in mind to do this only after quiescing the machine, so that no process is still writing to the disks. This should keep fsck from running into severe trouble after the reboot.
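One rough way to see whether writes have drained before the final sync is to look at the Dirty and Writeback counters in /proc/meminfo. The helper below is a sketch under that assumption. The name pending_write_kb is made up for illustration, and a value near zero is only a hint, not a guarantee that nothing is writing.

```shell
#!/bin/bash
# Sketch only: pending_write_kb is a hypothetical helper name.
# Sum the Dirty and Writeback counters (in kB) from a meminfo-style file.
# usage: pending_write_kb [FILE]   (FILE defaults to /proc/meminfo)
pending_write_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}
```

Running pending_write_kb a few times in a row shows whether the amount of unwritten data is still changing.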
Sorry, no real solution, but HTH. Keep in mind that there may be other causes of kswapd using 100% CPU than the one described here, so automating a reboot in this case is probably a bad idea.