17

Most of the time when my computer starts to need swap, I see a massive spike in CPU usage (kswapd0 is consistently using 99%-100% CPU). According to top, the time is spent in sy (system/kernel) not wa (IO wait).

I am running Linux 4.0.4-2-ARCH on a C720 with 2GB RAM, and 6GB swap on an SSD.

I seem to have this problem whether or not discard (TRIM) is enabled.

Are there any settings I should inspect or tweak to see if I can fix this?

Is there any way to debug the problem? Something like strace for kernel threads?
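(There is no strace equivalent for kernel threads, since they make no system calls from user space, but `perf` can sample where a kernel thread spends its CPU time. A minimal sketch, assuming the perf tool from your distribution's linux-tools package is installed:)

```shell
# Find kswapd0's PID; kernel threads are visible to pgrep like any process.
pid=$(pgrep -x kswapd0 | head -n 1)
echo "kswapd0 pid: ${pid:-not found}"
# Then, as root, sample its call stacks for 10 seconds and view the report:
#   perf record -g -p "$pid" -- sleep 10
#   perf report
```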


Running with default Arch Linux settings:

/proc/sys/vm/swappiness = 60
/proc/sys/vm/vfs_cache_pressure = 100
/sys/kernel/mm/transparent_hugepage/enabled = [always] madvise never

Zaz
  • take a look at whether `irqbalance` is running, and check `/proc/interrupts` to see whether interrupts are well balanced. – fgbreel Jun 02 '15 at 17:57
  • @fgbreel: irqbalance is not running. `/proc/interrupts` seems more or less balanced. The problem really only affects 1 CPU at a time. – Zaz Jun 02 '15 at 18:16
  • Install and start the `irqbalance` service and keep watching to see if the load becomes more evenly distributed across all CPU cores. – fgbreel Jun 02 '15 at 18:22

8 Answers

15

This seems to be a relatively common problem.

When the problem is happening, check whether issuing the following command stops it: `echo 1 > /proc/sys/vm/drop_caches`

If it works, you can schedule it as a periodic cron job as a workaround.
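A hypothetical root crontab entry for that workaround (running `sync` first so dirty pages are written back before the drop; the 15-minute schedule is arbitrary):

```
# Root crontab fragment (crontab -e as root) — drops the page cache
# every 15 minutes. Tune the interval to how fast pressure builds up.
*/15 * * * *  /bin/sync && /bin/echo 1 > /proc/sys/vm/drop_caches
```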

shodanshok
  • 2
    Unfortunately that has no effect, and `/proc/sys/vm/drop_caches` stays at `1`. – Zaz Jun 02 '15 at 21:21
  • 4
    This works and is very helpful, but **what** is it doing? I am on a very low resource machine and this issue comes up all the time (and makes it hard to use my computer over ssh) but I do not know what is going on. Scheduling this as a cron job seems like a sad way to fix the glitch... – Startec Aug 31 '16 at 05:15
  • 1
    The above command drops the cached information that the Linux pagecache subsystem keeps to speed up I/O access. For example, when you read a file from disk, it is also stored in the page cache to speed up further access to the file's content. By dropping caches, you decrease memory pressure, and this seems sufficient to avoid the `kswapd` problem. – shodanshok Aug 31 '16 at 09:19
  • This fixed the high CPU usage for me--night and day difference! kswapd0 went from 100% CPU to 0%. An explanation for why and a permanent solution would be great. (Side note: I'm running linux kernel 4.8.0-36-generic with 16 GB mem and 16 GB swap.) – josephdpurcell Jun 12 '17 at 17:13
  • See enhanced solution at https://askubuntu.com/a/736956/439867 – Peter Krauss Nov 06 '17 at 19:04
  • `kswapd0` tries to determine what makes more sense: dropping caches or swapping out processes. Usually, swapping out processes wins, as it gets rid of data that hasn't been accessed in ages in favor of data that has been accessed recently. If you drop caches, you communicate that it is more important to keep unused data in RAM and re-fetch data that is actually used from disk. That will show nice low CPU usage, because processes will be in iowait instead, but it's not useful behavior. – Simon Richter Sep 17 '21 at 16:46
3

I am not sure why this answer has not been suggested: killall -9 kswapd0

I came across this problem where the kswapd0 process was running as a non-root user who had not logged in for a long while. I have killed this process and the issue hasn't returned.

No, this does not address the root issue (how it even got to 100% in the first place), but it allows you to quickly recover usage of the system.
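Before killing it, it may be worth checking whether the process really is the kernel thread or an impostor (as the comment below suggests, cryptominers are known to take this name). A sketch using only /proc: the genuine kswapd0 runs as root (UID 0) with parent PID 2 (kthreadd).

```shell
#!/bin/sh
# Scan /proc for processes named "kswapd0" and print their UID and PPID.
# A real kernel thread shows uid=0 ppid=2; anything else is suspect.
for d in /proc/[0-9]*; do
    comm=$(cat "$d/comm" 2>/dev/null || true)
    if [ "$comm" = "kswapd0" ]; then
        uid=$(awk '/^Uid:/ {print $2}' "$d/status" 2>/dev/null)
        ppid=$(awk '/^PPid:/ {print $2}' "$d/status" 2>/dev/null)
        echo "pid=${d#/proc/} uid=$uid ppid=$ppid"
    fi
done
```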

reukiodo
    I'm... pretty sure that's a rootkit disguising itself as a kernel process, not actual `kswapd0`. Pretty sure your user has been compromised. `kswapd0` does not run as regular user and can't be killed like that (I hope) – SharkWipf Sep 30 '20 at 17:58
  • This happened exactly to me and your solution solved my problem. Using the 4.19.0-9-amd64 kernel. – Newton_Jose Oct 08 '20 at 06:51
  • @SharkWipf, that might be true. I've since removed that user as a precaution anyway since it was no longer needed. – reukiodo Nov 10 '20 at 23:11
3

I have a C720 running Linux Kernel 4.4.0 on Ubuntu 14.04.1 LTS with 2 GB RAM and 2 GB swap.

Assuming heavy Chrome/Chromium usage, here are some ways to make your system more performant:

  1. Edit /etc/default/grub and add the following kernel parameters to the GRUB_CMDLINE_LINUX_DEFAULT line:
    • elevator=noop
    • zswap.enabled=1
    • transparent_hugepage=madvise
  2. Run sudo update-grub2.
  3. Edit /etc/sysctl.conf and append the following:
    • vm.swappiness=25
    • vm.vfs_cache_pressure=1000
  4. Reboot.
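For reference, the resulting GRUB line might look something like this (a hypothetical example; `quiet splash` stands in for whatever options your file already has):

```
# /etc/default/grub — example only; merge with your existing options
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop zswap.enabled=1 transparent_hugepage=madvise"
```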

You can verify the changes like so:

$ dmesg | grep -i noop
[    0.694680] io scheduler noop registered (default)
$ dmesg | grep -i zswap
[    0.724855] zswap: loaded using pool lzo/zbud
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ sysctl vm.swappiness
vm.swappiness = 25
$ sysctl vm.vfs_cache_pressure
vm.vfs_cache_pressure = 1000

Update

Increasing vm.min_free_kbytes in step #3 may be beneficial. Try a value of 131072 (128 MB). The final takeaway is that Linux on the desktop doesn't perform very well in low-memory situations. Some have suggested placing Chrome/Chromium in a cgroup, but that's beyond the scope of this answer.
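To experiment with the value before committing to it (a sketch; needs root):

```
# Apply at runtime first to see whether it helps...
sysctl -w vm.min_free_kbytes=131072
# ...then persist it by appending to /etc/sysctl.conf:
#   vm.min_free_kbytes = 131072
```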

Matthew M
1

The kswapd kernel threads allocate and free memory pages. If your swap is in use and you see these kernel threads using a lot of CPU time, it means they are scanning memory pages in order to swap some out and serve memory allocation requests.

I think dropping the cache doesn't help in this case, because the kernel reclaims cache automatically when the OS is in a tight memory situation.

If you don't have a memory problem and run the free command, you will see a lot of memory used as cache; but if you do have a memory problem, Linux shrinks the cache to serve memory allocation requests, without any need to drop caches manually.

You can use sar -B and look at the majflt/s and pgscank/s values; for the other fields, see man sar.
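To show what to look for, here is a sketch that pulls those two columns out of `sar -B`-style output with awk. The sample lines are made up for illustration; in practice you would pipe in `sar -B 1 5` instead.

```shell
# Invented sample of `sar -B` output (header + one interval line):
sar_output='12:00:01 pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
12:00:02 0.00 12.00 250.00 3.00 900.00 4500.00 0.00 4200.00 93.33'

# Build a column map from the header row, then print the two fields
# of interest by name, so the script survives column reordering.
printf '%s\n' "$sar_output" | awk '
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        { print "majflt/s =", $col["majflt/s"], "pgscank/s =", $col["pgscank/s"] }'
```

A sustained, high pgscank/s with kswapd at 100% CPU is the signature of the reclaim scanning described above; rising majflt/s means pages are being faulted back in from disk.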

c4f4t0r
0

None of the answers listed here were working for me. I tried a simpler 'IT Crowd' approach of turning off the swap and then turning it back on.

`# swapoff -a`, then `# swapon -a`.

I wouldn't recommend this if the server was desperately short of RAM, but in my case there was plenty of RAM but for whatever reason swap was running amok. Swap cleared, CPU usage back to normal.
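A sketch of the "plenty of RAM" check worth doing first: cycling swap is only safe when everything swapped out fits back into available RAM, otherwise `swapoff -a` can invoke the OOM killer. The numbers come straight from /proc/meminfo, so no root is needed for the check itself.

```shell
#!/bin/sh
# Compare swap in use against available RAM before cycling swap.
avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
used=$(( ${total:-0} - ${free:-0} ))
if [ "$used" -lt "${avail:-0}" ]; then
    echo "safe: ${used} kB in swap, ${avail:-0} kB available"
    # swapoff -a && swapon -a    # needs root; uncomment to actually cycle
else
    echo "not safe: ${used} kB in swap, only ${avail:-0} kB available"
fi
```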

anorcross
    The question is whether you ever actually ran out of memory; you may want to lower the swappiness factor so the system only swaps when it really must, i.e. a value of 1 or 0, since `A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.` Reference: [kernel.org](https://www.kernel.org/doc/Documentation/sysctl/vm.txt#swappiness) – djdomi Jul 21 '21 at 09:41
0

In my own experience, kswapd# activity indicates that memory is fully used and swap has been activated automatically to back the demand for memory. Killing the process is not a good decision, because it is only a temporary fix. I recommend adding memory to the machine to keep processes performing well. In my case it was due to several virtual machines running on the same server; two of the VMs consumed a lot of resources with their processes.

0

(This is a quasi-answer -- too long to be a comment, but not a ready answer either)

1) What about using not 6 GB but less, say 1 or 2 GiB (you can set the size with mkswap without resizing the swap partition)? Have you tried that? What were the results?

2) What's sysctl vm.swappiness, sysctl vm.vfs_cache_pressure?

3) What's cat /sys/kernel/mm/transparent_hugepage/enabled?

N. B. Do you realize you're going to wear out your SSD significantly with that kind of set-up (not much RAM, huge swap)?

P. S. I could recommend trying UltraKSM, but it requires patching the kernel. I do have some builds of my own (-realtime and BFS based); they're for .deb-based systems, but they can be used on other systems quite easily (usually you just need to unpack the .debs and make a corresponding initrd/initramfs, which can be a hassle for people not that familiar with that side of Linux)

(to be continued)

poige
  • I added the information to my question. Arch Linux has an AUR package for building Linux with UltraKSM, so I might try that. Thanks for the help! – Zaz Jun 02 '15 at 18:57
  • 1
    Regarding SSDs, I think their fragility compared to HDDs is often exaggerated; even with flat out writing and erasing, it can take years to wear out an SSD. Even if I'm wrong and wear my SSD out, I should still be able to recover my data, and will only have lost a $30 SSD. – Zaz Jun 02 '15 at 18:58
  • ok, try changing hugepages → madvise, swappiness → 0, vfs_cache_pressure → 5000 – poige Jun 03 '15 at 00:54
  • Doesn't seem to make any difference. Sorry. – Zaz Jun 03 '15 at 15:43
  • Did you do 1st item in the list above? – poige Jun 03 '15 at 17:32
  • I was still experiencing the problem with swap disabled, so I assumed it would be the same for a small amount of swap. I'm trying it now with a 1GB swap partition, but I'm still seeing 100% CPU usage. – Zaz Jun 03 '15 at 18:53
0

If you have a service such as Puppeteer (the headless Chrome API) running inside Docker, add dumb-init to your Dockerfile.

If running Docker >= 1.13.0, use docker run's --init flag to reap zombie processes:

docker run --init container

If you are running Docker <= 1.13.0, use dumb-init. Add this to your Dockerfile:

ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init
ENTRYPOINT ["dumb-init", "--"]