
I have a problem with my Debian servers. We run 4 different servers, all with Intel CPUs and 128GB of RAM. Two of them run Wheezy, two of them run Jessie. We run Java software on those systems that uses memory heavily and could eat up all of it.

For those cases I installed a swap partition on every server which is held on a RAID 1 running on 2 SSDs.

Problem with the Jessie systems: when the system nearly runs out of memory, it starts swapping. This is tuned by the vm.swappiness = 10 parameter and looks OK to me. But the swapping itself is so heavy that the system totally hangs/freezes. There is so much disk I/O that the system stops responding.

I did some tests on all systems and artificially filled up the RAM to 120% by using:

stress --vm-bytes $(awk '/MemFree/{printf "%d\n", $2 * 1.2;}' < /proc/meminfo)k --vm-keep -m 1

The system starts swapping and freezes while the extra 20% is being swapped out. After ~20s the system is back and usable again, but during the freeze nothing works.
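To put a number on how hard the system swaps during such a test, the kernel's swap counters in /proc/vmstat can be sampled before and after. This is only a minimal sketch, assuming a Linux /proc filesystem with the pswpout counter; the function names vmstat_counter and swap_rate are made up for illustration.

```shell
# Print the value of one counter (e.g. pswpout) from
# /proc/vmstat-formatted text supplied on stdin.
vmstat_counter() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Pages per second between two samples (integer division).
# $1 = earlier count, $2 = later count, $3 = seconds between the samples
swap_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Possible use while the stress test runs:
#   before=$(vmstat_counter pswpout < /proc/vmstat); sleep 5
#   after=$(vmstat_counter pswpout < /proc/vmstat)
#   echo "pages swapped out per second: $(swap_rate "$before" "$after" 5)"
```

Running the same sampling on a Wheezy and a Jessie machine should show whether the two kernels really swap at different rates during the freeze.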

Of course this behaviour is not acceptable for a production system. What I would expect is that swapping has a high priority but never uses more than 90% of all system resources, so that the system can still be managed somehow.

Tuning the swappiness to different values didn't help.

We're using the following kernels:

Wheezy: Linux A 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux

Jessie: Linux B 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux

Has anyone run into the same problem and found a solution?

Edit: Thank you all for the comments and explanations. Of course I don't want to use swap as spare memory. The 120% usage was just a test. In production, the systems use maybe 100.0001% of the memory and already stop being responsive. In production, with our software running, the data also changes at a high frequency, so the system might be kept busy just swapping a very small amount of data back and forth the whole time.

mr.simonski
  • Swap isn't spare memory. Don't use it as such. – Matthew Ife Mar 08 '16 at 13:59
  • Agreed with Matthew. Using a *bit* of swap is expected, and is not a bad thing. If your system starts using large amounts, though, it's a sign that you either need more RAM or your application needs to be reconfigured to limit its RAM usage to less than what is available. – EEAA Mar 08 '16 at 14:03
  • If you use 120% of memory, then 1 in 6 page accesses will result in a fault. Which will mean suspension of the process, a context switch to kernel memory, an invocation of the page replacement subsystem, and the initiation of an I/O transfer. This is exacerbated by the fact that drive memory will always be orders of magnitude slower than RAM, and *to compensate* will load larger blocks of memory. A RAID subsystem means additional delay for I/O operations. Swap isn't meant to be fast; it's meant to be a last resort when the alternative is crashing. – Parthian Shot Mar 08 '16 at 14:10
  • Did you solve it? I have exactly the same problem, even though there is plenty of free memory. It seems to be a thrashing issue, but I can't figure it out. – ronu Apr 13 '20 at 05:44
  • 128 GB of RAM for an application is a lot; you should check the application's memory management with your app team. – c4f4t0r Apr 14 '20 at 08:15

2 Answers


We're still facing this issue with our Java applications even on servers with the current Debian Buster OS releases.

What we did to prevent it: we add to the end of

/etc/sysctl.conf

the config parameter

vm.swappiness = 0

With this setting, swap isn't used until the system really needs it. Besides that, we make sure to configure our Java app to use only a fixed maximum amount of memory.
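As a rough sketch of how such a memory cap could be derived instead of hard-coded: the helper below computes a percentage of MemTotal in MiB, which can then be passed to the JVM's -Xmx option. The name heap_cap_mb, the 75% figure and app.jar are assumptions for illustration, not our actual configuration. Keep in mind that -Xmx limits only the Java heap, so the JVM as a whole uses somewhat more than this value.

```shell
# Print a heap cap in MiB: the given percentage of MemTotal.
# Reads /proc/meminfo-formatted text on stdin (MemTotal is reported in kB).
# $1 = percentage of total RAM to allow for the heap (hypothetical choice)
heap_cap_mb() {
    awk -v pct="$1" '/^MemTotal:/ { printf "%d\n", $2 * pct / 100 / 1024 }'
}

# Possible use (as root for the sysctl call; app.jar is a placeholder):
#   sysctl -p                      # reload /etc/sysctl.conf without rebooting
#   java -Xmx"$(heap_cap_mb 75 < /proc/meminfo)"m -jar app.jar
```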

mr.simonski
  • vm.swappiness=0 completely disables the swap since 3.x kernels. To minimize swapping but still leave it available as a last resort, use vm.swappiness=1. – Gordan Bobić Apr 14 '20 at 10:29
  • Additional information request. # cores, any SSD or NVME devices on MySQL Host server? Post on pastebin.com and share the links. From your SSH login root, Text results of: B) SHOW GLOBAL STATUS; after minimum 24 hours UPTIME C) SHOW GLOBAL VARIABLES; D) SHOW FULL PROCESSLIST; E) complete MySQLTuner report AND Optional very helpful information, if available includes - htop OR top for most active apps, ulimit -a for a Linux/Unix list of limits, iostat -xm 5 3 for IOPS by device and core/cpu count, for server workload tuning analysis to provide suggestions. – Wilson Hauck Apr 16 '20 at 17:36

There are three options you may wish to consider:

1) Tune your application's memory usage so that it does not exceed the available memory in the system, and disable swapping entirely. I only configure systems with swap under very unusual circumstances. If your server has more than one NUMA node, look at your biggest memory consumer's configuration and look for NUMA-related options. If there aren't any, use numactl to set the process' memory to interleave between the nodes. Google for "mysql swapping insanity" for more details about why NUMA can cause unusual swapping and OOM conditions even when plenty of memory is available.

2) Set swappiness=100. This will make the kernel swap out pages at the first sign of pressure. This can cause swapping to happen more often but in smaller increments, and thus take the edge off the system grinding to a halt for a long time.

3) Configure your swap on zram with lz4 compression. It is far faster than swapping onto spinning rust or even SATA SSD (probably slower than modern NVMe, though). Make sure you configure the zram size to less than the amount of available memory after deducting any memory reserved for huge pages. For example, if you have 128GB of RAM and you have 64GB of huge pages reserved, configure zram for, say, 60GB. It is dynamically allocated and freed, and 0-filled pages (you'd be surprised how many of those there are in working memory) get outright discarded.
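Options 1 and 3 could look roughly like the sketch below. It assumes root privileges, a kernel with the zram module and lz4 support, and numactl installed. The function names, the single zram0 device, the swap priority of 100 and the 4GB safety margin are illustrative choices, not fixed recommendations.

```shell
# Option 1: start a command with its memory interleaved across all NUMA nodes.
# Example: run_interleaved java -jar app.jar   (app.jar is a placeholder)
run_interleaved() {
    numactl --interleave=all "$@"
}

# Option 3: pick a zram size below what is left after huge pages.
# $1 = total RAM in GB, $2 = GB reserved for huge pages, $3 = safety margin in GB
zram_size_gb() {
    echo $(( $1 - $2 - $3 ))
}

# Option 3: create one lz4-compressed zram device and use it as swap.
# $1 = device size in GB. Must run as root.
setup_zram_swap() {
    modprobe zram num_devices=1
    # The compression algorithm has to be chosen before the size is set.
    echo lz4 > /sys/block/zram0/comp_algorithm
    echo "${1}G" > /sys/block/zram0/disksize
    mkswap /dev/zram0
    # A high priority makes the kernel prefer zram over any disk-backed swap.
    swapon -p 100 /dev/zram0
}

# Possible use for the 128GB / 64GB huge pages example:
#   setup_zram_swap "$(zram_size_gb 128 64 4)"    # 60GB device
```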

Gordan Bobić