40

I have a production host, shown below:

[htop screenshot]

The system is using 1GB of swap, while maintaining nearly 40GB of free, unused memory space. Should I be concerned about this, or is it mostly normal?

MrDuk
  • Actually, you should be concerned about a production host with real load wasting nearly 40GB of memory. Surely it could find some use for that memory: the applications are accessing the disks, so couldn't it use that memory to cache some of that data, reduce I/O, and improve performance? Why is 40GB of memory being wasted on a machine that's doing work? That's what you should be concerned about. That's not normal. – David Schwartz Jan 12 '17 at 17:21
  • It really would be more useful if you showed us the output of `free -m`. The graphics are hard to read. – user9517 Jan 12 '17 at 17:23
  • @DavidSchwartz -- I have a related question that's still active on just that. http://serverfault.com/questions/825909/how-conservative-should-i-be-when-considering-current-cache-use-on-a-system – MrDuk Jan 12 '17 at 17:25

5 Answers

68

This is not a problem and is likely normal. Lots of code (and possibly data) is used very rarely so the system will swap it out to free up memory.

Swapping is generally only a problem if memory is being swapped in and out continuously. It is that kind of activity that kills performance and suggests a problem elsewhere on the system.

If you want to monitor your swap activity, you can do so with several utilities, but vmstat is usually quite useful, e.g.

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 348256  73540 274600    0    0     1     9    9    6  2  0 98  0  0
 0  0      0 348240  73544 274620    0    0     0    16   28   26  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   29   33  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   21   23  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   24   26  0  0 100  0  0
 0  0      0 348240  73544 274620    0    0     0     0   23   23  0  0 100  0  0

Ignore the first line, as it shows averages since the system started. Note the si and so columns under ---swap--; they should generally be fairly small figures, if not 0, for the majority of the time.
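
If the sysstat package is installed, another option (just a sketch; vmstat above is enough for most cases) is to watch the swapping rate on its own with sar:

# report pages swapped in/out per second, once per second
sar -W 1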

Also worth mentioning is that this preemptive swapping can be controlled with a kernel setting. The file /proc/sys/vm/swappiness contains a number between 0 and 100 that tells the kernel how aggressively to swap out memory. Cat the file to see what it is set to. Most Linux distros default this to 60, but if you don't want to see any swapping before memory is exhausted, echo a 0 into the file like this:

echo 0 >/proc/sys/vm/swappiness

This can be made permanent by adding

vm.swappiness = 0

to /etc/sysctl.conf.
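
As a small sketch of the checks described above (standard procfs and sysctl usage, nothing distro-specific):

# see the current value (60 on most distros)
cat /proc/sys/vm/swappiness

# the same, via sysctl
sysctl vm.swappiness

# after editing /etc/sysctl.conf, reload it so the change applies without a reboot
sysctl -p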

rav_kr
user9517
  • Also worth mentioning is that this preemptive swapping can be controlled with a kernel setting. The file at /proc/sys/vm/swappiness contains a number between 0 and 100 that tells the kernel how aggressively to swap out memory. Cat the file to see what this is set to. By default, most Linux distros default this to 60, but if you don't want to see any swapping before memory is exhausted, echo a 0 into the file like this: `echo 0 >/proc/sys/vm/swappiness`. This can be made permanent by adding `vm.swappiness = 0` to /etc/sysctl.conf. – virtex Jan 12 '17 at 19:19
  • @virtex: I like to use swappiness = 1, or just something less than 10, on my desktop. That might do well on servers most of the time, too. Strongly discourage swapping to free up RAM for more pagecache, without prohibiting it entirely. – Peter Cordes Jan 13 '17 at 08:46
  • @PeterCordes Take care for servers, especially those accessing databases or serving files. Those may benefit a lot from the memory becoming available for file caches. – Jonas Schäfer Jan 13 '17 at 13:02
  • @JonasWielicki: Even with `swappiness=7` or something, long-term unused pages do get swapped out. There's a big difference between `swappiness=0` and any other value, even low values. The kernel-default `swappiness=60` is generally good for servers, and it's only for desktop interactive use where a low swappiness is good. But setting it to 7 or something shouldn't hurt much. (But I haven't checked, I'm not a server sysadmin). – Peter Cordes Jan 13 '17 at 17:33
  • @PeterCordes Until you put memory pressure, any `swappiness` works great. With the pressure, you will see that `swappiness=7` starves the file cache *almost completely* for an extended period of time, while `swappiness=60` liquidates a lot of cache but also starts to swap out within seconds. It's still the cache that takes the beating, but in a much more balanced way. – kubanczyk Jan 14 '17 at 08:06
25

Linux will pre-emptively write out pages to disk if it has nothing better to do. That does not mean that it will evict those pages from memory, though. It's just that in case it must evict those pages sometime in the future, it doesn't need to wait for them to be written to disk, because they are already there.
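
One rough way to observe this (just reading the kernel's own counters; exact interpretation varies by kernel version) is the SwapCached field in /proc/meminfo, which counts pages that currently have a copy both in RAM and on the swap device:

grep -E 'SwapTotal|SwapFree|SwapCached' /proc/meminfo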

After all, if you are running out of memory, it is probably because your machine is already working hard, and you don't want to burden it further with swapping at that moment. Better to do the swapping while the machine has nothing else to do.

For a similar reason, your memory should always be full. Process pages, filesystem cache, tmpfs: there is so much stuff that could be held in memory. Really, you should be concerned if your memory is empty; after all, you paid a lot of money for it (at least compared to the same amount of disk space), so it had better be used!
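
To see how much of that "full" memory is reclaimable cache rather than application data, plain free is enough; the buff/cache figure is page cache and buffers the kernel can drop under memory pressure:

free -m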

Jörg W Mittag
  • Jörg, the pages the kernel preemptively writes to disk are not swap pages; they are dirty disk cache pages. The vm.dirty_background_... tunables control that. Swap-out activity starts according to the swappiness tunable and does not wait for idle times. – Lucas Jan 15 '17 at 16:43
11

Swap used is not bad, but a lot of swap activity is

  vmstat 1
  procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  6  0 521040 114564   6688 377308    8   13   639   173    0 1100  5  4 90  0
  1  0 521040 114964   6688 377448    0    0   256     0    0 1826  3  4 94  0
  0  0 521040 115956   6688 377448    0    0     0     0    0 1182  7  3 90  0
  0  0 521036 115992   6688 377448    4    0    16     0    0 1154 10  2 88  0
  3  0 521036 114628   6696 377640    0    0   928   224    0 1503 15 17 67  1

The swpd column by itself is no problem at all. Non-zero values in the si and so columns, however, are deadly to server performance, especially on machines with lots of RAM.

It is best to disable swappiness on machines with several GB of RAM:

sysctl -w vm.swappiness=0

This will not disable swap. It only instructs Linux to use swap as a last-resort measure. This will waste a few MB on program pages that do not need to be in RAM, but that is preferable to swapping bloating your disk access queues.
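
None of this removes the swap devices themselves. If you want to confirm what swap is configured and how much of it is in use (standard util-linux and procps commands, assuming a reasonably recent version):

# list configured swap devices/files and their current usage
swapon --show

# the Swap: line gives the totals
free -h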

Edit 1: why the default value of swappiness is not optimal

We have to remember that two decades ago a big 486 had only 32MB of RAM. Swap algorithms were developed when the whole of RAM could be moved to disk in a small fraction of a second, even with the slower disks of that time. That is why the default swap policies are so aggressive: RAM was the bottleneck in those days. Since then RAM sizes have increased more than 10,000 times while disk speeds have increased less than 10 times. This has shifted the bottleneck to disk bandwidth.

Edit 2: why is si/so activity deadly to servers?

si and so activity on machines with tons of RAM is deadly because it means the system is fighting with itself for RAM. What happens is that disks, even big storage arrays, are far too slow compared to RAM. Aggressive swapping favors the kernel disk cache over application data and is the most common source of this fight for RAM. Since the OS has to free disk cache on every si, the lifetime of the extra cache that swapping provides is too short to be useful anyway. The result is that you spend disk bandwidth storing cache that will probably never be used, and pause your programs while they wait for si pages. In other words, it consumes a lot of critical resources with little or no benefit to the applications.

Note the title of this answer: the problem is a lot of swap activity on servers with lots of RAM. This does not apply to machines with only occasional si and so activity, and it may not apply in the future if smarter swap algorithms are developed in the OSes.

Edit 3: "cold" pages

People romanticize the swapping algorithm. Some say "it takes the least-used pages out of RAM", but this is not what the kernel does at all. The difficult thing to understand about swap is that the kernel does not know what a "cold page" is. The kernel does not have a good metric for deciding whether a page is in use or likely to be used in the near future. To work around that, the kernel puts pages into swap more or less randomly, and pages that are not needed stay there. The problem with that algorithm is that pages have to go to swap before we find out whether the applications need them, which means a lot of "hot" pages will go to swap. The further problem is that disks are too damn slow compared to RAM. The consequence is that when swapping starts, all applications get random pauses waiting for the disks, and this hurts both latency and throughput.

I built my own benchmark of a realistic scenario common to many applications with a decent volume of data. From my tests, I saw no benefit to throughput or latency when swap was in use. Far from it: when swapping starts, it degrades both throughput and latency by at least an order of magnitude.

I will go a bit further: I understand swap is not for processing. Swap is for emergencies only, those moments when too many applications are running at the same time and you get a memory spike; without swap this would cause out-of-memory errors. I consider swap usage a failure of the development and production teams. This is just an opinion that goes way beyond what we discussed here, but it is what I think. Of course, my applications have excellent memory management of their own.

Lucas
  • "Best to disable swappiness" Best, why? (Best, for what purpose?) The default might not be right for all uses, but I'd still need a reason to change it. – jpaugh Jan 12 '17 at 22:24
  • How is `si` more deadly to your server than `bi`? Both mean some program is waiting for 4096 bytes to be read from disk to memory. The `bi` is from any file, and `si` from a specific narrow category of files (but their bytes move *just as fast* through exactly the same path). – kubanczyk Jan 13 '17 at 07:56
  • A 486 with 128MB of RAM was very rare and would have been considered a mainframe or supercomputer - thus the CPU would likely not have been a 486. My old 486 had 4MB of RAM and I was envious of my friend's machine with 16MB of RAM (large servers had 16 to 32 MB of RAM). Fast forward to Pentiums and we start seeing 8 to 16 MB as the normal. When Pentium3 first appeared (when CPUs started normally exceeding 1GHz) 32MB was normal and web servers typically had 64 to 128MB. – slebetman Jan 14 '17 at 03:21
  • `swappiness=0` seems totally inappropriate for servers. You might consider it for an interactive desktop system (but even then, `swappiness=1` is a better choice to eventually swap out really cold pages). See [comments on another answer](http://serverfault.com/questions/825911/should-i-be-concerned-that-swap-is-being-used-on-a-host-with-nearly-40gb-of-free/825915?noredirect=1#comment1053967_825915). `swappiness=7` or something will reduce swap activity dramatically without pinning cold pages into RAM until OOM, and is worth considering if you think `60` is too swappy for a specific server. – Peter Cordes Jan 14 '17 at 17:06
  • @kubanczyk: I think `si` is worse than `bi`. Most server software is designed around the assumption that I/O from disk may be slow, and uses threads, async I/O, or some other technique to remain responsive in general while waiting on I/O. A page fault could happen anywhere. In the worst case, a slow page fault could happen after taking a lock, blocking all other threads from entering that critical section for ~10ms (with swap on slow rotational storage). That might be plausible if a critical section copies data from a shared data structure to a potentially-cold page. – Peter Cordes Jan 14 '17 at 17:16
  • Before/after messing with `swappiness`, I'd highly recommend doing benchmarks to measure average and worst-case responsiveness of whatever your server runs. I think a blanket `swapiness=0` for machines with more than a couple GB of RAM is totally inappropriate for servers. If it was, Linux would already do that, or some distros would. Modern software often "wastes" RAM, because it's cheap and virtual memory paging makes it not much of a burden on the whole system to fail to deallocate some pages that are only used at process startup. (Or more commonly, startup and shutdown). – Peter Cordes Jan 14 '17 at 17:19
  • (And measure throughput / whatever else is relevant, not just response latency). – Peter Cordes Jan 14 '17 at 17:29
8

This is not an answer to your question; rather, it is extra information to help you make an informed decision.

If you would like to know what processes specifically are using how much swap, here is a little shell script:

#!/bin/bash

set -o posix
set -u

OVERALL=0
# Walk every numeric directory under /proc (one per process)
for DIR in `find /proc/ -maxdepth 1 -type d -regex "^/proc/[0-9]+"` ; do
  PID=`echo $DIR | cut -d / -f 3`
  PROGNAME=`ps -p $PID -o comm --no-headers`

  # Sum the "Swap:" lines from the process's smaps (values are in kB).
  # Anchoring the match avoids also counting "SwapPss:" lines on newer kernels.
  SUM=0
  for SWAP in `grep "^Swap:" $DIR/smaps 2>/dev/null | awk '{ print $2 }'` ; do
    let SUM=$SUM+$SWAP
  done
  echo "PID=$PID - Swap used: $SUM kB - ($PROGNAME)"

  let OVERALL=$OVERALL+$SUM
done
echo "Overall swap used: $OVERALL kB"
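
If you save the script as, say, swap-usage.sh (the filename here is just for illustration), running it as root and sorting numerically on the swap column puts the biggest swap users last:

chmod +x swap-usage.sh
sudo ./swap-usage.sh | sort -k5 -n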

I should also add that tmpfs can be swapped out as well. This is more common on modern Linux systems using systemd, which create user-space /tmp overlays using tmpfs.
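
If you want to see which tmpfs mounts exist on a system (and could therefore end up contributing to swap usage), something like this should do it:

# list tmpfs mount points
findmnt -t tmpfs

# show how much each one currently holds
df -h -t tmpfs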

ruakh
Aaron
  • Nice script. Take a look at smem too. – user9517 Jan 12 '17 at 20:50
  • I think you could write that a lot more efficiently (*far* fewer forked processes) with `awk '/Swap/ {sw += $2} FNR==1 { /*first line of a new file */ find the command somehow, maybe still fork/exec ps;} END { print totals }' /proc/[0-9]*/smaps`. That runs cut and ps for every process, and grep+awk several times for every process in the system. – Peter Cordes Jan 14 '17 at 17:28
0

I've noticed MySQL Cluster replication slow down or fail when the agents are swapping heavily. Maybe some applications don't mind, or even benefit from, some swapping, but databases really seem to suffer from it. However, many discussions I've seen on forums treat swap in the abstract, divorced from the specific workload.

In the DBA world the consensus seems to be that "It’s common sense that when you’re running MySQL (or really any other DBMS) you don’t want to see any I/O in your swap space. Scaling the cache size (using innodb_buffer_pool_size in MySQL’s case) is standard practice to make sure there is enough free memory so swapping isn’t needed.

But what if you make some mistake or miscalculation, and swapping happens? How much does it really impact performance? This is exactly what I set out to investigate. "
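
As a purely illustrative check of the knob mentioned in that quote (standard MySQL; nothing here comes from the linked posts), you can ask the server how large its buffer pool currently is:

# value is reported in bytes
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"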

I hope readers will find the following links apropos.

https://www.percona.com/blog/2017/01/13/impact-of-swapping-on-mysql-performance/

https://www.percona.com/blog/2010/01/18/why-swapping-is-bad-for-mysql-performance/

  • Welcome to Server Fault! Whilst this may theoretically answer the question, [it would be preferable](//meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Frederik Mar 24 '17 at 15:40