
We have a fleet of web hosting servers all running the exact same software & hardware with similar load.

One of our servers gets very high load that leads to it locking up for brief periods of time. The main difference for this server seems to be very high swap usage, even with plenty of RAM free and swappiness set to 30:

# cat /proc/sys/vm/swappiness
30

# free -m
              total        used        free      shared  buff/cache   available
Mem:          40074        8277        6352         548       25444       29412
Swap:         24575        5814       18761

top shows similar memory usage and gives an idea of the idle load:

# top
    Tasks: 529 total,   6 running, 520 sleeping,   0 stopped,   3 zombie
%Cpu(s): 51.1 us, 11.8 sy,  0.0 ni, 35.2 id,  1.6 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem : 41035828 total,  6511476 free,  8697564 used, 25826788 buff/cache
KiB Swap: 25165820 total, 19224744 free,  5941076 used. 29916408 avail Mem

The reason I think there may be an issue is that vmstat shows heavy swap activity:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free    buff    cache   si    so   bi    bo   in   cs   us sy id wa st
10  0 5856556 4559436 724880 26904888 3072   20 26392  1624 30378 10346 30 22 40  4  5
12  1 5856444 4646072 725168 26902376 5344   80 19344  5548 25937 8049 18 30 45  4  4
17  0 5856608 4668964 717648 26875160 4772  164 17316  3300 31929 7187 19 53 19  5  5
10  1 5856780 4752916 704900 26836828 3768  172 17748   904 30300 7699 22 59 15  2  1
 2  3 5856788 4623292 708248 26859896 4768    8 34496  1972 30090 15300 25 19 51  4  1
11  0 5856792 4726972 709364 26876620 6732    4 39812  3796 26562 12241 20 27 50  3  0
12  1 5856832 4749100 710604 26842904 9000   40 19136  1128 26507 9283 25 41 32  2  0
 4  0 5856504 4859128 712020 26826680 7516  172 23524  2580 24394 10530 19 37 41  3  0
11  1 5856372 4791056 709808 26828772 11136    8 23860  2264 19609 12320 13 22 61  4  0
 7  0 5856372 4634432 713320 26852572 8744    0 37992  1028 23425 12326 18 21 56  4  0
 8  1 5856368 4530268 715848 26854832 5592    0 26272  2996 34327 14391 24 23 49  3  1
 7  0 5856376 4611928 714176 26902960 7044    8 45588  2984 32832 10172 26 25 45  3  0
11  0 5856408 4645736 712716 26914004 8164   32 19692  7160 23702 16326 21 21 54  2  1
 8  0 5856412 4658252 715648 26907004 6608    8 28260  1608 31679 10715 23 19 54  3  1
 7  0 5855264 4495904 718876 26929968 6288    0 31232   968 23048 9080 24 11 59  5  0
 4  0 5853560 4796052 719268 26938096 2148    0 18764  1556 21659 8297 15 14 69  2  0
 4  2 5852116 4692844 720656 26952300 2212    0 16764   636 18181 10162 18  9 71  2  0
 9  0 5850960 4566112 720756 26977948 3120    0 33116  5136 24373 7915 23 12 64  1  0
 9  0 5850316 4626320 721260 26962468 2400  148 13988   484 24208 6748 21 34 43  1  0

We have another server that is probably the closest to it in terms of load and number of users. They both have identical hardware and software, and the other server even has more users and generally higher CPU usage, though only barely. Yet it consistently uses significantly less swap; its free -m and vmstat look like:

other# free -m
              total        used        free      shared  buff/cache   available
Mem:          40073        9471        4376        2408       26226       27210
Swap:         28671        3490       25181


other# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free    buff    cache   si    so   bi    bo   in   cs   us sy id wa st
10  0 3564112 4904700 1227052 25172336    0   44 19672   896 23057 3955 23 16 61  0  0
 3  0 3564148 4837896 1226772 25168380    0   36  3080 15000 29370 4467 19 15 65  1  0
 9  0 3564204 4829296 1226612 25222808    0   56 46632  2100 16596 5268 11 15 73  1  0
 5  0 3564248 4663528 1226424 25328292    0   44 95104  1072 34862 6531 25 15 59  1  0
 6  1 3564296 4566072 1226924 25390092    0   48 55632  2056 35307 12050 15 16 68  2  0
 4  0 3564332 4496940 1226620 25391812    0   36  5772  1364 37957 4805 18 15 67  1  0
 5  0 3564348 4651396 1226552 25388820    0   16  2160  2004 14545 3504 12 11 76  0  0
 4  0 3564368 4809568 1225696 25331324    0   20  1524   700 11572 2997  7 10 83  0  0
 3  0 3564404 4802060 1223488 25328372    0   44  2160  2328 21654 11113 14 14 72  0  0
 3  0 3563960 4831720 1220844 25285840  296   16  1304  2120 16522 4083 11 11 77  0  0
 5  0 3563972 4805272 1217784 25289068    0   12  6920  1060 22775 4271 17 12 71  0  0
 3  0 3563980 4758120 1214556 25293184    0    8  5560  7896 30875 5421 22 13 63  0  1
 4  0 3563988 4687848 1211544 25293100   28    8  3288   828 24343 4136 16 12 71  0  0
 7  0 3564040 4474796 1204748 25299556    0   52  7908  1608 37724 6343 23 17 58  1  1
 3  0 3564072 4507772 1201608 25324032    0   32 26724  2536 33578 7815 22 16 61  1  0
 6  0 3564076 4709044 1198596 25327624    8    8  7016   928 29711 5099 19 13 67  1  0

When viewing atop, it shows systemd doing approximately 968 GB of disk reads a day, whereas the other server does about 520 GB a day. I am attributing this to swap, but would swap show up under that stat?
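One way to check that attribution is to rank processes by their cumulative read_bytes from /proc/&lt;pid&gt;/io (a sketch; reading other users' io files needs root, and whether swap-ins are charged to the faulting process's read_bytes can vary by kernel version):

```shell
# Rank processes by cumulative bytes actually read from storage.
for p in /proc/[0-9]*; do
  rb=$(awk '/^read_bytes:/ {print $2}' "$p/io" 2>/dev/null)
  name=$(awk '/^Name:/ {print $2}' "$p/status" 2>/dev/null)
  [ -n "$rb" ] && echo "$rb $name"
done | sort -rn | head -15
```

If a process other than systemd dominates here, the 968 GB/day figure is more likely ordinary file I/O than swap.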

I know this server is underperforming compared to the others, and the lock-ups seem to be related, though it's hard to tell: when it locks up we can't get in, and our logging fails too.
What I was hoping for is a way to find out why this server is using swap so much.
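One way to see which processes actually hold the swapped-out pages is to sum the VmSwap field from each /proc/&lt;pid&gt;/status (a sketch; VmSwap requires a reasonably recent kernel):

```shell
# Print each process's swapped-out memory (in kB) and name, largest first.
for f in /proc/[0-9]*/status; do
  awk '/^Name:/ {name=$2} /^VmSwap:/ {print $2, name}' "$f"
done 2>/dev/null | sort -rn | head -20
```

This shows who occupies the swap rather than why pages were evicted, but it narrows the search to specific workloads.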

EDIT: Added an image from atop showing page scan/stall and swap-in/swap-out activity.

Kyle Vail

2 Answers


I saw something similar on a physical server where developers were using the internal drive for both OS and data. From what I remember, we had an intermittently overworked root disk (where swap lived) plus a normally full buffer cache, and then something would come in and need to allocate memory.

It seems like either swap was needed to free up buffer cache pages, or else files on the same device needed to be touched. Since the kernel couldn't free memory pages quickly enough, I think it started swapping things out, causing a vicious cycle. I don't have any numbers to back this up, but in my experience the drop-off when you exceed provisioned IOPS is much more abrupt than just overworking a locally attached spinning disk. Look very closely at the I/O on the device where swap lives.
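To follow that suggestion, first find which device backs swap, then watch that device's raw counters (a sketch; iostat from the sysstat package, if installed, gives the same per-device view with %util and await):

```shell
# Which block device(s) back swap:
swapon --show
# Per-device name, cumulative sectors read (field 6) and written (field 10):
awk '{print $3, $6, $10}' /proc/diskstats
```

Sampling the diskstats line for the swap device a few seconds apart shows whether it is doing disproportionate I/O compared to its peers.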
fjc101
    Hey fjc101, I've added an atop image, but the drives are RAID 10 SAS SSDs and the I/O on them isn't going above 5% in iotop, so I don't think this is our issue. – Kyle Vail Apr 11 '18 at 01:48

First, you're almost certainly not swapping here - you're paging. Paging is a different process that uses the same storage as swap but is proactive rather than purely reactive to memory conditions.
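That distinction can be checked directly from the standard /proc/vmstat counters: pgmajfault counts all major faults, including ordinary file-backed page-ins, while pswpin counts only pages read back from the swap device (a sketch):

```shell
# If pgmajfault grows much faster than pswpin over time, the read traffic
# is mostly file-backed paging, not true swap-in.
grep -E '^(pgmajfault|pswpin|pswpout) ' /proc/vmstat
```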

If you're seeing more paging, it's going to be because of your workload against your set of hardware. Monitoring your workload operations per second (or a similar metric) will reveal any inconsistencies there. Depending on how your load balancing works, you could have a "sticky" client that demands a high load on this particular server - but that's just conjecture.

Secondly, if these servers all have the same hardware profile you should expect them to run almost identically under the same load. If your load is really balanced, then running some hardware diagnostics would be prudent. Focus first on disk and network I/O, as a barely failing disk or NIC will shove the load average and associated memory usage way up as the hardware occupies its time on retries, etc.
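For the hardware checks, the usual starting points are SMART health and NIC error counters. A sketch, with placeholder device names (/dev/sda and eth0 are examples, not taken from the question):

```shell
# SMART health summary; requires the smartmontools package and root.
smartctl -H /dev/sda
# NIC statistics; non-zero, growing error/drop counters suggest a bad
# NIC, cable, or switch port.
ethtool -S eth0 | grep -iE 'err|drop'
```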

Despite the data shown, this is really quite broad, so it's hard to answer precisely. Posting workload and hardware diagnostic results would help to provide a more concrete answer, if not answer the question outright.

Spooler
  • Hi SmallLoanOf1M Thanks for replying; of course, now that I am watching the server it isn't performing quite as badly. I have uploaded an image of atop as it shows currently, and I understand what you are saying about paging, at least I thought I did. It may help to know these servers have high-speed SSDs, and while watching them today they haven't passed 5% I/O in iotop. Network load is also similar to the other control server I was referencing. Do you know why it would be paging out so much given how much free memory is available and swappiness is set so low? – Kyle Vail Apr 11 '18 at 00:35
  • Swappiness doesn't affect paging. Your workload is primarily responsible for paging, and is supposed to responsibly page out unused memory to avoid fragmentation and other inefficient allocations. This behavior is highly normal. – Spooler Apr 11 '18 at 05:24
  • I guess that makes sense. I was looking for a way to reduce this because I assumed it was the cause of the load, so there must be another issue. Thanks for your help; I'll mark yours as the answer. Basically, the behavior is normal and not the cause of the load. – Kyle Vail Apr 12 '18 at 22:10