I'm troubleshooting an issue on my RHEL 5 server. It is an Oracle DB server that has been running for a while without much trouble. Lately I have noticed that the server load is relatively high because the kswapd processes are using a lot of CPU. On checking, I found that the server is doing a lot of swapping.

The server specs are:

12 x 2 CPU & 64GB RAM
bash-3.2$ uname -a
Linux 2.6.18-408.el5 #1 SMP Fri Dec 11 14:03:08 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

When I view top, I can see the server still has about 10GB of free physical memory, so I'm not sure why it is swapping. I'd appreciate it if someone could point me in the right direction to troubleshoot.

top - 15:31:35 up 231 days,  5:22,  2 users,  load average: 13.27, 13.97, 14.12
Tasks: 1443 total,  12 running, 1431 sleeping,   0 stopped,   0 zombie
Cpu(s): 29.2%us, 17.2%sy,  0.0%ni, 47.5%id,  5.4%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:  65839252k total, 53587688k used, 12251564k free,   122936k buffers
Swap: 68059128k total,  4535508k used, 63523620k free, 45719164k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9423 oraitxnp  17   0 8403m 167m 166m R 98.7  0.3   0:57.51 oracle
12348 oraitxnp  17   0 8405m 242m 240m R 98.7  0.4   0:39.11 oracle
 8942 oraitxnp  20   0 8404m 174m 171m R 95.6  0.3   1:59.77 oracle
 9049 oraitxnp  25   0 8404m 170m 167m R 95.6  0.3   1:33.17 oracle
 9402 oraitxnp  25   0 8404m 161m 158m R 95.6  0.3   1:24.03 oracle
13280 oraitxnp  17   0 8403m 161m 159m R 95.6  0.3   1:04.59 oracle
13227 oraitxnp  17   0 8403m 165m 162m R 92.4  0.3   0:40.65 oracle
 1431 root      11  -5     0    0    0 R 82.8  0.0   2802:41 kswapd2
11395 oraitxnp  16   0 8403m 192m 191m R 66.9  0.3   0:15.55 oracle

sar -r 
02:20:02 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
02:30:11 PM  12860252  52979000     80.47    122888  45721248  63711928   4347200      6.39    853652
02:40:02 PM  12591216  53248036     80.88    122876  45728156  63467408   4591720      6.75    860892
02:50:01 PM  12648836  53190416     80.79    122928  45729408  63717800   4341328      6.38    913284
03:00:02 PM  12489840  53349412     81.03    122932  45727364  63558884   4500244      6.61    941220
03:10:05 PM  12380352  53458900     81.20    123064  45735548  63541648   4517480      6.64    879124
03:20:12 PM  12195596  53643656     81.48    123124  45732364  63358440   4700688      6.91    901656
03:30:02 PM  12425600  53413652     81.13    122936  45718624  63582308   4476820      6.58    964544
Average:     12406342  53432910     81.16    121691  45498460  63646323   4412805      6.48    952204

sar -B
02:20:02 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
02:30:11 PM  36386.86   4421.45  14369.55   2242.21
02:40:02 PM  41398.13   5570.15  17610.94   2555.90
02:50:01 PM  51600.70   4681.47  14093.22   1675.94
03:00:02 PM  48850.39   5340.96  15636.23   2251.99
03:10:05 PM  53043.46   4755.90  17506.83   2378.80
03:20:12 PM  39151.42   5297.79  14383.58   1816.64
03:30:02 PM  47760.58   5099.56  14774.31   2236.45
Average:     47687.94   4831.93  15128.85   2191.29

-bash-3.2$ free -m
             total       used       free     shared    buffers     cached
Mem:         64296      52281      12014          0        120      44655
-/+ buffers/cache:       7506      56789
Swap:        66463       4545      61918
Dennis Nolte
Jason Oon
  • Why are you running a 12 year old kernel? – kasperd Oct 22 '18 at 07:59
  • Hi Kasperd, we're planning a hardware refresh, but the decision has been pending management approval for quite a while now. That's why I'm still running an old kernel. – Jason Oon Oct 22 '18 at 09:12
  • @JasonOon you need to check majflt in your sar output, your server is swapping. – c4f4t0r Oct 23 '18 at 16:04
  • Hi, yes, I know it's swapping; that's why kswapd kicks in. But I don't know why it's swapping so much when there's about 10GB of free memory left. – Jason Oon Oct 24 '18 at 08:44
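
Major faults in `sar -B` also include pages read back from ordinary files, so one way to confirm real swap traffic is to watch the `si` (swap-in) and `so` (swap-out) columns of `vmstat`. The following is only a sketch, assuming the usual procps `vmstat` header layout; `swap_rates` is a hypothetical helper name, not an existing tool.

```shell
# swap_rates: hypothetical helper that averages the si/so columns of vmstat
# output read from stdin. Columns are located by header name, not position.
swap_rates() {
    awk '
        # Header row ("r b swpd free ..."): remember each column index.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number once the header has been seen.
        ("si" in col) && $1 ~ /^[0-9]+$/ {
            si += $(col["si"]); so += $(col["so"]); n++
        }
        END {
            if (n) printf "avg si=%.1f so=%.1f over %d samples\n", si / n, so / n, n
        }
    '
}

# Usage (the first vmstat sample reports averages since boot):
#   vmstat 1 10 | swap_rates
```

Values of `si` and `so` that stay near zero would suggest the box is not actively swapping, even when some swap space is in use.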

1 Answer

What is your vm.swappiness set to? The default is 60 (on Ubuntu anyway). As I understand it, the lower the number, the more the system will prefer RAM over swap.
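
A minimal sketch for checking the current value. `read_swappiness` and its optional path argument are made up for this example; the `/proc/sys/vm/swappiness` file and the `sysctl` key are the standard kernel interfaces.

```shell
# read_swappiness: hypothetical helper that prints the current vm.swappiness.
# The optional path argument exists only so the function can be pointed at a
# test file; by default it reads the live kernel setting.
read_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# Usage:
#   read_swappiness
# To lower it at runtime (needs root, does not survive a reboot):
#   sysctl -w vm.swappiness=10
```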

This is, of course, assuming the high CPU load is due to disk swap. If I'm reading that output correctly, those 8 oraitxnp processes each show 8G+ in the VIRT column. VIRT is virtual address space rather than physical RAM, though; the RES (resident) figure is only about 160-240M per process, and nearly all of that is SHR (shared), so those numbers alone don't show physical RAM contention.

I would cat /proc/meminfo to get a better idea of how much "physical" RAM is being used. It's hard to tell from some of the sar output because of the way it lumps the 64G of physical RAM and the 66G of swap together, but I would venture a guess that adding another 64G of RAM to that box would help, and maybe reduce the disk swap to 8G or so. Ideally, you never want to hit disk swap. If you do, you either add more physical RAM or incur performance penalties.
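
To make the /proc/meminfo numbers easier to read, here is a rough sketch that subtracts free memory, buffers and page cache from the total, the same arithmetic as the "-/+ buffers/cache" line in free. `mem_summary` is a hypothetical helper name, and the optional file argument is only there so it can be run against a saved copy of meminfo.

```shell
# mem_summary: hypothetical helper that summarises a meminfo-style file.
# apps_kb is what remains once free memory, buffers and page cache are
# subtracted from the total (all values in kB, as reported by the kernel).
mem_summary() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            apps = total - free - buffers - cached
            printf "total_kb=%d free_kb=%d cache_kb=%d apps_kb=%d\n",
                   total, free, buffers + cached, apps
        }
    ' "${1:-/proc/meminfo}"
}

# Usage:
#   mem_summary                  # live system
#   mem_summary saved_meminfo    # a saved copy
```

Keep in mind that Cached also counts shared-memory pages, so on an Oracle host not all of it is necessarily reclaimable without swapping.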

Years ago, the Linux rule of thumb for swap was to "just make it double your RAM", but that was when most desktop systems were only running 1-2G. Even Red Hat has changed its tune, suggesting that 20% of physical RAM is "usually a good idea".

Server Fault
  • Hi, the meminfo details are below. I've set swappiness to only 10. `MemTotal: 65839252 kB MemFree: 11055584 kB Buffers: 132300 kB Cached: 46230948 kB SwapCached: 944976 kB Active: 46246740 kB Inactive: 2651236 kB LowTotal: 65839252 kB LowFree: 11055584 kB SwapTotal: 68059128 kB SwapFree: 64631536 kB Dirty: 324 kB Writeback: 44 kB AnonPages: 2407504 kB Mapped: 4737496 kB Slab: 640680 kB PageTables: 4732512 kB CommitLimit: 100978752 kB Committed_AS: 27545760 kB` – Jason Oon Oct 24 '18 at 08:40
  • 70% (46G) of your RAM is going to "cache". I believe this is RAM that is otherwise unused, so the kernel uses it for *something*. When a process needs more RAM, cached gets cleared to make room. Supposedly, this doesn't affect performance, and cache never hits disk swap. There's some discussion about it here: https://askubuntu.com/questions/198549/what-is-cached-in-the-top-command. Seems to me that the system is Ok with 64G, at least at the time of the above output. Maybe monitor cache and watch for it emptying, eventually using disk swap for more RAM (backup, nightly import, etc) – Server Fault Oct 25 '18 at 15:58
  • After monitoring for quite some time now, I notice that the server is still swapping for no apparent reason. I have another identical server running the same Oracle DB with the same resources; that server isn't swapping nearly as much and is fully utilizing its physical memory. There will be a downtime window for me to restart the affected server, and hopefully that will resolve the abnormality. – Jason Oon Dec 27 '18 at 06:16
  • Hrm...Does `sysctl -a` report the same thing on both systems? I recall Oracle requiring a lot of kernel tuning in the past (haven't worked with it in over 10 years). Maybe one has been tuned, the other not. – Server Fault Dec 27 '18 at 14:49
  • Hi Server Fault, apparently the restart resolved the issue. I no longer see any kswapd process running and the system CPU utilization has normalized. I'm not sure whether this is a hidden bug in this version of Red Hat Linux, since the version (RHEL 5.7) is no longer supported by Red Hat. – Jason Oon Jan 22 '19 at 06:25
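
For the `sysctl -a` comparison suggested in the comments, one possible approach is to save the output on each host and diff the sorted copies. This is only a sketch: `sysctl_diff` and the file names in the usage lines are made up for illustration.

```shell
# sysctl_diff: hypothetical helper that compares two saved `sysctl -a` dumps.
# Both files are sorted first so that ordering differences do not show up.
# Returns diff's exit status: 0 if identical, 1 if the dumps differ.
sysctl_diff() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    diff "$a_sorted" "$b_sorted"
    status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}

# Usage (file names are placeholders):
#   sysctl -a > sysctl.hostA.txt     # run on the first server
#   sysctl -a > sysctl.hostB.txt     # run on the second server
#   sysctl_diff sysctl.hostA.txt sysctl.hostB.txt
```

Some keys (counters, random UUIDs and the like) will always differ between hosts, so the interesting lines are the vm.* and kernel.shm* settings.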