
I'm often experiencing slowness on a large server that is used by several users simultaneously, with many CPUs (72) and a decent amount of RAM (125 GB).

Granted, the server runs a lot of stuff and therefore gets a high load, but it pretty much never happens that close to all CPUs are in use at the same time. RAM usage often goes up to about 80%. What I noticed is that iowait often becomes high (most of the time ~15%, ~20% tops), and swap seems to be pretty much always exactly 100% utilized. Swap is 4 GB.

In order to address the slowness, what would you recommend?

These are some options I've thought about:

  1. Increase swap

  2. Decrease swappiness to e.g. 10 (it's currently set at 60), so that swap isn't constantly full whenever sufficient RAM is available (see the sketch below)
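
For concreteness, this is roughly what I mean by the two options; the swap size and the exact commands are only examples and may need adjusting for the distro and filesystem:

# current swappiness and swap devices
cat /proc/sys/vm/swappiness
swapon --show

# option 2: lower swappiness at runtime (persist via /etc/sysctl.conf or /etc/sysctl.d/)
sudo sysctl vm.swappiness=10

# option 1: add e.g. a 16 GB swap file (fallocate may not work on every filesystem;
# dd if=/dev/zero of=/swapfile bs=1M count=16384 is the fallback)
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile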

It would be great to get your thoughts on these options.

I'm far from an expert in this domain, so any other suggestions are also very welcome.

Edit: adding details based on discussion in comments. See below:

vmstat shows me a very high number for swap in, and in glances I see an ongoing warning: MEMSWAP (100.0)
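
In case it helps, this is how I'm watching it; the si and so columns are the amount of memory swapped in from and out to disk per second:

# sample every 5 seconds and watch the si (swap in) / so (swap out) columns
vmstat 5

# glances flags swap pressure on its alerts line (the MEMSWAP warning above)
glances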

dreamer

2 Answers


First, I would get something like Performance Co-Pilot installed so you can understand utilization trends on the server. I would also just sit in top for a while to get a picture of which processes are the most active and how much memory they use.
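
For example (package and service names vary by distribution, so treat this as a sketch):

# install Performance Co-Pilot and start its collector/logger daemons
sudo apt install pcp          # or: sudo yum install pcp
sudo systemctl enable --now pmcd pmlogger

# in top, press Shift+M to sort by resident memory, or start it pre-sorted
top -o %MEM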

Second, it certainly seems reasonable to increase swap, or even eliminate it. 4 GB is almost worthless for a server that large. People turn off swap permanently so that under memory pressure the VM cannot start juggling pages and burning excess CPU for that purpose. Of course, there are still plenty of unneeded dirty pages and buffer cache that the VM can start evicting when memory is low, but those go to the filesystem. If the kernel isn't constantly killing processes with out-of-memory errors, then perhaps the memory utilization is largely filesystem cache.

To sum up: delete the swap device, add pcp, try to get a handle on how memory is used, and if you add swap back, make sure it is sized appropriately. Ideally you would want dormant processes swapped out, and all active processes able to fit in the available RAM. If active processes are all swapping pages in and out for extended periods of time, then more swap is just going to let the server try to operate beyond its capabilities.
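
A rough sketch of the swap-removal and log-check steps (assuming a systemd-based distro for the journal command):

# stop using swap now; pages currently swapped out get pulled back into RAM
sudo swapoff -a
# make it permanent by commenting out the swap entry in /etc/fstab

# verify the OOM killer has not been silently killing processes
sudo dmesg -T | grep -i "out of memory"
sudo journalctl -k | grep -i oom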

toppk
  • Thanks, this is useful advice. I already have a profiler running to monitor memory usage, but it's hard to see trends as there are lots of users running ad hoc processes on this machine. However, one thing that is clear is that swap consistently gets full (100%) while there is still at least 20-30 GB of RAM left, often more. Do you think getting rid of swap is better than increasing it in this case? And I guess if I'm getting rid of swap I could also just do this by decreasing swappiness to something like 10, right? – dreamer Sep 14 '22 at 16:29
  • Swap may be full, but those are probably memory pages belonging to dormant processes. There is no reason to bring them back into memory unless there is demand for those pages. If you see lots of activity on swap (pages in and out) while there is lots of memory free, then that would be more concerning. – toppk Sep 14 '22 at 16:33
  • `vmstat` shows me a very high number for swap in, and using `glances`, I see a warning `(ongoing) MEMSWAP (100.0)`. So it seems that it is constantly swapping in new data. So I'm just wondering, do you think in that case I should increase my swap, or do the opposite, and just lower the value for swappiness? – dreamer Sep 14 '22 at 16:47
  • I would delete swap first. If you have plenty of free RAM, then ceasing all swapping entirely seems like the easier fix. Just check the logs to make sure the OOM killer hasn't been silently invoked. – toppk Sep 14 '22 at 17:37
  • Thanks @toppk! I will try that. Appreciate your help! – dreamer Sep 14 '22 at 18:14

Don't delete swap. You are running out of virtual memory. Increase the swap size to 1-2x RAM.

Compare

# The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been “used” by them as of yet. 
cat /proc/meminfo | grep Committed_AS

with

# This is the total amount of memory currently available to be allocated on the system, expressed in kilobytes. 
cat /proc/meminfo | grep CommitLimit
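
Note that CommitLimit is only enforced as a hard limit when strict overcommit is enabled; it is computed roughly as total swap plus a fraction of RAM:

# CommitLimit = total swap + (total RAM * vm.overcommit_ratio / 100), roughly (huge pages excluded)
cat /proc/sys/vm/overcommit_memory   # 2 means strict overcommit (the limit is enforced)
cat /proc/sys/vm/overcommit_ratio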
gapsf