Very slow server performance at low loads

I use a server at work to run models on modestly large datasets in RAM (think 10GB to 100GB). There are only a few people on this server at any time. The server has lots of RAM (over 1TB) and many processors. We have found that when RAM usage exceeds what seems to be a modest threshold - think three people loading a combined 100GB of data into RAM in applications like R or Stata - then the server becomes dramatically slower. Operations that would take seconds on my PC at home take hours or days on the server. I am not sure why this would be the case: it is as if the server doesn't want to free cached memory, and even operating on the data currently held in memory takes an insanely long amount of time (CPU load is low: <10%). Even stuff on the command line takes a while: listing files can take a few seconds, etc. I don't have permissions to edit things on the server myself. Does anyone know what could be going on here, or things I can hunt for without having root access? The system administrators don't know what is going on.

We are running Red Hat Enterprise Linux 6.9.

Thanks so much for your help!

framsey

Posted 2019-02-16T02:10:45.563

Reputation: 1

This feels like it's going to come down to the server not really using all of its RAM and ending up using virtual memory and swapping to disk too much (thrashing). – Spiff – 2019-02-16T02:27:32.143

Do you know how I can check this out, or what settings the system administrators should be tuning? If it helps, there is no swap space as far as I can tell (from top - swap's always at 0). That's probably a bad idea for a few reasons, but we're not getting processes killed because we have way more than enough RAM (except when we hit about 10-15% utilization, things slow down enormously). – framsey – 2019-02-16T05:03:13.363

Answers

Here are a few things to check. Some probably require sudo, but some hopefully don't (like plain cat).

Check if there's any swap with
- cat /proc/swaps
- swapon -s
- swapon --show
Check "swappiness" with cat /proc/sys/vm/swappiness
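
The swap and swappiness checks above only read files under /proc, so they should work without root. Here is a minimal sketch that gathers them in one report. The function names (swap_total_kb, swap_device_count, report_swap) are made up for illustration, and the optional file arguments exist only so the functions can be tried on saved copies of the /proc files.

```shell
#!/bin/sh
# Summarize swap configuration from /proc; none of these reads need root.
# Function names are hypothetical helpers, not standard tools.

swap_total_kb() {
    # The line looks like: "SwapTotal:       8388604 kB"
    awk '/^SwapTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

swap_device_count() {
    # /proc/swaps has one header line, then one line per active swap area.
    awk 'NR > 1 && NF > 0 {n++} END {print n + 0}' "${1:-/proc/swaps}"
}

report_swap() {
    echo "swap areas: $(swap_device_count "${2:-/proc/swaps}")"
    echo "swap total: $(swap_total_kb "${1:-/proc/meminfo}") kB"
    echo "swappiness: $(cat "${3:-/proc/sys/vm/swappiness}")"
}

# Only run the report where the /proc files actually exist (i.e. on Linux).
if [ -r /proc/swaps ] && [ -r /proc/sys/vm/swappiness ]; then
    report_swap
fi
```

If this reports zero swap areas and a swap total of 0 kB, that matches what top shows in the comments above, and thrashing to a swap device is unlikely to be the cause.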
Try monitoring temperatures or CPU frequencies, maybe something's overheating and/or throttling down.

If RAM were overheating, I'd expect a bunch of random errors, unless it's only 1 out of 50 sticks, and I think RAM rarely has temperature monitors...
Anything in dmesg or /var/log/syslog (on Red Hat systems the main log is /var/log/messages)?
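
For the CPU frequency idea, one check that should not need root is the "cpu MHz" lines in /proc/cpuinfo. The sketch below assumes an x86 system where those lines are present; cpu_mhz_range is a hypothetical helper name. Low values on an idle machine are normal with frequency scaling, so it is only informative when run while a slow job is active.

```shell
#!/bin/sh
# Report the lowest and highest current clock speed across logical CPUs.
# Assumes /proc/cpuinfo contains lines like "cpu MHz         : 1200.000".

cpu_mhz_range() {
    awk -F: '
        /^cpu MHz/ {
            v = $2 + 0                       # numeric value after the colon
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            n++
        }
        END {
            if (n) printf "cpus=%d min=%.0f max=%.0f MHz\n", n, min, max
            else   print "no cpu MHz lines found"
        }' "${1:-/proc/cpuinfo}"
}

# Only run where /proc/cpuinfo exists (i.e. on Linux).
if [ -r /proc/cpuinfo ]; then
    cpu_mhz_range
fi
```

If every CPU sits far below its rated speed while a job is running, that would point toward throttling; if the speeds look normal, the slowdown is probably elsewhere.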

Clear/flush the disk caches with

sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

Maybe there's a ton of disk or network activity. Check with a program like iotop (disk) or iftop (network).
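
iotop and iftop usually need root, so here is a rough alternative for the disk side that only reads /proc/stat: measure how much CPU time is spent waiting on I/O between two samples. This is a sketch, not a replacement for iotop. The helper names (sample_cpu_line, iowait_percent) are invented for illustration, and the field layout assumed is user, nice, system, idle, iowait, irq, softirq, steal.

```shell
#!/bin/sh
# Estimate the share of CPU time spent in iowait between two /proc/stat samples.

sample_cpu_line() {
    # The first line of /proc/stat is the aggregate "cpu ..." line.
    head -n 1 "${1:-/proc/stat}"
}

iowait_percent() {
    # $1 and $2 are aggregate "cpu ..." lines from an earlier and a later sample.
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already counted in user.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6                       # iowait is the fifth value after "cpu"
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
            else       print "0.0"
        }'
}

# Only run where /proc/stat exists (i.e. on Linux).
if [ -r /proc/stat ]; then
    before=$(sample_cpu_line)
    sleep 1
    after=$(sample_cpu_line)
    echo "iowait over 1s: $(iowait_percent "$before" "$after")%"
fi
```

A persistently high iowait figure alongside low user CPU would suggest processes are stalled on disk or network storage rather than computing, which is worth passing on to the administrators.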

Xen2050

Posted 2019-02-16T02:10:45.563

Reputation: 12 097

Asked: 2019-02-16T02:10:45.563

Viewed: 170 times

Active: 2019-02-16T05:18:41.043