
I understand that the optimal swappiness depends on the application. Database servers often run with swappiness set close to zero, while in many other cases the recommendation is apparently to keep the default value of 60. However, I am not sure which setting suits my scenario.

The server runs Ubuntu 20.04 and has 8 CPUs, around 30 GB of RAM, and an SSD. I use it to execute parallelized (SOCK cluster) R scripts, usually involving geospatial data. While an R script is running, I do not execute any other application on the machine, and I read the data directly from files on disk, not from an SQL or other database system. With the current task, the process exceeds the available RAM every few minutes for a few seconds and otherwise remains well below that limit. There are likely some inefficiencies in the package I am using; however, I will neither adjust the package code nor install more RAM. What I would like to do is set a swappiness value that best suits my application.

A rule of thumb regarding such data science applications would be nice.
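For context, swappiness can be inspected and changed at runtime with `sysctl`, so whatever value turns out to work can be tested without a reboot. The sketch below assumes a low trial value of 10 (an experiment to benchmark against the workload, not a guaranteed optimum) and an arbitrary drop-in file name for persistence:

```shell
# Read the current value (default on Ubuntu 20.04 is 60)
cat /proc/sys/vm/swappiness

# Change it for the running system only (reverts on reboot);
# 10 is a trial value to benchmark, not a recommendation
sudo sysctl vm.swappiness=10

# Persist the setting across reboots; the file name under
# /etc/sysctl.d/ is arbitrary (chosen here for illustration)
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system
```

Because the temporary `sysctl` change takes effect immediately, one can time the same R job under a few candidate values before persisting anything.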

Chr
    The swappiness value doesn't even matter in this scenario. It could be anything and you will still use the swap. – Michael Hampton Jan 28 '21 at 14:47
  • The machine certainly uses the swap when the RAM is full. However, swap is also used before the RAM fills up, and that has efficiency implications. – Chr Jan 29 '21 at 10:44

0 Answers