I am trying to use cgroups to limit memory usage of user processes on servers with a large amount of RAM (128 GB or more). What we want to achieve is to reserve about 6 GB of RAM for the OS and root processes and leave the rest to the users. We want to make sure we have free memory at all times and that the servers do not swap aggressively.
This works fine as long as the limit is set low enough (< 16 GB). User processes are correctly assigned to the right cgroup by cgred, and once the limit is reached the OOM killer terminates the memory-hungry processes.
The issue arises when we set the limit higher. The server then starts swapping as soon as a process uses more than 16 GB of RAM, even though memory usage is still well below the limit and plenty of RAM is available.
Is there a setting, or some sort of maximum, that limits the amount of memory we can grant to a cgroup?
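For reference, cgred assigns processes through /etc/cgrules.conf; a minimal sketch of such a rule looks like this (the "@users" group name is illustrative, not necessarily our exact rule):

# /etc/cgrules.conf (group name is illustrative)
# <user/group>   <controllers>         <destination cgroup>
@users           cpu,cpuset,memory     computenodes/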
Here is more info:
I use the following code to simulate user processes eating memory. The code keeps track of the allocated memory in a linked list, so the memory is actually used and remains accessible from within the program, as opposed to just being reserved with malloc (and overwriting the pointer each time).
/* Content of grabram.c */
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

struct testlink {
    void *ram;
    struct testlink *next;
};

int main (int argc, char *argv[]) {
    int block=8192;
    char buf[block];
    void *ram=NULL;
    FILE *frandom;
    int nbproc,i;
    pid_t pID;
    struct testlink *pstart, *pcurr, *pnew;

    if (argc < 2) {
        // nbproc = 1 by default
        nbproc=1;
    } else {
        if (sscanf(argv[1], "%d", &nbproc) != 1) {
            /* it is an error */
            printf("Failed to set number of child processes\n");
            return -1;
        }
    }

    // open /dev/urandom for reading
    frandom = fopen("/dev/urandom", "r");
    if ( frandom == NULL ) {
        printf("I can't open /dev/urandom, giving up\n");
        return -1;
    }
    // fill buf with one block of random data
    fread(&buf, block, 1, frandom);
    if ( ferror(frandom) ) {
        // the read failed, give up
        printf("Error reading from urandom\n");
        return -1;
    }
    fclose (frandom);

    // pID == 0 => child, pID < 0 => error, pID > 0 => parent
    for (i=1; i<nbproc; i++){
        pID = fork();
        // break out of the loop if a child
        if (pID == 0)
            break;
        // exit if fork fails
        if (pID < 0) {
            printf("fork() failed, dying\n");
            return -1;
        }
    }

    pstart = (struct testlink*)malloc(sizeof(struct testlink));
    pstart->ram=NULL;
    pstart->next=NULL;
    pcurr = pstart;
    while ( 1==1 ) {
        ram = (void *)malloc(block);
        if (ram == NULL) {
            printf("can't allocate memory\n");
            return -1;
        }
        memcpy(ram, &buf, block);
        // store allocated blocks of ram in a linked list
        // so no one can claim we are not using them
        pcurr->ram = ram;
        pnew = (struct testlink*)malloc(sizeof(struct testlink));
        pnew->ram=NULL;
        pnew->next=NULL;
        pcurr->next=pnew;
        pcurr=pnew;
    }
    return 0;
}
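The program is built and started in the obvious way, for example (the process count here is just an example):

gcc -o grabram grabram.c
./grabram 4    # 4 processes, each allocating and touching 8 KB blocks forever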
So far I have tried setting the following tunables:
vm.overcommit_memory
vm.overcommit_ratio
vm.swappiness
vm.dirty_ratio
vm.dirty_background_ratio
vm.vfs_cache_pressure
None of these sysctl settings seemed to have any effect. The server starts swapping once the code above goes past the 16 GB barrier, even with swappiness set to 0, overcommit disabled, and so on. I even tried turning swap off entirely, to no avail: even with no swap, kswapd is still triggered and performance decreases.
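For what it is worth, the settings were applied with sysctl, roughly like this (the exact values varied between attempts):

sysctl -w vm.swappiness=0
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=100
sysctl -w vm.dirty_ratio=10
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.vfs_cache_pressure=100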
Finally, here is the relevant content of the cgconfig.conf file:
mount {
    cpuset = /cgroup/computenodes;
    cpu    = /cgroup/computenodes;
    memory = /cgroup/computenodes;
}

# limit = 120G
group computenodes {
    # set memory.memsw the same so users can't use swap
    memory {
        memory.limit_in_bytes = 120G;
        memory.memsw.limit_in_bytes = 120G;
        memory.swappiness = 0;
        # memory.use_hierarchy = 1;
    }
    # No alternate memory nodes if the system is not NUMA
    # On computenodes use all available cores
    cpuset {
        cpuset.mems = "0";
        cpuset.cpus = "0-47";
    }
}
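For completeness, once cgconfig has loaded this, the limits can be double-checked with something like:

cgget -r memory.limit_in_bytes -r memory.memsw.limit_in_bytes computenodes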
Finally, we use CentOS 6 with kernel 2.6.32.
Thanks
Yes, with one cgroup and one instance of the process, I start swapping at 16 GB despite more than 100 GB free. It takes less than a minute to get there. – Marc-andré Labonté Feb 24 '14 at 20:51