Whether a resource is consumable or not, and how much of it can be reserved on a system, is configurable. You can use one of the existing complex values or create a new one; it's up to you.
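For reference, this is roughly what the relevant lines of the complex definition (edited with `qconf -mc`) look like on a stock Grid Engine install; exact columns and shortcuts can differ a little between versions, and the h_rss line here already reflects the "make it consumable" change suggested further down:

```
#name      shortcut  type    relop  requestable  consumable  default  urgency
mem_free   mf        MEMORY  <=     YES          NO          0        0
h_rss      h_rss     MEMORY  <=     YES          YES         0        0
```

For a consumable you also declare how much of it each host has, e.g. via `qconf -me <hostname>` with a line like `complex_values h_rss=64G` (the 64G is just an example).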
While there's no harm in setting it anyway, mem_free is not consumable by default. That means the scheduler only checks that the requested amount of memory is free on the host when your job starts: 10 jobs each requiring 10GB of free memory can all start at the same time on a server with only 11GB of free memory. If all of them actually use 10GB you'll be in trouble.
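As a sketch of that scenario (the job script name is made up):

```
# mem_free only gates where the job *starts*; because it isn't consumable
# by default, the 10G is not subtracted from what later jobs see.
qsub -l mem_free=10G bigjob.sh
```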
The differences between the others come down to enforcement. rss (physical memory usage) isn't enforced; vmem (virtual memory usage) is. Unfortunately, Linux doesn't offer good ways to enforce physical memory usage (cgroups are OK, but the rss ulimit doesn't actually do anything in modern kernels).
On the other hand, it's very important to recognize that there is NO correct way to treat vmem as a consumable resource. If you compile "hello world" in C with the -fsanitize=address debugging option (available in clang or gcc 5+), it'll use 20TB of virtual memory but less than 5MB of physical memory. Garbage-collected runtimes like Java and Go also allocate significant quantities of vmem that are never reflected as physical memory, in order to reduce memory fragmentation. Every Chrome tab on my 8GB laptop uses 2TB of virtual memory as part of its security sandboxing. These are all totally reasonable things for programs to do, and setting a low vmem limit prevents perfectly well-behaved programs from working. Just as obviously, setting a consumable limit of 20TB of vmem on a system is pointless.
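If you want to see that gap for yourself, here's a minimal sketch (exact numbers vary by compiler and kernel):

```
cat > hello.c <<'EOF'
#include <stdio.h>
#include <unistd.h>
int main(void) {
    printf("hello, world\n");
    sleep(30);   /* stay alive long enough to inspect /proc */
    return 0;
}
EOF
gcc -fsanitize=address -o hello hello.c    # clang works too
./hello & sleep 1
grep -E 'VmSize|VmRSS' /proc/$!/status
# VmSize will be on the order of 20TB; VmRSS only a few MB.
```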
If you must use h_vmem for whatever reason, the difference between the h_ and s_ variants is which signal is used to kill processes that exceed the limit: h_ kills processes with SIGKILL (i.e. kill -9), whereas s_ uses a signal which a process can handle (allowing a well-behaved job to shut down cleanly, or a poorly-behaved one to ignore the signal). The best advice there is to first cry, because vmem restrictions are inherently broken, and then set h_vmem slightly higher than s_vmem so jobs have the opportunity to die with a useful error message.
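If you do go down that road, the request might look something like this (the sizes are made up for illustration):

```
# Soft limit a little below the hard one, so the job gets a catchable
# signal (and a chance to print something useful) before the SIGKILL.
qsub -l s_vmem=31G -l h_vmem=32G myjob.sh
```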
My advice would be to have the cluster admin configure h_rss to be consumable, set both h_rss and mem_free in your job template, avoid h_vmem altogether, and hope that people don't abuse the system by under-reserving memory. If an enforcement mechanism is required, it's more complicated, but you can set up the job manager to put jobs in memory cgroups and set either memory.limit_in_bytes or memory.soft_limit_in_bytes. The latter allows a cgroup to exceed its reservation as long as the system isn't running out of memory overall, which improves the kernel's ability to cache files on behalf of those processes and improves performance for everyone. The risk is that when the system does run out of memory, there are circumstances under which the OOM killer doesn't have time to look for a process to kill from an over-limit cgroup, and instead the attempted allocation simply fails.
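Done by hand, the cgroup side of that boils down to something like the following (cgroup v1 memory controller, run as root; the job id and the 10GB figure are illustrative, and in practice the job manager does this for you):

```
# Create a per-job memory cgroup
mkdir /sys/fs/cgroup/memory/job.12345

# Soft limit: may be exceeded while the host still has memory to spare
echo $((10 * 1024 * 1024 * 1024)) \
    > /sys/fs/cgroup/memory/job.12345/memory.soft_limit_in_bytes

# (Or, for a hard cap instead, write to memory.limit_in_bytes.)

# Move the job's top-level process into the cgroup; children inherit it
echo "$JOB_PID" > /sys/fs/cgroup/memory/job.12345/tasks
```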