
I'm using cgroups with the memory controller to set a memory limit for each user (using the memory.limit_in_bytes setting).
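For concreteness, here is a minimal sketch of the kind of per-user setup described above, assuming a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory. The mount point, the per-user group layout, the helper name, and the idea of moving the user's session leader into the group are illustrative assumptions, not details from my actual setup.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup/memory"  # typical cgroup v1 mount point (assumption)

def limit_user_memory(user, limit_bytes, session_pid):
    """Create a per-user memory cgroup, cap it, and move one process into it."""
    group = os.path.join(CGROUP_ROOT, user)
    os.makedirs(group, exist_ok=True)

    # This is the limit the question refers to: the group is charged for
    # anonymous memory *and* for page cache brought in by its processes.
    with open(os.path.join(group, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

    # Children stay in the parent's cgroup, so limiting the session leader
    # limits the whole login session.
    with open(os.path.join(group, "cgroup.procs"), "w") as f:
        f.write(str(session_pid))

# Example (hypothetical values): limit_user_memory("alice", 1 << 30, 12345)  # 1 GiB
```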

The problem is that this setting also counts page cache usage toward the limit. So if the limit is 1GB and the user merely downloads or copies a 1GB file, their processes get killed. What's worse, the cached pages remain in memory, so the user's "memory usage" stays close to 1GB even when they have zero processes running.
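The split can be seen in the group's memory.stat file, which separates cache (file-backed, reclaimable) from rss (anonymous). The helper below is an illustrative sketch assuming the same cgroup v1 layout as above; the function name and the group path are hypothetical.

```python
def memory_breakdown(group_path):
    """Return the (cache, rss) split for a cgroup v1 memory group."""
    stats = {}
    with open(group_path + "/memory.stat") as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    # "cache" is file-backed page cache; "rss" is anonymous memory.
    # memory.usage_in_bytes, which the limit is enforced against, includes both.
    return stats.get("cache", 0), stats.get("rss", 0)

# Example (hypothetical group path):
# cache, rss = memory_breakdown("/sys/fs/cgroup/memory/alice")
# After copying a 1 GB file, cache alone can sit near the limit while rss is near zero.
```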

Naturally, this makes no sense. I only want to limit total private (anonymous, non-file-backed) memory usage per user. How can I achieve that?

Alternatively, I'd like the OOM killer to at least try dropping the user's cached pages before it goes off killing processes (which doesn't even free those cached pages).

Vladimir Panteleev
  • Please don't do this. It will just make performance worse for everyone by increasing the amount of I/O needed. If the memory is available, and you prohibit people from using it, you make performance worse for no gain. What is your actual requirement? (As opposed to what tool you think you need to meet it.) There's probably a good way to solve it. – David Schwartz Mar 15 '13 at 07:27
  • "What's worse, the cached pages remain in memory, so the user's "memory usage" remains close to 1GB" No! That's good. If someone else happens to read that same file, no I/O will be needed. If nobody does, the operating system will use the memory for something else anyway. Having data that might be useful is way better than having nothing! – David Schwartz Mar 15 '13 at 07:29
  • 1
    My requirement is preventing one user's runaway process(es) from crashing the entire system. – Vladimir Panteleev Mar 15 '13 at 07:49
  • 1
    I know cached memory is good. What's not good is that it's counted towards the user's total. But if it gets to that, I'll take clearing caches over not even allowing that user to log back in. – Vladimir Panteleev Mar 15 '13 at 07:50
  • Ahh, yeah, then your approach is totally wrong. It doesn't matter to Jack whether Joe is destroying his performance by consuming all memory with private data or with cache. He makes the system unusably slow the same either way. (The system handles memory read into a private mapping precisely the same as memory read in from a file, assuming nobody else is reading that same file. What reason would it have for any difference?) – David Schwartz Mar 15 '13 at 08:03
  • 1
    Huh?? That makes no sense. One process/thread can only be reading from one spot on the disk at the same time - the rest is "just in case" cache that can be thrown out at any moment. – Vladimir Panteleev Mar 15 '13 at 08:05
  • A memory-mapped file could be effectively counted towards the user's total, as some program has explicitly requested a mapping of that size. This is completely different from simply copying or downloading a file, where the file is read/written in very small pieces at a time. I don't have a requirement about this case, though, since it's unlikely a runaway program will create / map huge files. – Vladimir Panteleev Mar 15 '13 at 08:09
  • Furthermore, memory backed by disk can be swapped out any time. I'm mainly interested in counting memory that is NOT backed by disk, since you can't move it anywhere else (the server has plenty of RAM but no swap). – Vladimir Panteleev Mar 15 '13 at 08:11
  • A runaway program can cause the system's disk cache to thrash, causing the same performance degradation that excessive private pages can. Performance can be destroyed just as effectively by usage of clean pages as it can by dirty pages. – David Schwartz Mar 15 '13 at 08:44
  • David, you're just repeating what you've said above without either backing up your arguments, or countering mine. – Vladimir Panteleev Mar 15 '13 at 08:45
  • I'm trying to find a way so that you understand my point. If a user consumes clean pages (cache), that will squeeze the cache for code pages of other processes to the point that I/O will go through the roof. That will make the system just as useless as if it used excessive private memory because soon most pages of code for other users' processes will have dropped out of the system and each time a process gets to run, it will hit a major page fault in its code page. You're approaching the problem wrong. Your question is based on a misunderstanding of how memory usage impacts system performance. – David Schwartz Mar 15 '13 at 08:48
  • Well, your argument is based on the assumption that the code pages will be discarded with the same priority as cache pages. I think it's highly unlikely that a code page will end up lower on the LRU than the first page of a 24GB file the user is reading. You seem to be quite convinced, do you have any references? Also, since you think this is the wrong approach, what's a better one? – Vladimir Panteleev Mar 15 '13 at 08:54
  • If the user is reading the file at a high speed, assuming the data and code are on the same type of storage medium (comparable speed), he'll push an awful lot of code out of cache. He'll also push other users' data out of cache. This will result in more I/O making the system slower for everyone. As for a better one, the one you discuss in the question that you think makes no sense, for precisely the reason I'm trying to explain. You can't treat cache as free/cheap because if a user consumes too much cache, he'll force everyone else's pages out. – David Schwartz Mar 15 '13 at 09:01
  • "Alternatively, get the OOM killer to try dropping the user's cached pages before going off killing processes, which doesn't even free the cached pages." Freeing the cached pages doesn't help. The damage was already done by the pages that had to be discarded to read them in. Freeing them is absurdly cheap and the system will do that shortly as soon as the pages these pages evicted fault back in. You need to "punish" the user who squeezed everyone else's working set, otherwise he can keep doing it over and over. – David Schwartz Mar 15 '13 at 09:03
  • Killing a user's processes and not even letting them log back in, because they've touched too much data on the disk, is insanely broken. I can't fathom what *anyone* who thinks this is a valid approach must be thinking. – Vladimir Panteleev Mar 15 '13 at 09:04
  • Then don't use limits. That's what limits do -- they prevent users from doing things that exceed those limits. You have to set the limit high enough that no reasonable use will exceed it. (An approach that lets a user squeeze out everyone else's clean pages and then repeat that over and over forever won't solve the problem. Having your clean pages pushed out of cache causes a *huge* performance penalty.) – David Schwartz Mar 15 '13 at 09:05
  • Another thing that will help a lot is adding some swap. That way, the system will at least have the option of evicting dirty pages. That will make it much more resistant to having too many clean pages evicted. Without swap, it has no choice. (Don't worry about swap being "slow". Giving a smart operating system an extra option to use only if it makes sense won't make it slower. It will only use it if it makes things better.) – David Schwartz Mar 15 '13 at 09:09
  • OK, that's insightful... but still I find this answer completely unsatisfactory. A model closer to my goal is full-fledged virtualization (or even drawing the parallel with multiple isolated machines) - here, a user's cache only contends with the user's private memory, which is pretty much what I want to achieve. Also, dynamically throttling the user's I/O would be a much saner way to preserve system responsiveness if their cache contends with other users'. – Vladimir Panteleev Mar 15 '13 at 09:14
  • Also, about swap... I'd rather let loose oom_killer than let anything go to swap, or let e.g. sshd to get paged out. My experience shows that it's really hard to do anything with a thrashing server... Maybe I should look into a way of triggering oom_killer when RAM runs low (and not when it runs out). – Vladimir Panteleev Mar 15 '13 at 09:27
  • On second thought (and taking in what you've said above), it doesn't sound like that would work if the cause of the thrashing is disk caches/buffers. – Vladimir Panteleev Mar 15 '13 at 09:36
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/7924/discussion-between-cybershadow-and-david-schwartz) – Vladimir Panteleev Mar 15 '13 at 09:36

1 Answer


Posting what I think might be a better answer.

My requirement is preventing one user's runaway process(es) from crashing the entire system.

Linux already has a feature for doing exactly this: the OOM killer.

The OOM killer runs when the system runs out of memory, and favors killing processes that have consumed a lot of RAM in a short time. It is also less likely to kill long-running or system (superuser) processes.

The OOM killer can be further tuned by tweaking the /proc/<pid>/oom_score_adj file. The setting is inherited by child processes, so you only need to set it on each user's root process. (See Documentation/filesystems/proc.txt, section 3.1)
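As a rough illustration of that suggestion (not a drop-in implementation), one could write an adjustment value into that file for each user's root process. The helper name, the hypothetical PID, and the chosen value of 500 below are assumptions; only the file path, the -1000..1000 range, and the inheritance across fork() come from the kernel documentation cited above.

```python
def deprioritize_for_oom(pid, adj=500):
    """Make a process (and children forked afterwards) a preferred OOM-kill target.

    oom_score_adj ranges from -1000 (never kill) to 1000 (kill first) and is
    inherited across fork(), so setting it on a user's root process covers
    the whole subtree started after that point.
    """
    with open("/proc/%d/oom_score_adj" % pid, "w") as f:
        f.write(str(adj))

# Example (hypothetical PID of a user's session leader):
# deprioritize_for_oom(12345)
```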

Vladimir Panteleev