- What is the quickest way to regain control of a Linux system that has become nonresponsive or extremely sluggish due to excessive swapping?
That part is already answered above with Alt-SysRq-F.
- Is there an effective way to prevent such swapping from occurring in the first place, for instance by limiting the amount of memory a process is allowed to try to allocate?
I'm answering this second part: yes, ulimit still works well enough to limit a single process. You can:
- set a soft limit for a process you know will likely go out of control
- set a hard limit for all processes if you want extra insurance
Also, as briefly mentioned:
You can use CGroups to limit resource usage and prevent such problems
Indeed, cgroups offer more advanced control, but are currently more complicated to configure in my opinion.
Old school ulimit
Once off
Here's a simple example:
$ bash
$ ulimit -S -v $((1*2**20))
$ r2(){ r2 $@$@;};r2 r2
bash: xmalloc: .././subst.c:3550: cannot allocate 134217729 bytes (946343936 bytes allocated)
It:
- Sets a soft limit of 1GB on overall (virtual) memory use (ulimit takes its limit in kB)
- Runs the recursive bash function call r2(){ r2 $@$@;};r2 r2, which exponentially chews up CPU and RAM by infinitely doubling itself while requesting stack memory.
As you can see, it got stopped when trying to request more than 1GB.
Note: -v operates on virtual memory allocation (total, i.e. physical + swap).
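To see that the cap is on address space rather than on touched pages, reserving a large buffer fails at allocation time, even before any of it is used (a sketch; assumes python3 is available):

```shell
bash -c '
  ulimit -S -v $((1*2**20))    # 1 GB of virtual address space
  # Asking for a ~2 GB buffer trips the limit immediately,
  # even though none of its pages have been written to yet.
  python3 -c "import ctypes; ctypes.create_string_buffer(2**31)" 2>/dev/null \
    || echo "allocation refused"
'
```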
Permanent protection
To limit virtual memory allocation permanently, the as item in limits.conf is the equivalent of -v.
I do the following to protect against any single misbehaving process:
- Set a hard address space limit for all processes.
address space limit = <physical memory> - 256MB
- Therefore, no single process with greedy memory use or an active loop and memory leak can consume ALL the physical memory.
- 256MB headroom is there for essential processing with ssh or a console.
One liner:
$ sudo bash -c "echo -e \"*\thard\tas\t$(($(grep -E 'MemTotal' /proc/meminfo | grep -oP '(?<=\s)\d+(?=\skB$)') - 256*2**10))\" > /etc/security/limits.d/mem.conf"
To validate, this results in the following (e.g. on 16GB system):
$ cat /etc/security/limits.d/mem.conf
* hard as 16135196
$ ulimit -H -v
16135196
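The one-liner can be unpacked into an equivalent, more readable sketch (same arithmetic; the final write into limits.d still needs root, e.g. via sudo tee):

```shell
# Total physical memory in kB, straight from /proc/meminfo.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# Leave 256 MB (in kB) of headroom for ssh/console essentials.
limit_kb=$((mem_kb - 256*2**10))
# limits.conf format: <domain> <type> <item> <value>
printf '*\thard\tas\t%d\n' "$limit_kb"
# Redirect that line into /etc/security/limits.d/mem.conf as root, e.g.:
#   printf ... | sudo tee /etc/security/limits.d/mem.conf
```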
Notes:
- Only mitigates against a single process going overboard with memory use.
- Won't prevent a multi-process workload with heavy memory pressure causing thrashing (cgroups is then the answer).
- Don't use the rss option in limits.conf. It's not respected by newer kernels.
- It's conservative.
- In theory, a process can speculatively request lots of memory but only actively use a subset (smaller working set/resident memory use).
- The above hard limit will cause such processes to abort (even if they might have otherwise run fine given Linux allows the virtual memory address space to be overcommitted).
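The overcommit behaviour referred to above can be inspected (and tuned) via sysctl; a read-only sketch:

```shell
# vm.overcommit_memory: 0 = heuristic (default), 1 = always allow,
# 2 = strict commit accounting (no overcommit beyond the commit limit)
cat /proc/sys/vm/overcommit_memory
# In mode 2 the commit limit is swap + overcommit_ratio % of RAM:
cat /proc/sys/vm/overcommit_ratio
# The kernel's current view of the limit and of committed memory:
grep -E '^Commit' /proc/meminfo
```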
Newer CGroups
Offers more control, but currently more complex to use:
- Improves on the ulimit offering: memory.limit_in_bytes can account for and limit physical memory separately (memory.max_usage_in_bytes tracks its peak use).
- Whereas ulimit -m and/or the rss item in limits.conf was meant to offer similar functionality, it hasn't worked since Linux 2.4.30!
- Need to enable some kernel cgroup flags in the bootloader: cgroup_enable=memory swapaccount=1
- This didn't happen by default with Ubuntu 16.04.
- Probably due to some performance implications of extra accounting overhead.
- cgroup/systemd stuff is relatively new and changing a fair bit, so the flux upstream implies Linux distro vendors haven't yet made it easy to use. Between 14.04LTS and 16.04LTS, the user space tooling to use cgroups has changed.
cgm now seems to be the officially supported userspace tool.
- systemd unit files don't yet seem to have any pre-defined "vendor/distro" defaults to prioritise important services like ssh.
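Before relying on the memory controller, it's worth checking it is actually enabled (a sketch for the cgroup-v1 layout used in this answer; paths differ under cgroup v2):

```shell
# The memory controller should appear with its "enabled" column = 1.
grep -E '^memory' /proc/cgroups
# Swap accounting only exists if booted with swapaccount=1:
if [ -e /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes ]; then
  echo "swap accounting on"
else
  echo "swap accounting off (or cgroup v2): check the kernel cmdline"
fi
```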
E.g. to check current settings:
$ echo $(($(cat /sys/fs/cgroup/memory/memory.max_usage_in_bytes) / 2**20)) MB
11389 MB
$ cat /sys/fs/cgroup/memory/memory.stat
...
E.g. to limit the memory of a single process:
$ cgm create memory mem_1G
$ cgm setvalue memory mem_1G memory.limit_in_bytes $((1*2**30))
$ cgm setvalue memory mem_1G memory.memsw.limit_in_bytes $((1*2**30))
$ bash
$ cgm movepid memory mem_1G $$
$ r2(){ r2 $@$@;};r2 r2
Killed
To see it in action chewing up RAM as a background process and then getting killed:
$ bash -c 'cgm movepid memory mem_1G $$; r2(){ r2 $@$@;};r2 r2' & while [ -e /proc/$! ]; do ps -p $! -o pcpu,pmem,rss h; sleep 1; done
[1] 3201
0.0 0.0 2876
102 0.2 44056
103 0.5 85024
103 1.0 166944
...
98.9 5.6 920552
99.1 4.3 718196
[1]+ Killed bash -c 'cgm movepid memory mem_1G $$; r2(){ r2 $@$@;};r2 r2'
Note the exponential (power of 2) growth in memory requests.
In the future, let's hope to see "distro/vendors" pre-configure cgroup priorities and limits (via systemd units) for important things like SSH and the graphical stack, such that they never get starved of memory.
About disabling the swap, according to http://unix.stackexchange.com/a/24646/9108 it might not be the best option. – sashoalm – 2016-09-05T19:01:48.183
Indeed, someone commented the same to me, so I've modified the thrash-protect doc at that point. – tobixen – 2016-09-09T06:43:02.090