6

I'm running the Apache web server and would like to improve a little how OOM situations are handled.

I'm aware of the OOM scores and have already done some customization in that area, so when something bad happens, Linux kills the correct processes. But it's not enough.

The problem is that sometimes when an OOM occurs, the server gets overloaded, then crashes and must be restarted. I would like to handle that without a full restart of the server. So I need to somehow "hook" a script onto the OOM killer invocation which would kill all Apache processes (and its CGIs), thus freeing the memory, and then start Apache again.

I know this would work, because if an OOM occurs and I'm fast enough to log in to the server and kill Apache manually, everything is then OK.

FYI, I'm now running nearly a hundred of these webservers, which is why I'm looking for a fully automatic solution.

One possible solution would of course be to use a watchdog that parses the syslog and detects OOMs that way. I already have something like that, which notifies about OOM kills by e-mail. This approach can solve some situations, but if the OOM is really bad, the server is so overloaded that my script does not even start (it's run by cron). It could be improved by using inotify to watch the syslog, or by piping the syslog directly (i.e. via a FIFO) to the script.

But still I'm wondering - isn't there any way to "hook" the script directly into the OOM killer? So I would put something like this in some /etc/.. file:

oom_action="sh /path/to/my/script.sh kill"

Or is it simply not possible to do it like that?

I'm using CentOS 6, Apache 2.2 and PHP as FastCGI.
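
For completeness, the "kill" action of that hypothetical script.sh would be roughly this (just a sketch; it assumes the stock CentOS 6 init script and php-cgi FastCGI workers, so adjust the names to your setup):

#!/bin/sh
# Sketch of the intended "kill" action: tear down Apache and any surviving
# FastCGI workers, then bring the web server back up.
service httpd stop              # ask Apache to shut down
pkill -9 -f php-cgi             # make sure no FastCGI workers survive
pkill -9 httpd                  # nor any leftover Apache children
service httpd start             # start Apache again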

dave

4 Answers

3

Why don't you just monitor the Apache processes and set their oom_adj value to 15, to make sure they are the first to be terminated on OOM? Here are some instructions about this setting.

Depending on your config, you can either modify the Apache startup scripts or set up a simple cron task to do that.
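
A minimal sketch of the cron variant (oom_adj is the legacy interface on CentOS 6's 2.6.32 kernel; the file paths and the php-cgi process name are just examples):

# /etc/cron.d/apache-oom-adj (example path) - re-tag the processes every 5 minutes
*/5 * * * * root /usr/local/sbin/apache-oom-adj.sh

# /usr/local/sbin/apache-oom-adj.sh:
#!/bin/sh
# Mark all Apache and PHP FastCGI processes as the preferred OOM-killer victims.
for pid in $(pgrep httpd) $(pgrep -f php-cgi); do
    echo 15 > /proc/"$pid"/oom_adj
done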

You can also periodically watch the output of the command dmesg | grep -i oom. If there are any lines, the OOM killer has killed something since the server last booted. You can then clear the buffer with dmesg --clear.
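
Wired into a cron job, that check could look something like this (a sketch; the grep pattern and the restart action are assumptions):

#!/bin/sh
# Sketch: run from cron; if the OOM killer has fired since the last check,
# restart Apache and clear the kernel ring buffer so the next run starts clean.
if dmesg | grep -qi oom; then
    service httpd restart
    dmesg -c > /dev/null    # print-and-clear; newer util-linux also has -C/--clear
fi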

Vladislav Rastrusny
  • I already have the OOM killer configured that way, so at first it kills PHP and HTTP. So in a "normal" situation it just kills a few processes and everything is OK. But in a critical situation that's not enough, and it's really necessary to kill all HTTP and PHP processes (and then start Apache again). – dave May 05 '15 at 12:58
  • @dave Then you have to periodically watch the output of the command `dmesg | grep -i oom`. If there are any lines, the OOM killer has killed something since the server last booted. You can then clear the buffer with `dmesg --clear` – Vladislav Rastrusny May 05 '15 at 13:06
0

I know there are horrendous PHP applications in the wild, lots of them, but isn't there something you could do on the Apache/FastCGI/PHP side of things? Apache constantly OOMing is not something you should encounter very often.

Try lowering the maximum number of Apache processes and FastCGI handlers, and check whether your current php.ini memory_limit (the maximum memory per script) is too high.

Also, it's perfectly possible to use ulimit with Apache and restrict the amount of memory a process can use. That can help before the server spirals toward death.
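
As a rough illustration of the knobs involved (the numbers are placeholders, not recommendations, and it assumes the prefork MPM with mod_fcgid):

# httpd.conf (prefork MPM) - cap the number of Apache worker processes
<IfModule prefork.c>
    MaxClients 64
</IfModule>

# mod_fcgid - cap the number of PHP FastCGI processes
FcgidMaxProcesses 20

# soft/hard memory limit (bytes) for processes forked by Apache (CGI/FastCGI)
RLimitMEM 268435456 402653184

# php.ini - per-script memory cap
memory_limit = 128M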

OOM is a very last resort, and everything that can lead to it should be inspected.

Janne Pikkarainen
  • That is true, and of course I have many tweaks in that area. But I also have many servers with many different PHP applications, etc. I'm not saying that these situations happen on a daily basis, but of course they do happen from time to time. – dave May 05 '15 at 14:24
0

As I didn't find any better solution, I have implemented a watchdog that reads kernel syslog messages (via a FIFO):

/etc/rsyslog.d:

(...)
kern.err         |/path/to/fifo

and searches them for OOM-killer activity:

while read -r line; do (..) done < /path/to/fifo

and when the OOM killer fires massively (I really only need to catch the emergency situation), I kill (and then start) Apache.
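
Put together, the watchdog looks roughly like this (a sketch; the FIFO path, the matched strings and the threshold are assumptions, and the FIFO has to exist before rsyslog is restarted, e.g. created with mkfifo):

#!/bin/sh
# Watchdog sketch: count OOM-killer messages arriving through the FIFO and
# restart Apache only when they pile up within a short window ("massive").
FIFO=/path/to/fifo
THRESHOLD=5          # how many kills within the window count as "massive"
WINDOW=60            # window length in seconds
count=0
start=$(date +%s)

while read -r line; do
    case "$line" in
        *"Out of memory"*|*oom-killer*)
            now=$(date +%s)
            # reset the counter when the window has expired
            if [ $((now - start)) -gt "$WINDOW" ]; then
                count=0
                start=$now
            fi
            count=$((count + 1))
            if [ "$count" -ge "$THRESHOLD" ]; then
                service httpd stop
                pkill -9 -f php-cgi      # kill surviving FastCGI workers
                service httpd start
                count=0
                start=$now
            fi
            ;;
    esac
done < "$FIFO"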

dave
0

I think it is better to put your processes into a memory cgroup and use the release_agent to call an external script when out-of-memory happens.

notify_on_release
    contains a Boolean value, 1 or 0, that either enables or disables the execution of the release agent. If the notify_on_release is enabled, the kernel executes the contents of the release_agent file when a cgroup no longer contains any tasks (that is, the cgroup's tasks file contained some PIDs and those PIDs were removed, leaving the file empty). A path to the empty cgroup is provided as an argument to the release agent. 

release_agent (present in the root cgroup only)
    contains a command to be executed when a “notify on release” is triggered. Once a cgroup is emptied of all processes, and the notify_on_release flag is enabled, the kernel runs the command in the release_agent file and supplies it with a relative path (relative to the root cgroup) to the emptied cgroup as an argument. The release agent can be used, for example, to automatically remove empty cgroups

Using cgroups you can control how many resources your processes can use, so the server does not end up heavily overloaded.
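
A minimal sketch of that setup on CentOS 6 (cgroup v1 with the memory controller mounted at /cgroup/memory, which is the cgconfig default there; the group name, the limit and the script path are assumptions):

#!/bin/sh
# Sketch: create a memory-limited cgroup for Apache and register a release
# agent that runs once the group has been emptied (e.g. after the cgroup's
# own OOM killer has removed every task in it).
CG=/cgroup/memory/apache

mkdir -p "$CG"
echo 536870912 > "$CG/memory.limit_in_bytes"              # 512 MB cap (placeholder)
echo 1 > "$CG/notify_on_release"
echo /path/to/my/script.sh > /cgroup/memory/release_agent # root cgroup only

# Move the running Apache and PHP processes into the group; new children
# inherit the cgroup of their parent automatically.
for pid in $(pgrep httpd) $(pgrep -f php-cgi); do
    echo "$pid" > "$CG/tasks"
done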
c4f4t0r
  • Well, cgroups sound nice. I definitely need to look closer at them (because I have almost no experience with them). I didn't know they were capable of something like that (the release_agent). On the other hand, if I set "strict" boundaries on how much memory can be used by some processes, I could in some scenarios "waste" memory that is not used by other services (e.g. mysql). To make really good use of that it would require a lot of fine-tuning, and it would be much harder to maintain on many servers, I believe. – dave May 06 '15 at 11:20