I've got a few servers that have begun oom-killing their backup processes and, while I understand that encountering the oom condition is quite bad in itself, I need this process to not die so that backups happen properly while the memory issue is addressed.

To that end I've attempted to create a way to launch processes with an adjusted oom_score_adj, similar to launching a process with nice.

#!/bin/bash

function oom_adj_exec() {
    while getopts ':n:' opt; do
        case $opt in
            n)
                if grep -q '^-\?[0-9]\+$' <(echo "$OPTARG"); then
                    if [ "$OPTARG" -ge -1000 -a "$OPTARG" -le 1000 ]; then
                        oom_score_adjust=$OPTARG
                    else
                        echo "Acceptable values for -n are from -1000 to 1000" >&2
                        return 255
                    fi
                else
                    echo "Improper format for -n: $OPTARG" >&2
                    return 255
                fi
                break
                ;;
            :)
                echo "option -$OPTARG requires a value" >&2
                return 255
                ;;
            *)
                echo "Unknown option -$OPTARG" >&2
                return 255
                ;;
        esac
    done

    # keep the command as an array so arguments containing spaces survive
    command=("${@:$OPTIND}")

    # job control requires the monitor option which
    # is usually not set for non-interactive shells
    prev_state=$(set +o | grep monitor)
    set -o monitor

    "${command[@]}" &
    pid=$!

    echo "$oom_score_adjust" > /proc/$pid/oom_score_adj

    fg %% > /dev/null

    ecode=$?

    # restore the previous state of the shell
    $prev_state

    return $ecode
}

oom_adj_exec "$@"

Example usage:

./oom_adj_exec.sh -n -500 /usr/bin/mem_bloater

While it seems to work, I can't shake the feeling that there's something in there waiting to go horribly wrong. Does anything stand out as a truly terrible idea or a disaster waiting to happen?

Sammitch
  • The acceptable values for [`oom_score_adj`](https://www.kernel.org/doc/Documentation/filesystems/proc.txt) range from -1000 to +1000 so you might replace the test `grep -q '^-\?[0-9]\+$' <(echo "$OPTARG")` with `[ "$OPTARG" -ge -1000 -a "$OPTARG" -le 1000 ]` but otherwise nothing immediately makes my eyes bleed. – HBruijn Jun 29 '16 at 18:41
  • This is true, but I also want to explicitly test for non-digit characters so I've added this as an additional check. Thanks! – Sammitch Jun 29 '16 at 18:51
  • IIRC `-ge` and `-le` will already throw a failure when used to test non-integers, so two birds with one stone... – HBruijn Jun 29 '16 at 18:58
  • True, but not gracefully. I'm a stickler. :P – Sammitch Jun 29 '16 at 19:16
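
The two checks discussed in these comments (format first, then range) could be combined roughly as follows. This is only a sketch: the function name `valid_oom_score` is made up for illustration and is not part of the script in the question, and it uses bash's `[[ ... =~ ... ]]` in place of the `grep` test.

```shell
#!/bin/bash

# Hypothetical helper (name is an assumption): succeeds only when $1 is
# an integer between -1000 and 1000 inclusive.
valid_oom_score() {
    # Format check first, so the numeric tests below never see a
    # non-integer and never print "integer expression expected".
    [[ $1 =~ ^-?[0-9]+$ ]] || return 1
    # Range check, written without the obsolescent -a operator.
    [ "$1" -ge -1000 ] && [ "$1" -le 1000 ]
}
```

Something like `valid_oom_score "$OPTARG" || return 255` would then keep the failure path quiet while still rejecting bad input.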

1 Answer

I've also done this but not quite as nicely, like so:

(echo 1000 > /proc/self/oom_score_adj && exec /usr/bin/blah)

Because it's in parentheses, it launches a subshell, sets the OOM score for the shell (in this case to 1000, to make it extremely likely to get killed in an OOM situation), and then the exec replaces the subshell with the intended program while leaving the new OOM score intact. It also won't affect the OOM score of the parent process/shell, as everything is happening inside the subshell.
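
For reuse, the same subshell-plus-exec idea could be wrapped in a small function. This is a minimal sketch rather than a drop-in replacement for the script in the question: the name `run_with_oom_adj` and the exit status 125 are assumptions chosen for illustration. Note also that lowering the score (a negative value, as a backup job would want) generally requires root privileges (CAP_SYS_RESOURCE), whereas raising it does not.

```shell
#!/bin/bash

# Hypothetical helper (name is an assumption).
# Usage: run_with_oom_adj SCORE COMMAND [ARGS...]
run_with_oom_adj() {
    local score=$1
    shift
    (
        # The redirection is opened by the subshell, so /proc/self is the
        # subshell itself; the calling shell's score is left untouched.
        # 125 is an arbitrary status meaning "could not set the score".
        echo "$score" > /proc/self/oom_score_adj || exit 125
        # exec replaces the subshell without changing its PID, so the
        # score is already in place before the program starts running.
        exec "$@"
    )
}
```

Used as `run_with_oom_adj -500 /usr/bin/mem_bloater`, this sets the adjustment before the program starts, so there is no window in which the program runs with the default score, and no job control is needed. The function's return status is the program's exit status.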

Malvineous