7

I recently wrote a shell script that crashed a server (and damaged a partition) by consuming all available resources. It was run from a cron job, and it seems each run took longer than the interval between executions, so instances piled up and eventually snowballed out of control.

Now, I've since modified it to record its running state, so that it never runs more than once simultaneously. My question is: are there other simple ways to safeguard a script against causing harm? Is there a standard list of things a script should do to behave properly: not consume too many resources, fail gracefully, alert the right people, and so on?

Basically: what other pitfalls should I avoid?
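For reference, the overlap guard I added looks roughly like this. It's only a sketch: the lock file path is a placeholder, and it relies on flock(1) from util-linux.

```bash
#!/bin/bash
# Sketch of the overlap guard described above; the lock path is a placeholder.
# Relies on flock(1) from util-linux.

LOCKFILE=/var/run/myjob.lock

# Open the lock file on file descriptor 9 and try to take an exclusive,
# non-blocking lock; if another instance still holds it, bail out immediately.
exec 9>"$LOCKFILE"
if ! flock -n 9; then
    echo "Previous run still in progress, exiting." >&2
    exit 1
fi

# ... actual work goes here; the lock is released when the script exits ...
```

With that in place, the cron entry itself doesn't need to change.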

Steve Bennett

3 Answers

4

Computers do exactly what they are told. The only way to ensure that a script "behaves properly" is to write it so that it will behave properly, under all scenarios.

Some basic advice:

  1. Implement some kind of monitoring system.
    The fact that your system blew up without you knowing it was coming tells me you either do not have a monitoring system, or your current system isn't good enough.
    Invest some time in making sure that your servers tell you that there's a problem before they fall over.
  2. Include appropriate safeguards in scripts run from cron.
    Your script stepped on its own tail. That shouldn't happen.
    You've learned the hard way that you need to guard against this sort of thing (and have the system notify you if it happens).
  3. Design and Test More Thoroughly.
    Carefully evaluate every script you are going to deploy to make sure it won't produce undesirable side effects. If you can imagine a failure scenario, test for it (and handle it properly!).
    Take the time to simulate failures (either by hard-coding the failure condition to true in your script, or by generating the circumstances your detection logic is meant to catch). A sketch of what such a cron safeguard and failure test might look like follows below.
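To illustrate points 2 and 3, a cron wrapper with a hard time limit, failure notification, and an easy way to force the failure path might look roughly like this. It is only a sketch: the script path, time limit and e-mail address are placeholders, and it assumes GNU coreutils `timeout` and a working local mailer.

```bash
#!/bin/bash
# Illustration only: the paths, time limit and address are placeholders.
set -euo pipefail

notify() {
    # Tell a human the job failed (assumes a working local mailer).
    echo "cron job failed on $(hostname) at $(date)" \
        | mail -s "cron job failure" admin@example.com
}
trap notify ERR

# Kill the job if it runs longer than the cron interval
# (timeout is part of GNU coreutils).
timeout 50m /usr/local/bin/do-the-actual-work.sh

# To exercise the failure path (point 3), temporarily force an error:
# false
```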
voretaq7
  • Ok - but my question is "what safeguards should I implement", and your answer 2 is "implement appropriate safeguards". Is there a list? Best practices? Patterns? – Steve Bennett Jan 23 '12 at 08:14
  • No. The appropriate safeguards vary depending on what you are doing. There is no one-size-fits-all solution - You need to analyze your particular situation and act accordingly. Providing an exhaustive list would require solving [the Halting Problem](http://en.wikipedia.org/wiki/Halting_problem). – voretaq7 Jan 23 '12 at 09:04
0

The safeguards you are talking about depend on what your script is doing. For example, it is better to back up an important file before modifying it automatically; if the script fails and corrupts that file, you are safe because you have a backup.

One important thing to mention is logging, logging, and logging. If your script runs in the background without a log file showing its progress and what it is doing, you will have no idea about any potential problem, now or in the future. Don't forget to include a timestamp in each log entry, and enable the NTP service so you know exactly when things happened.
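A rough sketch of both ideas, the pre-modification backup and the timestamped log (the file names and log location are just examples):

```bash
#!/bin/bash
# Sketch only: file names and log destination are examples.

LOGFILE=/var/log/myscript.log
TARGET=/etc/important.conf

log() {
    # Timestamp every entry; a level (INFO/WARN/ERROR) makes the log
    # easier to skim and to filter later.
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 $2" >> "$LOGFILE"
}

# Keep a dated copy before touching the file, so a bad run can be undone.
cp -p "$TARGET" "$TARGET.$(date +%Y%m%d%H%M%S).bak" \
    || { log ERROR "backup of $TARGET failed, aborting"; exit 1; }

log INFO "backup taken, modifying $TARGET"
# ... modify "$TARGET" here ...
```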

Khaled
  • The problem I'm having with logging is ending up with a massive, highly verbose log that is very hard to skim. – Steve Bennett Jan 23 '12 at 11:29
  • You can set several levels of logging. Under normal conditions, you should log only errors, warnings and some info. Logrotate can help you manage your logs as well, to avoid huge log files. – Khaled Jan 23 '12 at 11:40
0

In the end, we now run the script inside a VM. That vastly limits the scope of damage that can be caused.

The frightening thing about Linux (for me, at least) is that minor typos or bugs can have devastating effects. Even something like running a command with an unquoted ${VARIABLE} can have a totally different (and destructive) meaning if that variable is blank or contains a space.
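A contrived sketch of what I mean (the variable and path are made up; the dangerous line is left commented out on purpose):

```bash
#!/bin/bash
# Contrived illustration of the quoting pitfall -- do not run as-is.

STAGING_DIR=""      # imagine this was supposed to be set earlier, but wasn't

# Unquoted and unchecked: with STAGING_DIR empty, this expands to "rm -rf /*"
# rm -rf ${STAGING_DIR}/*

# Safer: abort on unset variables, refuse an empty value, and quote the
# expansion so values containing spaces stay a single argument.
set -u
rm -rf "${STAGING_DIR:?must not be empty}"/*
```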

Steve Bennett
  • Hate to break it to you, but you can break Windows or any other O/S just as easily with a bit of careless scripting as the administrator user. The key is to use the least amount of permissions necessary to do the job. – Magellan Feb 14 '12 at 05:14
  • Entirely plausible - but not relevant. And actually I should replace "Linux" with "Bash" in the above. It's the endless text munging, different escaping mechanisms, substitutions etc that are so error-prone. – Steve Bennett Feb 14 '12 at 06:23
  • Well, yes. Bash is much more an art than a science even on a good day. – Magellan Feb 14 '12 at 15:53