What are some ways to prevent user cron jobs from crushing the servers?

Question

It happens fairly frequently that user cron jobs on shared servers all run at the same time and get caught in contention (as near as I've been able to tell). Load explodes, Nagios gets angry, Apache stops responding, SSH logins time out, and so on. I'm not in a position to unilaterally decide that users can't run crons, but I'd like to deal with the situation where pgrep crond | wc -l returns more than 50.

It seems it should be possible to stagger them, for example by limiting the number of crond processes running at any given time (something like sending SIGSTOP until some of them finish, only less hacky), but I've yet to find any good leads.

The hardware: 4 CPUs and up. The low end is Dell 1435s with ~8GB of memory and RAID 10 on WD EADS drives. Mostly Plesk and cPanel, but also some evil Sphera systems.

How do you deal with this problem, sf?

score 5 · Answer 1 · answered Oct 16 '11 at 02:43


You can use cron.allow and cron.deny to limit user access to cron, or you can use PAM limits to cap CPU usage, the number of processes and so on. Aside from that, the solution is to write something that monitors and manages users' cron jobs, because cron doesn't really have a limit on how many jobs it will run at once.
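PAM limits (pam_limits, configured in /etc/security/limits.conf) work by setting per-process resource limits (rlimits) when a session starts. Whether cron sessions actually go through pam_limits depends on the distribution's /etc/pam.d/cron, so check that first. To show what those limits do, here is a rough Python 3.7+ sketch that applies the same kind of limits to a single command with setrlimit: a CPU-time cap (the limits.conf "cpu" item), an optional process cap ("nproc"), and a higher nice value. The function name run_limited and the default values are made up for illustration. This is not how PAM applies them.

```python
import os
import resource
import subprocess
import sys


def _clamp(requested, hard):
    # An unprivileged process may lower its hard limit but never raise it.
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def run_limited(cmd, cpu_seconds=600, niceness=10, max_procs=None):
    """Hypothetical wrapper: run cmd with a CPU-time cap, an optional
    per-user process cap and a higher nice value."""

    def apply_limits():
        # Runs in the child between fork() and exec(), so only the job
        # (and anything it spawns) is affected, not the caller.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        cpu = _clamp(cpu_seconds, hard)
        # The kernel terminates the job (SIGXCPU) once it uses this much CPU.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu, cpu))
        if max_procs is not None:
            # RLIMIT_NPROC counts every process owned by the user, not just this job.
            _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
            procs = _clamp(max_procs, hard)
            resource.setrlimit(resource.RLIMIT_NPROC, (procs, procs))
        os.nice(niceness)

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    result = run_limited(sys.argv[1:])
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    sys.exit(result.returncode)
```

With limits.conf, the same effect comes from lines for the cpu and nproc items, which pam_limits applies to every session it manages, with no wrapper involved.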

I think cPanel has a setting for the number of cron jobs running at the same time, but that would be specific to cPanel (not sure).
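Cron itself won't cap how many jobs run at once. One way to get the staggering the question asks about is a wrapper that every user job runs through, which only lets N jobs run at a time. Below is a rough Python 3 sketch that uses flock(2) on a fixed set of lock files as "slots". The slot directory, the default of 4 slots and the wrapper itself are hypothetical. The sketch assumes the directory is shared and writable by all users (for example, created by root with mode 1777), and it only helps for jobs that are actually launched through it.

```python
import fcntl
import os
import subprocess
import sys
import time

# Hypothetical shared directory holding one lock file per slot.
SLOT_DIR = "/tmp/cron-slots"


def acquire_slot(slot_dir, max_slots, poll_seconds=5.0, timeout=None):
    """Block until one of max_slots lock files can be locked; return its fd."""
    os.makedirs(slot_dir, exist_ok=True)
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        for i in range(max_slots):
            path = os.path.join(slot_dir, f"slot{i}.lock")
            # flock() does not need write access, so a read-only open is enough
            # even if another user created the file.
            fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o666)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd  # The slot stays held for as long as fd is open.
            except BlockingIOError:
                os.close(fd)  # Slot busy; try the next one.
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("no free slot")
        time.sleep(poll_seconds)


def run_with_slot(cmd, slot_dir=SLOT_DIR, max_slots=4, **kwargs):
    """Wait for a free slot, run cmd, then release the slot."""
    fd = acquire_slot(slot_dir, max_slots, **kwargs)
    try:
        return subprocess.run(cmd).returncode
    finally:
        os.close(fd)  # Closing the descriptor releases the flock.


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_with_slot(sys.argv[1:]))
```

A crontab entry would then look something like */5 * * * * /usr/local/bin/cron-slot.py /home/alice/job.sh (a made-up install path). Waiting jobs still exist as sleeping processes, but they don't compete for CPU or disk until they get a slot.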


coredump


Ooh, thanks, I'll have to look into PAM limits. That might at least help mitigate the problem, if not solve it. – Wyatt Oct 16 '11 at 06:55

score 2 · Answer 2 · answered Oct 16 '11 at 03:26

I think you have one of these problems:

- Not enough memory to run the crontabs at the same time. You can fix this by:
  - adding more RAM
  - limiting the maximum memory that a user can allocate
  - rescheduling the jobs to lower the number of concurrent jobs (you might need to replace crond with a different scheduler)
- High I/O. You can fix this by:
  - lowering the I/O priority with ionice
  - rescheduling the jobs to lower the number of concurrent jobs

Try to find out whether the machine is swapping. If it is not swapping during the night, change the I/O scheduling class of cron to idle:

sudo ionice -c 3 -p $(pgrep cron)
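The si/so columns of vmstat are the usual quick check for swapping. The same idea as a small Python sketch: read the kernel's cumulative pswpin/pswpout counters from /proc/vmstat twice and report the rate. It assumes a Linux /proc/vmstat that exposes those two counters. The function names are made up.

```python
import time


def read_swap_counters(text):
    """Return (pswpin, pswpout) page counts from /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    # Missing counters are reported as 0.
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rate(interval=5.0, path="/proc/vmstat"):
    """Pages swapped in and out per second, sampled over `interval` seconds."""
    with open(path) as f:
        in1, out1 = read_swap_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        in2, out2 = read_swap_counters(f.read())
    return (in2 - in1) / interval, (out2 - out1) / interval
```

Running swap_rate() from a cron job at the problem times (or from a Nagios check) shows whether the nightly spikes come with swapping. Rates that stay near zero point to I/O or CPU contention instead.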

It's definitely not memory; most of these boxes have about 28GB and none have less than 8. Using a different scheduler would be quite time consuming to test and evaluate. ionice had occurred to me but for two issues: 1) some of these are running kernels older than 2.6.13 and 2) it can't be something where we have to catch it in the act to fix it. (For one thing there are >100 servers) — Wyatt, Oct 16 '11 at 06:45

score 1 · Answer 3 · answered Oct 16 '11 at 14:59

I have always scheduled cron jobs at random times (especially the minute field). I commonly see cron examples that run at midnight, like:

0 0 * * *  /usr/bin/echo "Job ran"

If you have a lot of jobs defined like that, you are asking for trouble. Unfortunately, these are often long-running system jobs. I also tend to spread jobs across different hours of a batch processing window (23:00 to 05:00).
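Here is a sketch of the "random minute" idea in Python that stays stable across runs. Each job gets a slot derived from a hash of a job identifier rather than from random.randint, so regenerating a crontab doesn't move jobs around. The job identifiers (such as "alice:backup") are hypothetical. Reading "23 to 05" as start hours 23 through 04, so that every job starts before 05:00, is an assumption.

```python
import zlib

# Assumed start hours for the batch window: 23:00 through 04:59.
WINDOW_HOURS = [23, 0, 1, 2, 3, 4]


def stagger(job_id, hours=WINDOW_HOURS):
    """Map a job identifier to a stable (minute, hour) inside the window."""
    # CRC32 is stable across runs and machines, unlike Python's hash().
    h = zlib.crc32(job_id.encode("utf-8"))
    slot = h % (60 * len(hours))
    return slot % 60, hours[slot // 60]


def crontab_line(job_id, command):
    """Build a daily crontab entry for command at the job's staggered time."""
    minute, hour = stagger(job_id)
    return f"{minute} {hour} * * * {command}"
```

With 6 hours of 60 minutes each there are 360 possible start times, so a few hundred daily jobs spread out instead of all firing at 0 0 * * *.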

I like the cron layout used on Ubuntu. It has several /etc/cron.* directories (cron.hourly, cron.daily, cron.weekly, cron.monthly) for jobs. The scripts in each directory are run in sequence by run-parts rather than in parallel, which limits the load.
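To show why that limits the load, here is a Python sketch that mimics the core of run-parts: it runs each executable in a directory one after another, in lexical order. It is not run-parts itself. The name filter follows Debian's run-parts default, which only runs files whose names consist of letters, digits, underscores and hyphens, so a name like "job.sh" is skipped. The function name is made up.

```python
import os
import re
import subprocess

# Debian run-parts' default naming rule (no dots, so "foo.sh" is ignored).
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def run_parts(directory):
    """Run executables in `directory` sequentially; return {name: exit code}."""
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not VALID_NAME.fullmatch(name):
            continue
        if os.path.isfile(path) and os.access(path, os.X_OK):
            # subprocess.run waits for each script, so two jobs never overlap.
            results[name] = subprocess.run([path]).returncode
    return results
```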

You should be able to see what is scheduled in the files located in /var/spool/cron/crontabs. Reading these files requires root access. If users are causing the problems, discuss the problem with them.
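Reading those files by hand doesn't scale well past a handful of users. Here is a rough Python sketch that tallies how many jobs fire at each hour:minute across all user crontabs, to show where the collisions are. It assumes the Debian/Ubuntu spool path from above (RHEL-style systems use /var/spool/cron) and user crontab lines without a user column. It understands only numbers, *, ranges, lists and steps in the minute and hour fields. It skips @-macros, month or day names and environment lines, and it ignores the day fields.

```python
import collections
import os


def expand(field, lo, hi):
    """Expand a cron time field (*, */n, a-b, a-b/n, a,b,c) into a set of ints."""
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-", 1)
            start, end = int(a), int(b)
        elif step_text:
            raise ValueError(f"step on a single value: {part}")
        else:
            start = end = int(rng)
        if step < 1 or start < lo or end > hi or start > end:
            raise ValueError(f"out of range: {part}")
        values.update(range(start, end + 1, step))
    return values


def tally(crontab_text):
    """Count jobs per (hour, minute) in one user crontab."""
    counts = collections.Counter()
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        fields = line.split(None, 5)
        if len(fields) < 6 or "=" in fields[0]:
            continue  # Environment assignment or malformed line.
        try:
            minutes = expand(fields[0], 0, 59)
            hours = expand(fields[1], 0, 23)
        except ValueError:
            continue  # Syntax this sketch does not handle.
        for h in hours:
            for m in minutes:
                counts[(h, m)] += 1
    return counts


def tally_dir(spool="/var/spool/cron/crontabs"):
    """Sum the per-minute counts over every crontab in the spool (needs root)."""
    total = collections.Counter()
    for name in os.listdir(spool):
        with open(os.path.join(spool, name), errors="replace") as f:
            total.update(tally(f.read()))
    return total
```

For example, tally_dir().most_common(5) lists the five busiest start times, which are the first candidates to ask users to move.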

You could also check /var/log/syslog for CRON entries to see what is being run when.
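To turn those syslog entries into something you can act on, you can count CRON CMD lines per minute. The sketch below assumes the Debian/Ubuntu format, for example "Oct 16 02:43:01 web1 CRON[1234]: (alice) CMD (/home/alice/job.sh)". The regex also accepts the CROND tag used on RHEL-style systems, which log to /var/log/cron instead, but that format isn't verified here. The function name is made up.

```python
import collections
import re

# Assumed syslog layout: "Mon DD HH:MM:SS host CRON[pid]: (user) CMD (command)"
CRON_CMD = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d):\d\d\s+\S+\s+CROND?\[\d+\]:\s+\((\S+)\)\s+CMD\s"
)


def jobs_per_minute(lines):
    """Count cron job starts per minute ("Mon DD HH:MM") from syslog lines."""
    counts = collections.Counter()
    for line in lines:
        m = CRON_CMD.match(line)
        if m:
            minute = " ".join(m.group(1).split())  # Normalize "Oct  6" padding.
            counts[minute] += 1
    return counts
```

Opening /var/log/syslog and passing the file object to jobs_per_minute, then calling .most_common(10) on the result, shows the minutes when the most jobs started. Those should line up with the load spikes Nagios reports.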

