My server runs a number of cron jobs at midnight. Each job backs something up by creating a tarball and compressing it with `xz`.
Since `xz` is a CPU and memory pig, I added a random delay to each job so that they "should not" clobber each other. From time to time that still happens, though, and it loads the server heavily.
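For illustration, the random delay could look roughly like the sketch below. The function names `random_delay` and `delayed_run` are hypothetical (they are not taken from my actual scripts), and the maximum delay is an arbitrary parameter.

```shell
#!/bin/sh
# Sketch only: helper names and the delay bound are illustrative assumptions.

# random_delay MAX: print a random integer in [0, MAX). MAX must be > 0.
random_delay() {
    # Read two random bytes as one unsigned integer (0..65535).
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( n % $1 ))
}

# delayed_run MAX COMMAND [ARGS...]: sleep a random time, then run COMMAND.
delayed_run() {
    max=$1; shift
    sleep "$(random_delay "$max")"
    "$@"
}
```

A job would then call something like `delayed_run 1800 /path/to/backup-script` (a made-up path). Nothing here coordinates the jobs with one another, which is why two delays can still land close together.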
Assumptions:
- Based on my traffic, midnight is the best time to do backups - but there is still traffic (which is why I want to avoid excessive load)
- Each public-facing app is associated with its own backup job, and these are decoupled (they don't know of each other) - so I can't merge the backup cron jobs into a single job, as I need that granularity
- I can't hardcode the starting time for each one, as that would increase maintenance - to add an app to the server (via Ansible), I just deploy it and drop a backup cron job (scheduled for midnight) into `/etc/cron.d/`, and the random delay before the job starts is usually good enough
- I throttle the jobs a bit via `tar ... | pv --rate-limit ... | xz ...` - but although that reduces load per job, it also slows down every job and so increases the probability of multiple jobs running concurrently (which, added together, may eat 100% CPU)
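To make the throttling point concrete, here is a minimal sketch of such a pipeline. The function name `throttled_backup`, its arguments, and the rate used in the example are placeholders, not my real values.

```shell
#!/bin/sh
# Sketch only: function name, arguments and rate are illustrative assumptions.

# throttled_backup SRC_DIR DEST_FILE RATE
# Archive SRC_DIR, cap the stream at RATE bytes per second, compress to DEST_FILE.
throttled_backup() {
    src=$1; dest=$2; rate=$3
    tar -C "$src" -cf - . \
        | pv --quiet --rate-limit "$rate" \
        | xz > "$dest"
}
```

For example, `throttled_backup /srv/app /backups/app.tar.xz 1048576` would cap the stream at about 1 MiB/s. Because `xz` can only consume what `pv` lets through, its CPU usage drops, but the job runs for longer.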
A possible solution is for each job to create a temporary file that signals it is busy, then delete it afterwards. The problem is: if a job detects this file, what does it do? Sleep? For how long? I could make it sleep for a random period using `at`, but if something goes wrong with my backup scripts I could end up with a huge queue of jobs competing with each other. Another maintenance headache.
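One way the busy-file idea might be sketched is shown below. Note that it swaps the hand-made temporary file for `flock(1)` from util-linux: a job that finds the lock taken waits up to a timeout instead of sleeping for a random period. `LOCK_FILE`, `run_exclusive`, and the exit code 75 are hypothetical choices made for this illustration.

```shell
#!/bin/sh
# Sketch only: the lock path, function name and exit code are assumptions.
LOCK_FILE="${LOCK_FILE:-/var/lock/backup.lock}"

# run_exclusive TIMEOUT_SECONDS COMMAND [ARGS...]
# Run COMMAND while holding an exclusive lock shared by all backup jobs.
run_exclusive() {
    timeout=$1; shift
    (
        # Wait up to $timeout seconds for the lock on file descriptor 9.
        flock -w "$timeout" 9 || {
            echo "lock busy, giving up" >&2
            exit 75
        }
        "$@"
    ) 9>"$LOCK_FILE"
}
```

The lock is released when the subshell exits, even if the backup command crashes, so there is no stale busy file to clean up. What this sketch does not settle is how long the timeout should be, or what a job should do after giving up.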
So, how does one typically solve this problem? Basically, I'm looking for a simple way to schedule related cron jobs without letting them clobber each other, and without the need to fine-tune starting times.