
My server runs a number of cron jobs at midnight. Each job backs something up by creating a tarball and compressing it with xz.

Since xz is a CPU and memory pig, I added a random delay to each job so they "should not" clobber each other. But from time to time they do, and that heavily loads the server.

Assumptions:

  • Based on my traffic, midnight is the best time to do backups - but there is still some traffic then (which is why I want to avoid excessive load)
  • Each public-facing app is associated with its own backup job, and these are decoupled (they don't know about each other) - so I can't merge the backup cron jobs into a single job, as I need that granularity
  • I can't hardcode the starting time for each one, as that would increase maintenance - to add an app to the server (via ansible), I just deploy it and drop a backup cron job (scheduled for midnight) into /etc/cron.d/, and the random delay before the job starts is usually good enough
  • I throttle the jobs a bit via tar ... | pv --rate-limit ... | xz ... (see the sketch after this list) - but although that reduces the load per job, it also slows every job down, which increases the probability of multiple jobs running concurrently (and together they may eat 100% CPU)
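
For reference, a rough sketch of one such job as it looks today - the paths, rate limit, and output name below are placeholders, not the real values:

# /etc/cron.d/backup-app1 (hypothetical)
0 0 * * * root tar -cf - /var/www/app1 | pv --quiet --rate-limit 10m | xz -6 > /backups/app1-$(date +\%F).tar.xz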

A possible solution is for each job to create a temporary file that signals it is busy, then delete it afterwards. The problem is: if a job detects this file, what does it do? Sleep? For how long? I could make it sleep for a random period using at, but if something goes wrong with my backup scripts I could end up with a huge queue of jobs competing with each other. Another maintenance headache.

So, how does one typically solve this problem? Basically, I'm after a simple way to schedule related cron jobs so they don't clobber each other, without having to fine-tune starting times.

lonix
  • @GeraldSchneider Thanks, no I don't think so. I already covered a lock file scenario, and unless I'm mistaken, it doesn't solve the problem. These are different cron jobs competing with each other, not the same cron job. – lonix Mar 31 '20 at 11:52
  • It is possible for different jobs to try to get the same lock. The issue then becomes how to reduce contention on that, and how to distribute one job per CPU. – John Mahowald Mar 31 '20 at 11:58

3 Answers


Use shell operators. For example, to run command1 and then command2 at midnight, regardless of the former's exit status, use:

0 0 * * * command1 ; command2

Alternatively, you can run command2 only if command1 completes successfully (returns exit status zero):

0 0 * * * command1 && command2

The latter is perhaps more useful when failure of command1 is likely to signify an underlying fault precluding success of command2.
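
If each backup lives in its own script, the same idea still applies: call the scripts from a single wrapper and point the crontab at that. A minimal sketch, with hypothetical paths:

#!/bin/sh
# /usr/local/bin/nightly-backups.sh - run each backup in turn
/usr/local/bin/backup-app1.sh
/usr/local/bin/backup-app2.sh

0 0 * * * /usr/local/bin/nightly-backups.sh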

user2768
  • Thanks. This is a good way for simple cases, but when you need the jobs to be logically separated this is not scalable (needs edits and maintenance every time you add/remove an app from the server). – lonix Mar 31 '20 at 12:35
  • @lonix Just put `command1 ; command2` in a script and replace `command1 ; command2` with the path to that script. – user2768 Mar 31 '20 at 12:37

Randomly distributing start times is good for avoiding the on-the-hour peak, and is easy to do with Ansible. But it does not really ensure that the resources will be there to sustain several concurrent compression jobs. Several methods exist for taking low-impact backups; consider some or all of them.

Run your list of commands through a program that throttles based on CPU. For example, GNU parallel with --load 100% will only start new jobs while the load is below the number of CPU cores.
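
For instance, a single cron entry could feed all the backup commands through parallel. A sketch, assuming a hypothetical /etc/backup.d/commands file with one backup pipeline per line:

# single master entry; % must be escaped as \% inside a crontab line
0 0 * * * root parallel --load 100\% < /etc/backup.d/commands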

Each job attempts to acquire one of a small number of locks, for example with flock from util-linux, Python, or Perl. This seems simple, but maintaining a set of lock files will be annoying. I consider a wrapper command with built-in job management, such as GNU parallel, more robust.
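
As a sketch of the flock variant: if every backup job is wrapped in flock against the same lock file (the path below is an assumption), each job blocks until the lock is free, so the backups effectively run one at a time:

# each backup job in /etc/cron.d/ gets the same wrapper
0 0 * * * root flock /var/lock/backups.lock /usr/local/bin/backup-app1.sh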

Evaluate your compression algorithm. zstd is modern and fast, for just a bit more memory.
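
For example, a drop-in swap of the compressor in the existing pipeline (paths are placeholders):

# zstd's default level (3) is far faster than xz; raise the level if the archives must stay small
tar -cf - /var/www/app1 | zstd -o /backups/app1.tar.zst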

Spread backup jobs over more hours. Think about whether say 00:00 to 03:00 is acceptable for your performance and backup requirements.

Add CPU. It can be expensive to size for peak capacity, but it allows more compression threads.

Offload backups to another host entirely. Take a storage-array or cloud-based snapshot of the disks, present it to a different host, and back up from there.

John Mahowald
  • 1
    Added lock file in an edit. I expect that one to be a bit messy to maintain for a multi threaded thing. See the linked duplicate for how flock can wrap one job easily. – John Mahowald Mar 31 '20 at 12:28
  • Thanks again. I had another idea as well, take a look at my separate answer below... what do you think? – lonix Mar 31 '20 at 12:29

Take a look at @JohnMahowald's answer for an excellent list of options, including clever ways of handling contention.

What I decided to do was this: instead of adding backup jobs to /etc/cron.d, I'll add them to a custom cron directory, e.g. /etc/cron.backupjobs/.

Then I'll add a "master" job to /etc/cron.d/ which runs jobs in /etc/cron.backupjobs/ sequentially.
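
For example, run-parts could serve as that master job (a sketch of the intent, assuming /etc/cron.backupjobs/ holds plain executable scripts rather than crontab-format files), since it runs the scripts in a directory one after another in lexical order:

# /etc/cron.d/backups - the only entry that cron itself schedules
0 0 * * * root run-parts /etc/cron.backupjobs

Depending on the distro, run-parts may skip files whose names contain dots or other special characters, so the naming of the scripts matters.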

lonix