I run large backups that put a huge strain on my server when using tar/gzip. The task is set up as a cron job that calls my backup script. I know that nice might help in this situation, but I'm uncertain of the proper way to use it.

I've got the following commands within my script:

tar -cf 
gzip -9 

Would I just add the nice command in front of each one, like so, to reduce the priority?

nice -n 13 tar -cf 
nice -n 13 gzip -9

Are there any caveats to using this approach? Thanks.

Joe Habadas
  It can't make anything worse so it's safe to try and see.
  I don't see any problem. But why don't you write a script and call it with `nice`? You will schedule all the backup process. Check `ionice` command too.
  I was thinking of that actually. I'm unfamiliar with `ionice`, so I'll have to check it out. The script gets called by a cronjob - so would I just append that within crontab? For example: `30 22 * * * /bin/sh nice -n 13 myscript.sh`. I'm also not sure, but if I did this within the script, wouldn't the second `nice` affect the first one? Meaning that gzip is a sub-process of tar?
  Yes. If your script runs `tar` and then `gzip`, the `nice` will affect the script and all of its child processes.
  [I noticed elsewhere](http://stackoverflow.com/questions/14371576/nice-command-in-sh-script-for-cron-jobs) others had suggested `* * * * * /usr/bin/nice -n 13 ...` - so would I not use `/usr/bin/sh`?
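
To illustrate the inheritance point from the comments above, here is a minimal sketch. It assumes a `nice` that prints the current niceness when run without arguments (GNU coreutils and BusyBox behave this way; other implementations may not). The function name and the `/path/to/myscript.sh` path in the crontab comment are placeholders, not anything from the original script.

```shell
# Sketch: a shell started under nice passes its niceness on to its children.
# Assumption: `nice` with no arguments prints the current niceness.
show_inherited_niceness() {
    # The inner `nice` is a child of the niced shell and reports its own value.
    nice -n 13 sh -c 'nice'
}

show_inherited_niceness

# A crontab entry following the same order (nice first, then the shell,
# then the script) might look like this; the script path is hypothetical:
# 30 22 * * * /usr/bin/nice -n 13 /bin/sh /path/to/myscript.sh
```

Note the order: `nice` has to come before the shell, because `/bin/sh nice ...` would ask the shell to run a script file named `nice`.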

There are caveats to pay attention to. Since the question doesn't specify an exact OS (but implies some Unix-like OS), the list of caveats will depend on the specific OS and version. The most important to keep in mind are:

nice is intended to affect how much CPU time is given to a process, but not how much RAM or I/O capacity. Thus instead of the intended effect other possible outcomes include:

  • The backup takes longer to complete because it is given less CPU time. But it will use just as much RAM as before, and it will now hold that RAM for longer. The system is slowed down by having less RAM for other purposes, and this slowness lasts longer than it used to.
  • The use of nice has no effect at all, because the backup process was I/O bound to begin with, and the I/O scheduling is unaffected by nice. If the OS happens to be a recent Linux version, the I/O scheduling may or may not be affected by nice, depending on which ionice setting is being used.
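
Since nice on its own targets CPU time, a wrapper can lower the I/O priority as well where `ionice` is available. This is only a sketch: `ionice` is Linux-specific (part of util-linux), the idle class `-c 3` only has an effect with I/O schedulers that honor it, and the function name `run_low_priority` is made up for illustration.

```shell
# Sketch: run a command with reduced CPU priority and, where possible,
# reduced I/O priority. Falls back to plain nice if ionice is missing
# or does not work on this system.
run_low_priority() {
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        # -c 3 selects the "idle" I/O class
        ionice -c 3 nice -n 13 "$@"
    else
        nice -n 13 "$@"
    fi
}

# Hypothetical usage (paths are placeholders):
# run_low_priority tar -cf /backups/stuff.tar /path/to/stuff
```

This does nothing about the RAM concern in the first bullet; it only addresses the CPU and I/O side.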

Moreover, even the exact effect on CPU scheduling depends a lot on the specific operating system and settings. Some kernels have settings which allow a process to run at a higher or lower priority than those reachable by using the nice command.

One caveat that I have run into myself appears to be specific to Ubuntu 14.04. In the default configuration it groups processes for scheduling purposes. Each group then receives a fair share of CPU time. nice only affects how CPU time is allocated to processes within such a group, but not how much is allocated to each group. For me that completely undermined the use of nice, because a low priority process could still take away CPU time from processes in different groups.
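
If the grouping described above is the Linux scheduler's autogroup feature (an assumption on my part; the answer does not name it), its state can be read from `/proc`. The sketch below only reports the setting and changes nothing; the function name is made up.

```shell
# Sketch: report whether Linux scheduler autogrouping is enabled.
# Assumption: the kernel exposes /proc/sys/kernel/sched_autogroup_enabled,
# which is only present when autogroup support is built in.
autogroup_status() {
    f=/proc/sys/kernel/sched_autogroup_enabled
    if [ -r "$f" ]; then
        case "$(cat "$f")" in
            1) echo "enabled" ;;
            0) echo "disabled" ;;
            *) echo "unknown" ;;
        esac
    else
        echo "unavailable"
    fi
}

autogroup_status
```

When it reports "enabled", nice values are only compared between processes in the same group, which matches the behavior described above.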

    "... and the IO scheduling is unaffected by `nice`." I would suggest you read the `ionice` man page since that actually isn't the case by default.
  It's really just the `CPU` which is affected when my script hits the `tar/gzip`. In turn that creates bottlenecks that, for moments, slow the server to a crawl. You bring up great points about the caveats, though, and I can see that it's really a balancing act and there's often a fine line between making something better and possibly making it worse.
  Maybe you can use cgroups. They give you good control over the resources available to your process. Good documentation is available here: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html
  What you mention is OS specific. However the question doesn't state which OS is being used. What I said is that it is a possible outcome, and I'll stand by that statement even if it is not possible on every version of every OS that the question might apply to.

I'd take a different approach...

No, I wouldn't mess around with nice for this. And gzip isn't that great. Plus, you're using gzip -9, which gives the best compression ratio at the expense of CPU. Do you really need that level of compression over the default (level 6)?

Does your system get strained as much if you don't use gzip level 9?

What are the specifications of your server? How many and what type of CPUs do you have? cat /proc/cpuinfo

If you have multiple CPUs, would you consider using pigz instead? It's multithreaded, a bit more efficient and can leverage the resources on your system much better.
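
As a rough sketch of how pigz could slot into a script, the function below compresses standard input with pigz when it is installed and falls back to gzip otherwise. Both read stdin and write stdout when given no file names. The function name and the paths in the usage comment are placeholders.

```shell
# Sketch: compress stdin to stdout, preferring pigz when available.
# $1 is the compression level (defaults to 6, the gzip default).
compress_stream() {
    level=${1:-6}
    if command -v pigz >/dev/null 2>&1; then
        pigz "-$level"
    else
        gzip "-$level"
    fi
}

# Hypothetical usage (paths are placeholders):
# tar -cf - /path/to/stuff | compress_stream 6 > archive.tar.gz
```

The output is ordinary gzip format either way, so plain `gzip -d` can still read it.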

Some tests with a 1.8GB file:

Standard gzip (-6 compression level)

Original file size: 1.8G    CHL0001.TXT 
Compression time: 0m18.335s
Compressed file size: 85M   CHL0001.TXT.gz
Decompression time: 0m6.300s

gzip -9 (highest compression)

Original file size: 1.8G    CHL0001.TXT
Compression time: 1m29.432s
Compressed file size: 75M   CHL0001.TXT.gz
Decompression time: 0m6.325s

pigz (-6 compression level)

Original file size: 1.8G    CHL0001.TXT
Compression time: 0m1.878s
Compressed file size: 85M   CHL0001.TXT.gz
Decompression time: 0m2.506s

pigz -9 (highest compression, multithreaded)

Original file size: 1.8G    CHL0001.TXT
Compression time: 0m5.611s
Compressed file size: 76M   CHL0001.TXT.gz
Decompression time: 0m2.489s

Conclusion: Is the extra bit of compression worth the vastly longer time spent compressing the data?
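
To repeat this kind of comparison on your own data, something like the following sketch could be used. It is not the method behind the numbers above; it only reports the compressed size and whole-second elapsed time for gzip levels 6 and 9, and the function name is made up.

```shell
# Sketch: compare gzip levels 6 and 9 on one file.
# Timing uses `date +%s`, so it is only accurate to whole seconds.
compare_levels() {
    file=$1
    for level in 6 9; do
        start=$(date +%s)
        # Compress to a pipe and count bytes; nothing is written to disk.
        size=$(gzip -c "-$level" < "$file" | wc -c | tr -d ' ')
        end=$(date +%s)
        echo "level $level: $size bytes, $((end - start)) s"
    done
}

# Hypothetical usage (file name is a placeholder):
# compare_levels /path/to/sample.sql
```

Running it on a representative slice of the data, rather than the full set, keeps the test short.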

  Interesting results. The reason for using the high compression is that I'm backing up SQL databases, some of which are over `20 GB`. I haven't tried a lesser compression level because disk space plays a bit of a factor. I'll have to compare the results of using a lesser compression level against the disk and CPU cost. Unfortunately, since the data is so large, it's not something that's quick to test. I've never heard of `pigz`, so thank you for the recommendation - I'm always open to knowing about alternatives :)
    Parallelizing the compression won't help if the box is CPU-bound.

I realize that this is straying from the original question, but it is staying on the theme of efficiency (you mention "huge strains on my server")...

I'm inferring (or guessing!) from what you've posted that you are creating a tar containing a set of files and then gzip-ing the result. You could save yourself a lot of disk I/O (and temporary space requirement) by piping one directly into the other:

tar cf - /path/to/stuff | gzip > archive.tar.gz

You may find that makes a significant difference to the total elapsed time.
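
Combining the pipe with the nice call from the question might look like the sketch below. The function name, the niceness of 13 (taken from the question), and the level 6 are illustrative choices, and the paths in the usage comment are placeholders.

```shell
# Sketch: archive a directory straight into gzip, with both sides niced.
# $1 is the directory to back up, $2 is the output .tar.gz file.
backup_dir() {
    src=$1
    out=$2
    # tar and gzip are separate processes, so each gets its own nice.
    # -C changes into the parent directory so the archive holds relative paths.
    nice -n 13 tar -cf - -C "$(dirname "$src")" "$(basename "$src")" \
        | nice -n 13 gzip -6 > "$out"
}

# Hypothetical usage (paths are placeholders):
# backup_dir /path/to/stuff /backups/archive.tar.gz
```

One thing to keep in mind: in plain sh the pipeline's exit status is that of gzip, so a tar failure can go unnoticed unless the archive is checked afterwards.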

