I am using tar to back up a Linux server to tape. I am using the -j option to compress the archive with bzip2, but I can't see a way to adjust bzip2's block size from tar. The default block size is 900,000 bytes, which gives the best compression but is the slowest. I am not that bothered about the compression ratio, so I am looking to make bzip2 run faster with a smaller block size.

Guy C
  • Sidenote: Lately I've all but given up on bzip2. I use lzma (from the lzma, lzma-utils, or lzma-sdk package; the name depends on your distribution). It usually compresses as well as or better than bzip2 given the same CPU time, and when it comes to decompression it simply blows bzip2 away. – Mihai Limbăşan May 02 '09 at 19:53

5 Answers

export BZIP=--fast
tar cjf foo.tar.bz2 foo

Or pipe the output of tar to bzip2.

Though you should note from the bzip2 man page:

    -1 (or --fast) to -9 (or --best)
              Set  the  block size to 100 k, 200 k ..  900 k when compressing.
              Has no effect when decompressing.  See MEMORY MANAGEMENT  below.
              The --fast and --best aliases are primarily for GNU gzip compat-
              ibility.  In particular, --fast  doesn't  make  things  signifi-
              cantly faster.  And --best merely selects the default behaviour.
Brian Campbell
tar -cjf dir.tar.bz2 --options bzip2:compression-level=9 path/to/dir/
  • On my system (OSX El Capitan, bsdtar 2.8.3) this is missing from the man page (although gzip:compression-level and xz:compression-level are listed), but in testing, the option does work. – steveayre Sep 05 '16 at 16:20
  • `tar: unrecognized option '--options'` – ZN13 Jul 18 '18 at 18:40

bzip2 block sizes

bzip2 has some block size options. From the manual page bzip2(1):

-1 (or --fast) to -9 (or --best)
       Set the block size to 100 k, 200 k ..  900 k when compressing.
       Has no effect when decompressing. See MEMORY MANAGEMENT below.
       The --fast and --best aliases are primarily for GNU gzip
       compatibility. In particular, --fast doesn't make things
       significantly faster. And --best merely selects the default
       behaviour.

Since you want faster compression from bzip2 and care less about the compression ratio, you seem to want the -1 (or --fast) option.
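To see how much the block size matters on your own data, a rough comparison along these lines may help. It is only a sketch: the sample generated with seq is placeholder data, and the file names are made up for illustration.

```shell
#!/bin/sh
# Compress the same sample with the smallest and largest bzip2 block size
# and report the resulting sizes. The seq output is placeholder data;
# substitute a representative slice of the real backup set.
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
seq 1 200000 > "$sample"

for level in 1 9; do
    out="$workdir/sample.$level.bz2"
    bzip2 -"$level" -c "$sample" > "$out"
    echo "bzip2 -$level: $(wc -c < "$out") bytes"
done
```

Prefixing the bzip2 line with time shows the speed side of the trade-off; as the manual excerpt above warns, the difference may turn out to be small.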

Setting bzip2 block size when using tar

You can set bzip2 block size when using tar in a couple of ways.

The UNIX way

My favorite way, the UNIX way, is one where you use every tool independently and combine them through pipes.

$ tar --create --file - [FILE...] | bzip2 -1 > [ARCHIVE].tar.bz2

You can read that as "create .tar with tar -> bzip it with bzip2 -> write it to [ARCHIVE].tar.bz2".
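The same pipeline can be wrapped in a small helper and checked afterwards. This is a minimal sketch: backup_fast and its two arguments are hypothetical names, and the demo writes to a temporary file rather than a tape.

```shell
#!/bin/sh
# backup_fast SRC_DIR DEST
# Archive SRC_DIR with tar, compress with 100 k blocks (-1), write to DEST.
# backup_fast and its argument order are hypothetical, for illustration only.
backup_fast() {
    src_parent=$(dirname "$1")
    src_name=$(basename "$1")
    # "-f -" makes tar write the archive to stdout explicitly.
    tar -cf - -C "$src_parent" "$src_name" | bzip2 -1 > "$2"
}

# Demo on throwaway data.
workdir=$(mktemp -d)
mkdir "$workdir/data"
echo "hello" > "$workdir/data/file.txt"
backup_fast "$workdir/data" "$workdir/data.tar.bz2"

# Check the compressed stream, then list the archive contents.
bzip2 -t "$workdir/data.tar.bz2" && bzip2 -dc "$workdir/data.tar.bz2" | tar -tf -
```

For a tape backup, DEST would be the tape device node instead of a file; its name depends on the system, and you may need an extra step such as dd to control the tape block size.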

Environment variable

It is also possible to set bzip2 options through the environment variable BZIP2. From the manual page bzip2(1):

bzip2 will read arguments from the environment variables BZIP2 and BZIP,
in that order, and will process them before any arguments read from the
command line. This gives a convenient way to supply default arguments.

So to use that with tar, you could for example do:

$ BZIP2=-1 tar --create --bzip2 --file [ARCHIVE].tar.bz2 [FILE...]
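One way to check whether the variable actually reached bzip2 is to look at the archive header: a .bz2 file starts with the bytes "BZh" followed by a digit from 1 to 9 giving the block size in units of 100 k. The sketch below assumes a tar that runs the external bzip2 program, as GNU tar does; a tar that compresses through a built-in library may ignore the variable. The helper name bz2_level is made up for this example.

```shell
#!/bin/sh
# Print the block-size digit (1-9) stored in a .bz2 header ("BZh" + digit).
# bz2_level is a hypothetical helper name.
bz2_level() {
    head -c 4 "$1" | cut -c 4
}

# Throwaway data for the demo.
workdir=$(mktemp -d)
mkdir "$workdir/data"
echo "sample" > "$workdir/data/file.txt"

# Assumption: tar spawns the bzip2 program, which then reads BZIP2.
BZIP2=-1 tar --create --bzip2 --file "$workdir/data.tar.bz2" -C "$workdir" data

echo "block size level: $(bz2_level "$workdir/data.tar.bz2")"
```

If the reported level is 9 rather than 1, the variable was probably not picked up, and the pipe approach is the safer choice.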

Faster alternatives

bzip2 uses a slow compression algorithm. If you are concerned about speed, you could investigate alternative algorithms, such as those used by gzip or lzop. Here is a nice article comparing compression tools: https://aliver.wordpress.com/2010/06/22/huge-unix-file-compresser-shootout-with-tons-of-datagraphs/
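As a rough illustration, swapping in gzip only changes the second stage of the pipeline. The sketch below uses gzip -1, its fastest setting, on throwaway data; the paths are placeholders.

```shell
#!/bin/sh
# Same tar pipeline, with gzip at its fastest setting instead of bzip2.
workdir=$(mktemp -d)
mkdir "$workdir/data"
echo "hello" > "$workdir/data/file.txt"

tar -cf - -C "$workdir" data | gzip -1 > "$workdir/data.tar.gz"

# Verify the compressed stream.
gzip -t "$workdir/data.tar.gz" && echo "archive OK"
```

Whether this beats bzip2 -1 by enough to matter depends on the data and the hardware, so it is worth timing both on a real sample.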

  • It looks like you may have the knowledge to provide good Answer here, but please consider reading [How do I write a good Answer?](http://serverfault.com/help/how-to-answer) in our help center and then revise the Answer. Your Commands/Code/Settings may technically be the solution but some explanation is welcome. Thanks in advance. – HBruijn Nov 16 '16 at 11:33

Send the tar output to stdout and then pipe it through bzip2 separately:

% tar cvf - _file_ | bzip2 _opts_ > output.tar.bz2

It's even easier:

% tar -cvf dir.tar path/to/dir/ && bzip2 -9 dir.tar
  • Using a temporary file means you need enough hard disk space, plus bandwidth for tar to write and bzip2 to read it. This may seem trivial for small amounts of data, but when the directory in question has several hundred gigabytes, it may become a real problem. – Ansgar Esztermann Jan 10 '13 at 10:51
  • Yes, thanks. I have now learned the deeper reason why `tar` has `-z` and `-j`. These options seemed merely convenient to me, but they can save the day. – Andreas Spindler Jul 04 '15 at 09:54