What is the fastest compression method for a large number of files?



I need to compress a directory with around 350,000 fairly small files that amount to about 100GB total. I am using OSX and am currently using the standard "Compress" tool that converts this directory into a .zip file. Is there a faster way to do this?


Posted 2011-06-19T04:55:36.570

Reputation: 243

@DanielBeck, the problem with tar archives is that they don't show the directory tree. So to even get a "view", we need to unpack the whole tar. Are there alternatives to tar that show a directory view? – Pacerier – 2015-05-16T22:33:09.920

You probably cannot beat tar, as it doesn't actually compress, only archive, without specific options that enable that. In answers, I'd love to see proof, no opinion... – Daniel Beck – 2011-06-19T05:28:51.487

Depends how much compression you want. – ta.speot.is – 2011-06-19T06:49:15.807

I did end up using tar and for speed reasons did not try compressing it yet. It was able to complete in time for what I needed it for. Thanks! – Spike – 2011-06-20T03:16:21.770



For directories, I'd use tar piped to bzip2 with maximum compression.

A simple way to go is:

tar cfj archive.tar.bz2 dir-to-be-archived/ 

This works great if you don't intend to fetch small sets of files out of the archive
and are just planning to extract the whole thing whenever and wherever required.
Yet, if you do want to get a small set of files out, it's not too bad.

I prefer to call such archives filename.tar.bz2 and extract with the 'xfj' option.
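
As a rough sketch of getting a small set of files out: tar can list an archive and extract named members without unpacking everything, although bzip2 still has to decompress the stream up to the requested file. The sample tree and the `out` directory below are made up for the demo.

```shell
# Build a tiny throwaway tree (hypothetical names) so the commands are safe to try.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p dir-to-be-archived/sub
echo "hello" > dir-to-be-archived/sub/a.txt
echo "world" > dir-to-be-archived/b.txt

# Create the bzip2-compressed archive as in the answer.
tar cfj archive.tar.bz2 dir-to-be-archived/

# List the contents without extracting anything.
tar tfj archive.tar.bz2

# Extract a single member into a separate directory.
mkdir out
tar xfj archive.tar.bz2 -C out dir-to-be-archived/sub/a.txt
```

The member path has to match what the listing prints, including the leading directory name.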

The max-compression pipe looks like this:

tar cf - dir-to-be-archived/ | bzip2 -9 - > archive.tar.bz2  
#      ^pipe tarball from here to zip-in^ into the archive file. 

Note: bzip2, and higher compression levels in general, tend to be slower than the regular gzip you get from 'tar cfz'.
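
If speed matters more than size, the same pipe can be pointed at gzip with its fastest level instead. This is only a sketch on a made-up sample tree; how much time it saves on real data depends on the files and the disk.

```shell
# Throwaway sample data (hypothetical names) so the pipe can be tried safely.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p dir-to-be-archived
for i in 1 2 3 4 5; do
  echo "sample file $i" > "dir-to-be-archived/file$i.txt"
done

# Same tar pipe, but with gzip at its fastest setting (-1).
tar cf - dir-to-be-archived/ | gzip -1 > archive.tar.gz

# Check the compressed stream is intact, then count the archived sample files.
gzip -t archive.tar.gz
file_count=$(tar tzf archive.tar.gz | grep -c 'file[0-9]\.txt$')
echo "files archived: $file_count"
```

On a real data set, prefixing each variant with `time` on a representative subset is a cheap way to compare them before committing to the full 100GB.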

If you have a fast network and the archive is going to be placed on a different machine,
you can speed things up with a pipe across the network (effectively using two machines together).

tar cf - dir/ | ssh user@server "bzip2 -9 - > /target-path/archive.tar.bz2"  
#      ^ pipe tarball over network to zip ^ and archive on remote machine.

Some references,

  1. Linux Journal: Compression Tools Compared, Jul 28, 2005
  2. gzip vs. bzip2, Aug 26, 2003
  3. A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA, May 31, 2005


Posted 2011-06-19T04:55:36.570

Reputation: 50 788

The questioner asked for the fastest method; bzipping a 100GB tar would take a lifetime! There comes a point with disk space being so cheap that taking aeons to squeeze out every last possible bit of redundancy is just a senseless waste of resources, unless absolutely necessary. With most of the disk usage taken up in slack space, gzipping the tar with -1 would probably do the job well enough and allow moving onto the next task a few months earlier! – Andy Lee Robinson – 2011-07-30T12:11:46.043

While I agree that a 100GB file is probably not worth compressing in totality, I don't think that bzip2 will take linearly more time for 100GB as compared to, say, 1GB. I would love to see some theory or data to show it either way. – nik – 2011-07-30T16:57:48.543

I understand that bzip2's dictionary is adaptive, therefore it is constantly looking for new redundancies within its search window up to the end of the file. Subject to the homogeneity of the file's entropy, it should be relatively linear. It would be a bad compressor that assumed it had all it needed from the beginning of file to be able to compress the rest quickly, but in some cases that may be all that is needed, though there are better ways to grow old than work it out empirically with 100GB datasets! – Andy Lee Robinson – 2011-07-31T02:40:32.347


This guy did some research on that. It appears that .zip will compress larger files faster. However, it yields one of the largest compressed sizes. It also looks like he was using Windows utilities, but I'm betting OSX's utility is almost as optimized.

Here is an excellent website where numerous compression utilities have been benchmarked for speed over many files. There are many other tests on that site you could look at to determine the best utility for you.

Much of the speed has to do with the program you use. I've used 7zip's utility for Windows, and I find that to be very fast. However, compressing many files takes a long time no matter what, so I would just let it run overnight. Or you could just tar the whole thing and not compress it. Personally, I hate unzipping large archives, so I would be careful if that's what you want to do.
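
For the tar-without-compression route, a minimal sketch looks like the following. The sample tree and the `restored` directory are made up for the demo; on the real directory only the `tar cf` line would be needed.

```shell
# Throwaway sample data (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p dir-to-be-archived
for i in 1 2 3; do
  echo "sample file $i" > "dir-to-be-archived/file$i.txt"
done

# Archive only: no compression flag, so tar just packs the files with their headers.
tar cf archive.tar dir-to-be-archived/

# Unpack elsewhere and confirm the round trip preserved the contents.
mkdir restored
tar xf archive.tar -C restored
diff -r dir-to-be-archived restored/dir-to-be-archived && echo "round trip OK"
```

The resulting archive is slightly larger than the input, since nothing is compressed, but it turns the many small files into a single file that is quicker to copy or move.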


Posted 2011-06-19T04:55:36.570

Reputation: 308


I prefer using

tar cf - dir-to-be-archived/ | bzip2 -9 - > archive.tar.bz2

for moving files to another server and converting them at the same time.

oussama fahd

Posted 2011-06-19T04:55:36.570

Reputation: 1

Which is already suggested in the top answer by @nik. No need to duplicate for emphasis, just upvote the other answer or add a comment if you've something substantive but don't want to give an involved answer. ;o) – pbhj – 2018-11-21T16:40:05.560