Which is more efficient - tar or zip compression? What is the difference between tar and zip?



I'm working in a Linux environment and want to know about the tar and zip commands.

Which is more efficient - tar or zip? I also need to know the differences between the tar and zip commands. Can anyone explain them to me?


Posted 2010-08-09T12:32:27.150

Reputation: 801



tar only makes a single file out of multiple files; it doesn't do compression unless combined with a compression program such as gzip or bzip2 (which you can call from within tar by using the -z or -j options, respectively). zip combines both archiving and compression in one program.
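To make the split between archiving and compressing concrete, here is a minimal sketch using Python's standard tarfile module rather than the tar command itself. The file names and contents are made up for illustration; the three modes roughly correspond to tar -cf, tar -czf and tar -cjf.

```python
import io
import tarfile

def make_tar(mode, files):
    """Build a tar archive in memory; mode is "w", "w:gz" or "w:bz2"."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical input: 20 small text files with repetitive content.
files = {f"notes/{i}.txt": b"the same line of text\n" * 200 for i in range(20)}
raw_size = sum(len(data) for data in files.values())

plain = make_tar("w", files)        # archive only, roughly tar -cf
gzipped = make_tar("w:gz", files)   # archive + gzip, roughly tar -czf
bzipped = make_tar("w:bz2", files)  # archive + bzip2, roughly tar -cjf

print("raw data:", raw_size)
print("tar     :", len(plain))
print("tar.gz  :", len(gzipped))
print("tar.bz2 :", len(bzipped))
```

The plain tar comes out larger than the raw data because of headers and padding; only the gzip and bzip2 variants shrink it.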


Reputation: 1 056



tar:

  • Assumes you'll be reading from one end to the other - "Tape ARchive". (The age of the command shows...)
  • Does not do compression itself, but you can compress the entire resulting stream by piping it through e.g. gzip or bzip2 (done internally with -z or -j)
  • Stores Unix file attributes: uid, gid, permissions (most notably executable). The default may depend on your distribution, and can be toggled with options.
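As a small illustration of the Unix attributes a tar member carries, the sketch below writes one member with explicit permissions and ownership and reads them back. It uses Python's tarfile module; the path, uid and gid are arbitrary values chosen for the example.

```python
import io
import tarfile

script = b"#!/bin/sh\necho hi\n"

# Hypothetical member: an executable script owned by uid/gid 1000.
info = tarfile.TarInfo(name="bin/run.sh")
info.size = len(script)
info.mode = 0o755
info.uid = 1000
info.gid = 1000

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(info, io.BytesIO(script))

# Read the archive back and inspect the stored attributes.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmember("bin/run.sh")

print(oct(member.mode), member.uid, member.gid)
```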


zip:

  • Stores MS-DOS attributes (Archive, Read-only, Hidden, System)
  • Compresses each file, then adds it to the archive
  • Includes a file table at the end of the archive
  • As a result of the previous two points, lets you read only the parts of the archive that belong to the file you need

The fact that zip compresses the files separately will impact compression ratios, particularly on many small similar files.
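A rough way to see this effect is to pack the same set of small, similar files both ways and compare sizes. The sketch below does this in memory with Python's tarfile and zipfile modules; the 200 generated "source files" are invented for the example, and real-world numbers will differ.

```python
import io
import tarfile
import zipfile

# Hypothetical input: 200 small, nearly identical source files.
files = {
    f"src/mod{i}.py": b"import os\nimport sys\n\ndef main():\n    return %d\n" % i
    for i in range(200)
}

def tar_gz_size(files):
    """Archive first, then compress the whole stream once."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

def zip_size(files):
    """Compress every file on its own, then add it to the archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

tgz = tar_gz_size(files)
zipped = zip_size(files)
print("tar.gz:", tgz, "bytes")
print("zip   :", zipped, "bytes")
```

With input like this, the tar.gz should come out much smaller, because gzip sees the repetition across files while zip starts from scratch for each one.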

(At least this was exactly correct a decade ago.)


Reputation: 701


Tar preserves much more metadata than Zip, see my comparison (it's slightly outdated):

[Image: comparison of the metadata preserved by tar and zip]

Tar passes 65% of the tests, whereas Zip passes only 17%. I have made the test suite available on GitHub under a BSD license, so you can try it yourself if you have a Mac. On Linux I'm not sure whether the same metadata exists, so these tests may not be relevant there.


Reputation: 725

Linux has metadata as well, so it should work there too. – zeitue – 2016-12-20T07:22:03.430

Interesting! +1 for this. But then again, that was a huge program. Did you write this for another purpose? Just curious. – CppLearner – 2013-01-10T00:00:43.830

I wrote the tests for a file manager that I was working on some years ago. Never released it though. – neoneye – 2013-04-12T12:40:50.197


Efficiency can be measured in different ways:

  1. How long does the process take?
  2. How large are the resulting files?

There are other questions, too, like "How common are the tools to manipulate the resulting archives?"

So, for example, bzip2 creates smaller files than gzip, but it can take significantly longer. Also, in my experience gzip is universal on Unix-like systems, but bzip2 is still not (though it's very common and usually easy to get).
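If you want to check the size versus time trade-off on your own data, a quick measurement along these lines can help. This is only a sketch using Python's gzip and bz2 modules with a made-up, log-like sample; results depend heavily on the input and the machine.

```python
import bz2
import gzip
import time

# Hypothetical sample: repetitive, log-like text (a bit over 1 MB).
sample = b"".join(
    b"2010-08-09 12:32:%02d GET /index.html 200\n" % (i % 60)
    for i in range(30000)
)

def measure(name, compress):
    """Compress the sample once and report size and elapsed time."""
    start = time.perf_counter()
    packed = compress(sample)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(packed)} bytes in {elapsed:.3f} s")
    return packed

gz = measure("gzip", gzip.compress)
bz = measure("bzip2", bz2.compress)
```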


Reputation: 5 695


7zip (http://www.7-zip.org/) is another good option for getting excellent compression at the expense of CPU time. It is less common than bzip2 (not installed by default anywhere that I know of) but easy to install in most places (it is in the standard repositories for most Linux distributions, and there is a simple installer package for Windows). Like tar+gzip, it carries the compression window across input files, so it gets even greater savings over zip when archiving many small files.

– David Spillett – 2010-08-09T13:30:34.567

Efficiency can also be measured by how well it preserves the data, see my answer to this question. Tar is much better than zip at preserving the data. – neoneye – 2010-08-09T22:40:19.547

One more measurement could be compatibility outside of Unix. Windows is fine with zip (built in to Windows) and can usually process tar.gz easily with shareware, but bzip2 support is rare. Unfortunately the original question didn't mention these criteria, so we can't tell whether they're relevant. – Rich Homolka – 2010-08-10T03:30:45.173


I once did a thorough review of compression ratio versus time required for some common compressors, and which would be the most efficient depending on how you value space versus time: http://blog.grandtrunk.net/2004/07/practical-compressor-test/

– Wim – 2010-08-28T13:31:38.943


As Wim noted, tar itself doesn't compress. If you do compress the tar (e.g. to get a .tar.gz or .tar.bz2), you're compressing the whole tar file at once. In contrast, zip compresses each file individually.

The efficiency depends on the workload. Specifically, zip allows you to access individual files directly. With tar, you first have to read through the unwanted (compressed) files that come before the one you want. The compression performance depends on what you're compressing. tar with bzip2 is often better for a large number of similar files (e.g. a source directory). zip could be better if each file has very different content.
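Here is a minimal sketch of that direct access, using Python's zipfile module. The archive and its member names are invented for the example; the point is that the last of 1000 members can be located by its recorded offset and read without touching the other 999.

```python
import io
import zipfile

# Hypothetical archive: 1000 small text files, built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(1000):
        zf.writestr(f"logs/day{i:04d}.txt", f"entry for day {i}\n" * 50)

with zipfile.ZipFile(buf) as zf:
    # The lookup uses the archive's file table, not a scan of the members.
    info = zf.getinfo("logs/day0999.txt")
    print("member starts at byte", info.header_offset)
    print("compressed size:", info.compress_size)
    text = zf.read("logs/day0999.txt").decode()

print(text.splitlines()[0])
```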

Matthew Flaschen

Reputation: 2 370

... on the other hand, you have to get the whole zip file before you can access the content, because the TOC is placed at the end. In contrast, you can untar a tar as fast as the bytes arrive... – akira – 2010-08-09T13:19:45.677


Zip archives contain a central directory of their contents at the end (most likely to avoid having to create the directory beforehand, when you don't yet know what will be inside). This allows single files to be extracted quickly without unpacking the whole archive: just read the archive directory and extract only what is needed. However, this requires that the whole archive is accessible, and it requires random access, which is only available on block devices (floppy disks, hard drives). In addition, the archive directory is vulnerable: if the archive gets truncated for some reason, it requires heavy wizardry to extract anything useful from the archive.
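The truncation problem can be sketched with Python's zipfile and tarfile modules. The example below builds a small zip and a small tar in memory (file names and contents are made up), cuts both in half to simulate an interrupted transfer, and then tries to read them. It shows how one particular library behaves, not what every zip tool would do.

```python
import io
import tarfile
import zipfile

# Hypothetical payloads: three text files of about 19 KB each.
payloads = {
    f"file{i}.txt": (f"contents of file {i}\n" * 1000).encode()
    for i in range(3)
}

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    for name, data in payloads.items():
        zf.writestr(name, data)

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tar:
    for name, data in payloads.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Simulate an interrupted transfer: keep only the first half of each archive.
zip_half = zip_buf.getvalue()[: len(zip_buf.getvalue()) // 2]
tar_half = tar_buf.getvalue()[: len(tar_buf.getvalue()) // 2]

# The zip has lost its central directory, so the library refuses to open it.
try:
    zipfile.ZipFile(io.BytesIO(zip_half))
    zip_status = "opened"
except zipfile.BadZipFile as exc:
    zip_status = f"rejected: {exc}"

# The tar still starts with a complete header and the first file's data.
with tarfile.open(fileobj=io.BytesIO(tar_half), mode="r:") as tar:
    first = tar.next()
    recovered = tar.extractfile(first).read()

print("zip:", zip_status)
print("tar: recovered", first.name, len(recovered), "bytes")
```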

Zip archives were created for BBS use, where it was important to be able to bundle the contents of a directory into one single (and compressed) file---instead of having to download possibly thousands of single files. Much like most web sites bundle their downloads even today, for the same reasons.

Tar archives were devised for bundling backups to be used for tape drives, hence for sequential access. There is no central directory; instead, the archive contains header blocks at regular intervals which indicate which files will follow in the next few blocks. Tar archives are intended to be read in one fell swoop; if only a single file is to be extracted, the archive is read sequentially, starting from the very beginning until the requested file is found (which may as well be at the very end). Compression is applied on top of that; each of the various compression programs that are applied to tar archives (compress, gzip, bzip2 etc.) is a stream compressor and doesn't alter the sequential nature of the archive in any manner. In the worst case, you'd need slightly more blocks until you can start extracting.

This may sound like a trivial difference, but in fact it represents a polar opposite in philosophy. With zip archives, there is always the need to have the entire file at hand to do anything useful with it, whereas a tar archive can be streamed to a pipeline. I can download a large tar archive and start extracting it right from the start, as soon as the first few blocks come in (and maybe interrupt the download as soon as I get the file I am looking for). For a Zip archive, I have to wait until the archive directory appears, which comes at the very end of the archive. But once I do have the entire file at hand, extracting partial contents from it will be much quicker from a zip file.
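To illustrate the streaming side, the sketch below reads a .tar.gz through an object that can only be read forward, like a pipe or a download in progress, and stops as soon as the wanted file has arrived. It uses the stream mode ("r|gz") of Python's tarfile module; the OneWayStream class and the member names are hypothetical stand-ins invented for this example.

```python
import io
import os
import tarfile

class OneWayStream:
    """Hypothetical stand-in for a pipe or socket: read() only, no seek()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.bytes_read = 0

    def read(self, n=-1):
        chunk = self._buf.read(n)
        self.bytes_read += len(chunk)
        return chunk

# Build a sample .tar.gz in memory with five incompressible 50 KB members.
members = {f"data/{i}.bin": os.urandom(50_000) for i in range(5)}
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
archive = buf.getvalue()

# Consume the archive as a forward-only stream and stop at the wanted file.
stream = OneWayStream(archive)
wanted = "data/1.bin"
found = None
with tarfile.open(fileobj=stream, mode="r|gz") as tar:
    for member in tar:
        if member.name == wanted:
            found = tar.extractfile(member).read()
            break

print(f"read {stream.bytes_read} of {len(archive)} bytes before stopping")
```

Note that the earlier member still had to be read and decompressed on the way; the stream just never needs the rest of the archive.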

Both formats have one very strong point going for them, depending on where and how they are used. Since pipelines (and thus the notion of streaming data from one process to another) only really exist in the Unix world, the main advantage of tar archives is lost on other systems, which is why Zip archives are much more popular there. But tar archives are more flexible, which is why I prefer them whenever I have a choice.

Vucar Timnärakrul

Reputation: 671


As the others already said, tar creates one large "block" out of all the files, which can then be compressed with a stream compressor like gzip or bzip2.

The disadvantage of this is that you have to decompress the archive from the start up to the file you want in order to access a single file inside it.

The advantage of this is that the compression ratio is usually higher, especially when the compressed files are very similar.

Other packers like "rar" have a "block mode" (or similar) that has the same effect.


Reputation: 1 144