How to obtain maximum compression with .tar.gz?

67

31

The way I understand it, tar is normally used to consolidate a group of files into a single file, and gzip is then used to compress that file.

I recently learned that tar can also compress.

Because I do not fully understand how compression works at its core, I have (possibly ridiculous) concerns that sending a pre-compressed .tar to gzip might prevent gzip from compressing as well as it otherwise could, and things of that nature.

My question is essentially: what combination of arguments/compression methods should I use to create the absolute smallest tar.gz, and what does the command line look like for that?

Mario Zigliotto

Posted 2012-12-01T20:47:48.430

Reputation: 1 181

Question was closed 2014-10-22T14:14:59.460

What @Keltari said. Compression rates and ratios are highly dependent on what it is you are compressing, which is also why there are different compression algorithms and methods. – music2myear – 2014-10-21T17:13:25.267

Compressing already compressed files may reduce their size, or it may make the archive bigger. It all depends on the type of data and any compression being used. – Keltari – 2013-01-31T19:02:05.533

Answers

118

Or, you can tell tar to use maximum compression this way:

export GZIP=-9
tar cvzf file.tar.gz /path/to/directory

Additionally, to keep your environment variables clutter-free, you can set the variable for this one command only:

env GZIP=-9 tar cvzf file.tar.gz /path/to/directory
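The GZIP variable is read by gzip itself, not by tar, and newer gzip releases treat it as obsolescent and may print a warning when it is set. A minimal sketch of an alternative, assuming GNU tar, whose -I (--use-compress-program) option accepts a command with arguments; the demo directory is hypothetical. It archives the same directory at gzip's default level (-6) and at -9, then prints both sizes with gzip -l.

```shell
#!/usr/bin/env bash
# Sketch: ask for gzip level 9 without setting the GZIP environment variable.
# Assumes GNU tar, where -I/--use-compress-program accepts "program args".
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data/sub"
seq 1 20000 > "$work/data/sub/numbers.txt"     # hypothetical demo data

# Default level (gzip uses -6 unless told otherwise).
tar -czf "$work/default.tar.gz" -C "$work" data
# tar runs "gzip -9" as its compression filter.
tar -I 'gzip -9' -cf "$work/best.tar.gz" -C "$work" data

# gzip -l lists compressed size, uncompressed size and ratio per file.
gzip -l "$work/default.tar.gz" "$work/best.tar.gz"
```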

Brian Fane

Posted 2012-12-01T20:47:48.430

Reputation: 1 196

44

As you stated, "tar can also compress", which implies that tar does not always compress data by itself.

It does so only when used with the z option, and even then not by itself: it passes the tarred data through gzip.
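One way to see that the z option is just gzip applied to ordinary tar output: decompress the .tar.gz and compare it with an uncompressed archive of the same directory. A minimal sketch, assuming GNU tar (other tar implementations may pad a compressed archive differently); the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show that "tar -z" is plain tar output filtered through gzip.
# Assumes GNU tar; the data directory is a hypothetical demo.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data"
seq 1 20000 > "$work/data/numbers.txt"

tar -cf "$work/plain.tar" -C "$work" data        # uncompressed archive
tar -czf "$work/packed.tar.gz" -C "$work" data   # same archive, gzip-filtered

# With GNU tar, decompressing the .tar.gz gives back the same tar bytes.
if gzip -dc "$work/packed.tar.gz" | cmp -s - "$work/plain.tar"; then
  echo "identical"
fi
```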

Instead, as noted in this answer, you can pipe the two commands, tar and gzip, yourself, so that you can explicitly specify the compression level for gzip and get the smallest output.

tar cvf - /path/to/directory | gzip -9 - > file.tar.gz

Here, -9 specifies the maximum compression level.
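One caveat with the pipe: by default a shell pipeline reports only the exit status of its last command, so if tar fails (for example because of a typo in the path), gzip still succeeds and you are left with a .tar.gz that is missing data. A minimal bash sketch that enables pipefail and deletes the output on failure; make_tgz is a hypothetical helper name, not a standard command.

```shell
#!/usr/bin/env bash
# Sketch: "tar | gzip -9" with pipefail so a tar error is not masked by gzip.
set -euo pipefail   # pipefail: a pipeline fails if any command in it fails

# Hypothetical helper: make_tgz <output.tar.gz> <directory>
make_tgz() {
  local out=$1 dir=$2
  tar -cf - "$dir" | gzip -9 > "$out" || { rm -f "$out"; return 1; }
}

work=$(mktemp -d)
mkdir -p "$work/data"
seq 1 20000 > "$work/data/numbers.txt"
cd "$work"          # use relative paths, so tar stores "data/..."

make_tgz data.tar.gz data
if ! make_tgz missing.tar.gz no-such-dir 2>/dev/null; then
  echo "tar failed; missing.tar.gz was removed"
fi
```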

Ujjwal Singh

Posted 2012-12-01T20:47:48.430

Reputation: 1 550

I had an issue where it's not recursive and complains that it will be an empty archive. Since the command is split, it's hard to find how to force recursion properly, since that's already tar's default.

My bad, I had incorrectly specified it starting like this: tar -cvf /path – Brian Thomas – 2017-12-06T21:52:26.943

17

Usually neither gzip nor tar can create "the absolute smallest tar.gz". There are many compression utilities that can compress to the gz format. I have written a bash script "gz99" that tries gzip, 7z and advdef and keeps the smallest file. To create the smallest possible file with it, run:

tar c path/to/data | gz99 file.gz
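The gz99 script itself is not reproduced here. As a rough, hypothetical sketch of the same idea (try several gz encoders, discard any output that does not decompress back to the exact input, keep the smallest), the function below tries gzip -9 and, if installed, 7z with -tgzip -mx=9 as in the table further down; smallest_gz is an illustrative name. Unlike gz99, it takes the tar as a file rather than from a pipe, so each candidate can be checked against it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "try several gz encoders, keep the smallest
# verified output" idea. Not the author's gz99 script.
set -euo pipefail

# smallest_gz <input-file> <output.gz>
smallest_gz() {
  local in=$1 out=$2 tmp cand size best="" best_size=0
  tmp=$(mktemp -d)

  gzip -9 -n -c "$in" > "$tmp/gzip9.gz"                 # candidate 1
  if command -v 7z >/dev/null 2>&1; then                # candidate 2, optional
    7z a -tgzip -mx=9 "$tmp/7z.gz" "$in" >/dev/null 2>&1 || rm -f "$tmp/7z.gz"
  fi

  for cand in "$tmp"/*.gz; do
    # Reject any candidate that does not decompress to exactly the input.
    gzip -dc "$cand" 2>/dev/null | cmp -s - "$in" || continue
    size=$(( $(wc -c < "$cand") ))
    if [ -z "$best" ] || [ "$size" -lt "$best_size" ]; then
      best=$cand
      best_size=$size
    fi
  done

  if [ -z "$best" ]; then rm -rf "$tmp"; return 1; fi
  mv "$best" "$out"
  rm -rf "$tmp"
}

work=$(mktemp -d)
mkdir -p "$work/data"
seq 1 20000 > "$work/data/numbers.txt"
tar -cf "$work/data.tar" -C "$work" data
smallest_gz "$work/data.tar" "$work/data.tar.gz"
```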

The advdef utility from AdvanceCOMP usually gives the smallest file, but is also buggy (the gz99 utility checks that it hasn't corrupted the file before accepting the output of advdef). To use advdef directly, create file.tar.gz however you like. Then run:

advdef -z -4 file.tar.gz

This will create a standard gz file that can be read by gzip and tar as normal, just a tiny bit smaller. This is about the best you can do with the gz format.
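Because of the corruption risk mentioned above, it may be wise never to run advdef on your only copy. A minimal, hypothetical bash wrapper (safe_advdef is an illustrative name): it recompresses a copy with advdef -z -4 and replaces the original only if the copy passes gzip -t, decompresses to the same bytes, and is actually smaller. It assumes advdef picks the file type from the .gz extension, so the copy keeps that extension. If advdef is not installed, the file is left alone.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "advdef -z -4": work on a copy, keep it only if
# it decompresses to the same data as the original and is smaller.
set -euo pipefail

safe_advdef() {            # usage: safe_advdef <file.gz>
  local gz=$1 tmpdir cand
  if ! command -v advdef >/dev/null 2>&1; then
    echo "advdef not installed; leaving $gz unchanged" >&2
    return 0
  fi
  tmpdir=$(mktemp -d)
  cand="$tmpdir/candidate.gz"   # keep the .gz extension for advdef
  cp "$gz" "$cand"
  if advdef -z -4 "$cand" >/dev/null 2>&1 \
     && gzip -t "$cand" \
     && cmp -s <(gzip -dc "$gz") <(gzip -dc "$cand") \
     && (( $(wc -c < "$cand") < $(wc -c < "$gz") )); then
    mv "$cand" "$gz"            # verified and smaller: replace the original
  fi
  rm -rf "$tmpdir"              # clean up; on failed checks $gz was never touched
}

work=$(mktemp -d)
mkdir -p "$work/data"
seq 1 20000 > "$work/data/numbers.txt"
tar -czf "$work/file.tar.gz" -C "$work" data
safe_advdef "$work/file.tar.gz"
```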

Since you only recently learnt that tar can compress, and didn't say why you wanted the smallest ".tar.gz" file, you may be unaware that there are more efficient formats that can be used with tar files, such as xz. Generally, switching to a different format can give a much larger improvement in compression than fiddling with gzip options. The main disadvantage of xz is that it isn't as common as gzip, so the people you send the file to might have to install a new package. It also tends to be a bit slower, particularly when compressing. If this doesn't matter to you, and you really want the smallest tar file, try:

 tar cv path/to/data | xz -9 > file.tar.xz
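If you prefer tar's own J option over an explicit pipe, xz also reads options from the XZ_OPT environment variable, which plays the same role here as GZIP=-9 does for gzip. A minimal sketch, assuming xz is installed and GNU tar, which runs the external xz program (a tar that compresses through a built-in library may ignore XZ_OPT); the demo directory is hypothetical. Note that xz -9 needs several hundred MiB of memory to compress.

```shell
#!/usr/bin/env bash
# Sketch: two ways to get a level-9 .tar.xz.
# Assumes xz is installed and GNU tar (which invokes the xz program for -J).
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data"
seq 1 20000 > "$work/data/numbers.txt"          # hypothetical demo data

# 1) Explicit pipe, as in the answer.
tar -cf - -C "$work" data | xz -9 > "$work/piped.tar.xz"
# 2) Let tar run xz; xz picks up extra options from XZ_OPT.
XZ_OPT=-9 tar -cJf "$work/builtin.tar.xz" -C "$work" data

xz -t "$work/piped.tar.xz" "$work/builtin.tar.xz"   # integrity check
```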

Modern versions of tar, for example on Ubuntu 13.10, automatically detect compressed files. So even if you use xz compression you can still decompress as usual:

 tar xvf file.tar.xz
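A quick way to confirm that detection works on your system is a round trip: compress with an explicit xz pipe, extract with plain tar -xf (no -J), and compare the result with the source. A minimal sketch, assuming a tar recent enough to auto-detect compression when reading from a file; the paths are hypothetical demo files.

```shell
#!/usr/bin/env bash
# Sketch: extract an .xz tarball without naming the compressor, then verify.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data/sub" "$work/out"
seq 1 20000 > "$work/data/sub/numbers.txt"
tar -cf - -C "$work" data | xz -9 > "$work/data.tar.xz"

# No -J here: tar recognises the xz data by itself.
tar -xf "$work/data.tar.xz" -C "$work/out"

# diff -r exits non-zero (and set -e stops the script) if anything differs.
diff -r "$work/data" "$work/out/data"
echo "round trip OK"
```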

To give a quick idea of how these compression utilities compare, consider the effect of compressing patch-3.1.1 from the Linux kernel:

utility          CPU time  format  size (bytes)
gzip -9          0.02 s    gz      105,628
advdef -2        0.07 s    gz      102,619
7z -mx=9 -tgzip  0.42 s    gz      102,297
advdef -3        0.55 s    gz      102,290
advdef -4        0.75 s    gz      101,956
xz -9            0.03 s    xz       91,064
xz -3e           0.15 s    xz       90,996

In this trivial example, we see that to get the smallest gz we need advdef (though 7z -tgzip is almost as good and a lot less buggy). We also see that switching to xz gains us much more space than trying to squeeze the most out of the old gz format, without compression taking too long.
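Since the ratios depend heavily on the input, it is worth rerunning the comparison on your own data. A minimal bash sketch; the script name, output file names, and the fallback sample data are hypothetical. It archives a directory once, feeds the same tar file to each compressor, and prints the sizes. Prefix a run with time if you also want timings.

```shell
#!/usr/bin/env bash
# Sketch: compare compressed sizes of one tar file across several settings.
# Usage (hypothetical name): ./compare.sh [directory]
set -euo pipefail

work=$(mktemp -d)
if [ $# -ge 1 ]; then
  src=$1
else                                   # no argument: build a small sample
  src="$work/sample"
  mkdir -p "$src"
  seq 1 50000 > "$src/numbers.txt"
fi
tar -cf "$work/tar.out" -C "$(dirname "$src")" "$(basename "$src")"

# report <name> <command...>: compress the tar from stdin into $work/<name>.out
report() {
  local name=$1; shift
  "$@" < "$work/tar.out" > "$work/$name.out"
  printf '%-8s %12d bytes\n' "$name" "$(wc -c < "$work/$name.out")"
}

printf '%-8s %12d bytes\n' "tar" "$(wc -c < "$work/tar.out")"
report gzip-6 gzip -6
report gzip-9 gzip -9
if command -v xz >/dev/null 2>&1; then
  report xz-9 xz -9
fi
```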

gmatht

Posted 2012-12-01T20:47:48.430

Reputation: 2 084

I don't think "buggy" and "archive" should ever be used together; what use is an archive that's corrupt? You need a much larger file to "compare" the compression utilities, and different types of input files too. Differences measured in hundredths of a second aren't that reliable; I think xz -9 usually takes something like 5x the gzip -9 time, not just 1.5x as your table suggests. – Xen2050 – 2017-03-18T16:55:58.233

How can we create split archives (while compressing) using xz? – nyxee – 2017-08-28T20:54:29.730
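One way to do this, sketched under the assumption of GNU or BSD split and cat: compress the tar stream, cut it into fixed-size pieces with split, then concatenate the pieces back in order before decompressing. The file names and the 1,000,000-byte piece size are arbitrary examples.

```shell
#!/usr/bin/env bash
# Sketch: split a compressed tar stream into pieces and reassemble it.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data" "$work/restored"
head -c 2500000 /dev/urandom > "$work/data/random.bin"   # incompressible demo data

# Compress on the fly; split writes data.tar.xz.part-aa, -ab, -ac, ...
tar -cf - -C "$work" data | xz -9 | split -b 1000000 - "$work/data.tar.xz.part-"

# The glob expands in alphabetical order, which matches split's suffixes.
cat "$work"/data.tar.xz.part-* | xz -dc | tar -xf - -C "$work/restored"
```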

The OP asked for how to get the most compression for a .tar.gz file, but you suggested creating a .tar.xz file. You are answering a different question than asked. – ChrisInEdmonton – 2014-03-10T16:08:05.177

Ah, I see what you are going for. advdef just crashes on my system (v1.15), so 'advdef -z -4 file.tar.gz' doesn't work, but it at least theoretically could. I can't find evidence that it would shrink the file further than 'gzip -9', but it might, and in any case is enough for me to remove my -1 vote. Thanks for clarifying! – ChrisInEdmonton – 2014-03-10T17:29:54.077

Hmm, I'm using v1.17. Anyway the pedantic mathematician in me wants to point out that my answer arguably isn't technically correct. After all, if you enumerate all possible gz files from shortest to longest and pick the first one that decompresses to the right file, you could shave yet a few more bytes off. But that'd be way too slow in practice. – gmatht – 2014-03-10T17:46:00.420

6

tar c /path/to/data | gzip --best > file.tar.gz

The gzip option --best (equivalent to -9) asks for the highest compression level.
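To confirm that --best and -9 really are the same setting, a minimal sketch (hypothetical demo files) compresses one tar file both ways; -n leaves out the stored file name and timestamp so the two outputs can be compared byte for byte.

```shell
#!/usr/bin/env bash
# Sketch: --best and -9 select the same level, so the outputs should match.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/data"
seq 1 20000 > "$work/data/numbers.txt"
tar -cf "$work/data.tar" -C "$work" data

# -n omits the stored name and timestamp so the two runs are comparable.
gzip --best -n -c "$work/data.tar" > "$work/best.tar.gz"
gzip -9 -n -c "$work/data.tar" > "$work/nine.tar.gz"

cmp "$work/best.tar.gz" "$work/nine.tar.gz" && echo "identical output"
```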

carlito

Posted 2012-12-01T20:47:48.430

Reputation: 521

Alternatively, use the --best flag: -9 is confusing to the reader. – om-nom-nom – 2014-02-20T12:04:43.427