Usually neither gzip nor tar can create "the absolute smallest tar.gz". Many compression utilities can write the gz format, so I have written a bash script, "gz99", that tries gzip, 7z and advdef and keeps the smallest result. To use it to create the smallest possible file, run:
tar c path/to/data | gz99 file.gz
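The gz99 script itself isn't reproduced here. The following is only a rough bash sketch of the same idea, written for illustration. It is not the author's gz99, and the function name smallest_gz is made up. It assumes gzip is installed, uses 7z and advdef only if they are on the PATH, and keeps the smallest candidate that decompresses back to exactly the input.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the author's gz99: try several gzip-format encoders
# on the same input and keep the smallest output that round-trips correctly.
# Usage (after sourcing this file): tar c path/to/data | smallest_gz file.gz
smallest_gz() {
    local out=$1 tmp raw best="" best_size="" cand size
    tmp=$(mktemp -d) || return 1
    raw="$tmp/raw"
    cat > "$raw"    # buffer stdin so every encoder sees the same bytes

    # Candidate 1: plain gzip at maximum level (reading stdin, so no name in header).
    gzip -9 -c < "$raw" > "$tmp/gzip.gz"

    # Candidate 2: 7-Zip's deflate encoder writing gzip format, if 7z is installed.
    if command -v 7z >/dev/null 2>&1; then
        7z a -tgzip -mx=9 "$tmp/7z.gz" "$raw" >/dev/null 2>&1
    fi

    # Candidate 3: advdef recompressing a copy of the gzip output in place, if installed.
    if command -v advdef >/dev/null 2>&1; then
        cp "$tmp/gzip.gz" "$tmp/advdef.gz"
        advdef -z -4 "$tmp/advdef.gz" >/dev/null 2>&1
    fi

    for cand in "$tmp"/*.gz; do
        # Reject any candidate that does not decompress to the exact input.
        gzip -dc "$cand" 2>/dev/null | cmp -s - "$raw" || continue
        size=$(( $(wc -c < "$cand") ))
        if [ -z "$best" ] || [ "$size" -lt "$best_size" ]; then
            best=$cand
            best_size=$size
        fi
    done

    if [ -n "$best" ]; then
        cp "$best" "$out"
    fi
    rm -rf "$tmp"
    [ -n "$best" ]    # non-zero exit status if no candidate passed verification
}
```

Because every candidate is checked with gzip -dc and cmp before it can win, a buggy encoder can only cost you a few bytes, never a corrupt archive.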
The advdef utility from AdvanceCOMP usually gives the smallest file, but it is also buggy (gz99 checks that advdef hasn't corrupted the file before accepting its output). To use advdef directly, create file.tar.gz however you like, then run:
advdef -z -4 file.tar.gz
This will create a standard gz file that can be read by gzip and tar as normal, just a tiny bit smaller. This is about the best you can do with the gz format.
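Running advdef directly skips the corruption check that gz99 performs. A minimal bash sketch of such a check, assuming advdef and gzip are on the PATH (the function name safe_advdef is hypothetical): recompress a copy, and overwrite the original only if the copy decompresses to the same bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run advdef on a copy of a .gz file and keep the
# result only if it decompresses to exactly the same bytes as the original.
safe_advdef() {
    local gz=$1 dir
    command -v advdef >/dev/null 2>&1 || { echo "advdef not found" >&2; return 1; }
    dir=$(mktemp -d) || return 1

    # Reference copy of the uncompressed data.
    if ! gzip -dc "$gz" > "$dir/orig"; then
        rm -rf "$dir"
        return 1
    fi

    # Keep the .gz suffix on the copy in case advdef detects the format from the name.
    cp "$gz" "$dir/copy.gz"
    advdef -z -4 "$dir/copy.gz" >/dev/null 2>&1

    # Accept the recompressed file only if it round-trips byte for byte.
    if gzip -dc "$dir/copy.gz" 2>/dev/null | cmp -s - "$dir/orig"; then
        cp "$dir/copy.gz" "$gz"    # cp keeps the original file's permissions
        rm -rf "$dir"
    else
        echo "advdef output failed verification; original kept" >&2
        rm -rf "$dir"
        return 1
    fi
}
```

For example, `safe_advdef file.tar.gz` leaves file.tar.gz untouched if advdef is missing or produces a bad file.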
Since you only recently learnt that tar can compress, and didn't say why you wanted the smallest ".tar.gz" file, you may be unaware that there are more efficient formats that can be used with tar files, such as xz. Generally, switching to a different format gives a far bigger improvement in compression than fiddling with gzip options. The main disadvantage of xz is that it isn't as common as gzip, so the people you send the file to might have to install a new package. It also tends to be a bit slower, particularly when compressing. If this doesn't matter to you, and you really want the smallest tar file, try:
tar cv path/to/data | xz -9 > file.tar.xz
Modern versions of tar (for example, the one in Ubuntu 13.10) automatically detect the compression format when extracting, so even if you use xz compression you can still extract as usual:
tar xvf file.tar.xz
To give a quick idea of how these compression utilities compare, consider the effect of compressing patch-3.1.1 from the Linux kernel:
utility            CPU time   format   size (bytes)
gzip -9            0.02s      gz       105,628
advdef -2          0.07s      gz       102,619
7z -mx=9 -tgzip    0.42s      gz       102,297
advdef -3          0.55s      gz       102,290
advdef -4          0.75s      gz       101,956
xz -9              0.03s      xz        91,064
xz -3e             0.15s      xz        90,996
In this trivial example, we see that to get the smallest gz we need advdef (though 7z -tgzip is almost as good and a lot less buggy). We also see that switching to xz saves much more space than trying to squeeze the most out of the old gz format, without making compression take too long.
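Results depend heavily on the input, so it is worth checking on your own data rather than relying on a kernel patch. A minimal bash sketch (the function name compare_sizes is hypothetical) that prints the compressed size for gzip -9 and, if xz is installed, for xz -9 and xz -3e. It reports sizes only, not CPU time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print compressed sizes of one input file for a few settings.
compare_sizes() {
    local f=$1
    printf '%-10s %s\n' "utility" "size(bytes)"
    # $(( ... )) strips the leading spaces some wc implementations print.
    printf '%-10s %d\n' "gzip -9" "$(( $(gzip -9 -c < "$f" | wc -c) ))"
    if command -v xz >/dev/null 2>&1; then
        printf '%-10s %d\n' "xz -9"  "$(( $(xz -9 -c < "$f" | wc -c) ))"
        printf '%-10s %d\n' "xz -3e" "$(( $(xz -3e -c < "$f" | wc -c) ))"
    fi
}
```

For example: `tar c path/to/data > data.tar && compare_sizes data.tar`.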
What @Keltari said. Compression rates and ratios are highly dependent on what it is you are compressing, which is also why there are different compression algorithms and methods. – music2myear – 2014-10-21T17:13:25.267
Compressing already compressed files may reduce their size, or it may make the archive bigger. It all depends on the type of data and any compression being used. – Keltari – 2013-01-31T19:02:05.533