Tar and gzip together, but the other way round?

7

3

Gzipping a tar file as a whole is drop-dead easy and is even built into tar as an option. So far, so good. However, from an archiver's point of view it would be better to tar the gzipped single files. (The rationale is that data loss is minimized if only a single gzipped file is corrupted, rather than the whole tarball becoming unreadable because of gzip or copy errors.)

Does anyone have experience with this? Are there drawbacks? Are there more solid/tested solutions for this than

find folder -exec gzip '{}' \;
tar cf folder.tar folder
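
For what it's worth, a minimal variation on the above (assuming GNU gzip, which has a recursive mode) would be:

gzip -r folder
tar cf folder.tar folder

Either way you end up with a tarball of individually gzipped files, and unpacking needs an extra gunzip -r pass afterwards.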

Boldewyn

Posted 2010-05-19T20:13:48.593

Reputation: 3 835

Answers

4

If you're going to do it this way, then use the tried-and-true method:

zip -r folder.zip folder
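
Because ZIP compresses each entry separately, the intact entries can usually still be pulled out of a damaged archive, and Info-ZIP's zip even has a salvage mode (a sketch, not a guarantee of recovery):

unzip -t folder.zip                    # test all entries; failing CRCs pinpoint the damage
unzip folder.zip 'some/dir/*'          # extract only the entries you still need
zip -FF folder.zip --out fixed.zip     # attempt to salvage whatever is still readable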

Ignacio Vazquez-Abrams

Posted 2010-05-19T20:13:48.593

Reputation: 100 516

I'm not sure, but isn't this the same as .tar.gz? In other words, does zip compress the single files and add them in a simple concatenating way? My experience with corrupted ZIP files so far has been that zip completely refuses to handle the archive (i.e., the same as with a corrupt .tar.gz archive). – Boldewyn – 2010-05-19T20:40:07.223

@Boldewyn: Yes, that's how zip works. It is a container format (a bit like tar), where you can specify "storage" methods for the elements: either compress ("deflate") or just "store". – akira – 2010-05-20T06:31:31.483
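
You can check the per-entry storage method yourself with Info-ZIP's unzip:

unzip -v folder.zip

The verbose listing shows each entry's method (Stored or Defl:N) and its individual compressed size, confirming that entries are compressed one by one rather than as a single stream.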

11

The key disadvantage is reduced compression, especially if your archive will contain many small files.

You might be better off compressing the data the usual way (or, if you have CPU cycles to spare, with the slower but more space-efficient 7-Zip) and then wrapping the result in a parity-based fault-tolerant format such as http://en.wikipedia.org/wiki/Parchive. This gives you a much greater chance of complete recovery after data corruption due to media failure or problems in transit over the network, while possibly not compromising too much on the size of the resulting archives.
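
With the par2cmdline tool, a typical workflow (the 10% redundancy is just an example value) looks roughly like this:

tar czf folder.tar.gz folder
par2 create -r10 folder.tar.gz     # generate parity volumes with ~10% redundancy
par2 verify folder.tar.gz.par2     # later: check the archive against the parity data
par2 repair folder.tar.gz.par2     # reconstruct the archive if blocks were damaged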

David Spillett

Posted 2010-05-19T20:13:48.593

Reputation: 22 424

Dang, beat me to it! +1, as modern compressors + forward error correction = better protection and most likely still smaller overall files than either way of combining tar + gzip. More info at http://www.par2.net – Mokubai – 2010-05-19T20:51:01.193

This really is the proper way to do things! tar creates a big container for everything, gzip removes useless redundancy in the container, and par adds back some redundancy, but in a uniform and carefully designed way. (I have never used par before, but I know the principle.) – user39559 – 2010-09-09T10:03:45.397

Compressing each file separately may be a big waste. It's like breaking off gzip just when it was starting to have an effect, and starting it over again for the next file. In an extreme test case, 100 replicas of a random (maximal-entropy) file have a compression factor smaller than 1 with gzip+tar, but can have a compression factor close to 100 with tar+gzip. – user39559 – 2010-09-09T10:04:14.333
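
That extreme case is easy to reproduce at the shell (a sketch; the file is kept well below gzip's 32 KB DEFLATE window so the repeated copies can actually be back-referenced):

head -c 16384 /dev/urandom > rand.bin                  # one maximal-entropy 16 KB file
mkdir test
for i in $(seq 100); do cp rand.bin "test/f$i"; done
tar cf - test | gzip -9 | wc -c                        # tar+gzip: the duplicate copies collapse
gzip -9 test/*; tar cf - test | wc -c                  # gzip+tar: roughly 100 x 16 KB plus tar overhead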

0

Why not just toss the --verify (or -W) flag at tar? This will verify that the contents match the source.
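
In practice that is just (a sketch; as noted below, -W only applies while writing an uncompressed archive):

tar -cWf folder.tar folder     # write the archive, then re-read and compare it to the source
tar -df folder.tar             # later: diff the archive against the current filesystem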

Jack M.

Posted 2010-05-19T20:13:48.593

Reputation: 3 133

...and it doesn't work with the -z or -j flag. Also, verification during archiving doesn't help against corruption afterwards, e.g., an unnoticed bit flip while copying to the backup device. – Boldewyn – 2010-05-20T07:19:05.520

0

What do you want to back up? If permissions don't matter (e.g. for non-system files), I'd go with 7-Zip. It provides much better performance (multi-core/CPU) along with much better compression.
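
A basic invocation (assuming the p7zip package; the compression level is just an example) would be:

7z a -mx=9 -mmt=on folder.7z folder     # maximum compression, multithreaded

Note that the 7z format does not store Unix owners or permissions, which is why it only suits data where those don't matter.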

Apache

Posted 2010-05-19T20:13:48.593

Reputation: 14 755