Three formats? Why?

12

1

I needed to download the Ruby Source recently from here and it says, "available in three formats" which are .tar.bz2, .tar.gz and .zip. Is there any reason that we need all three formats? At least on Linux and OSX I can do any of the three easily. On Windows, only zip is built-in, I think. Is there anything behind these preferences or is this just a religious battle?

Dan Rosenstark

Posted 2010-03-14T12:37:15.070

Reputation: 5 718

Just look at it as being nice to as many types of users as possible. Packaging source (or any other files) in multiple formats saves some users unnecessary manual steps. And BTW, in Windows, if you have 7zip, WinZip or WinRar all 3 formats are supported. – Traveling Tech Guy – 2010-03-14T18:33:04.473

Yes. acceptable, ace, admirable, agreeable, bad, boss, bully, capital, choice, commendable, congenial, crack, deluxe, excellent, exceptional, favorable, first-class, first-rate, gnarly, gratifying, great, honorable, marvelous, neat, nice, pleasing, positive, precious, prime, rad, recherché, reputable, satisfactory, satisfying, select, shipshape, sound, spanking, splendid, sterling, stupendous, super, super-eminent, super-excellent, superb, superior, tip-top, up to snuff, valuable, welcome, wonderful, worthy answer. But what about keeping the Internet DRY? – Dan Rosenstark – 2010-03-14T19:23:28.543

Whoa. Too much caffeine. Or a thesaurus binge :) And why DRY??? Moisturize often, or else your internet will crack. – Traveling Tech Guy – 2010-03-15T04:57:05.657

@Traveling Tech Guy, yes, lots of caffeine :) DRY as in http://en.wikipedia.org/wiki/Don%27t_repeat_yourself – Dan Rosenstark – 2010-03-15T10:58:22.763

Answers

19

.tar.gz files are (still, after some years in that position) the most common format for archives intended for Unix-like systems. Users on any Unix-like system will be able to open these without installing additional software, but users running Windows cannot. They are sometimes called .tgz instead, though this is less common now (the convention was started to get around Windows file naming limitations that were removed in Windows NT and Windows 95).

.zip files are accessible by default on modern Windows variants without any extra software being installed. They are generally usable on any other systems too, but support is not always included by default in minimal installations.

So the above two formats are given to achieve near 100% coverage of what people will be able to open, even from a freshly installed system with no extra tools added.

.tar.bz2 files are similar to .tar.gz but use the bzip2 format instead of gzip. These will be smaller, sometimes considerably smaller, so quicker to download - but support on Windows is less common and, as with .zip, support is not always present by default on minimal installs of other OSs.

This is offered as a convenience to those users that have the extra utility installed (and possibly to save a little bandwidth for the provider), though for small files the difference is not worth the hassle of creating/offering/supporting (in install/build documentation for instance) the extra format.
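To get a rough feel for the size difference between the three formats, here is a minimal Python sketch using only the standard library. The payload is a made-up stand-in for source code and the names (`make_payload`, `archive_sizes`, `src/sample.rb`) are hypothetical; real results depend heavily on what is being packed, so treat the numbers as illustrative only.

```python
import io
import tarfile
import zipfile


def make_payload() -> bytes:
    # Hypothetical stand-in for a source file: repetitive text compresses well.
    line = b"def method_%d; puts 'hello'; end\n"
    return b"".join(line % i for i in range(5000))


def tar_size(payload: bytes, mode: str) -> int:
    # Build a one-file tar archive in memory; mode is "w:gz" or "w:bz2".
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="src/sample.rb")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return len(buf.getvalue())


def zip_size(payload: bytes) -> int:
    # Build a one-file zip archive in memory using the usual deflate method.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("src/sample.rb", payload)
    return len(buf.getvalue())


def archive_sizes(payload: bytes) -> dict:
    return {
        ".tar.gz": tar_size(payload, "w:gz"),
        ".tar.bz2": tar_size(payload, "w:bz2"),
        ".zip": zip_size(payload),
    }


if __name__ == "__main__":
    data = make_payload()
    print("uncompressed:", len(data), "bytes")
    for ext, size in archive_sizes(data).items():
        print(ext, size, "bytes")
```

With a single small file the three come out fairly close; the gap tends to matter more for large source trees.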

.7z files are starting to be seen more these days too. These are files produced and read by 7-Zip and compatible tools, and are generally much smaller than zipped or gzipped archives (and often smaller than bzip2ed archives too). For instance, I regularly compress MSSQL database backups for transfer up an ADSL-based internet connection - 7-Zip tends to produce files less than half the size of those produced in the standard zip format, which makes a significant difference in transfer time (more than making up for the fact that the 7-Zip compression algorithm is much slower than the standard zip algorithm). The use of the 7-Zip format is not particularly common at the moment as the relevant tools are less commonly installed than the other options.

As with bzip2 archives, 7-zip archives are, where available, offered as a convenience to those users that have the extra utility installed (and to save a little bandwidth for the provider), though for small files the difference is not worth the hassle of creating/offering/supporting (in install/build documentation for instance) the extra format.
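The size-versus-speed trade-off can be sketched with Python's standard library. This is only an approximation: it compares raw deflate (the usual zip method) against LZMA (the family of algorithms 7-Zip uses by default) via the `zlib` and `lzma` modules, and it does not write real .zip or .7z containers. The sample data and the function names are hypothetical.

```python
import lzma
import time
import zlib


def sample_data() -> bytes:
    # Hypothetical stand-in for a database dump: many similar text rows.
    row = b"INSERT INTO t VALUES (%d, 'row %d');\n"
    return b"".join(row % (i, i) for i in range(5000))


def timed(compress, data: bytes):
    # Return (compressed size in bytes, seconds taken) for one compressor.
    start = time.perf_counter()
    out = compress(data)
    return len(out), time.perf_counter() - start


def compare(data: bytes) -> dict:
    return {
        "deflate": timed(lambda d: zlib.compress(d, 9), data),
        "lzma": timed(lambda d: lzma.compress(d, preset=9), data),
    }


if __name__ == "__main__":
    data = sample_data()
    print("original:", len(data), "bytes")
    for name, (size, seconds) in compare(data).items():
        print(f"{name}: {size} bytes in {seconds:.3f}s")
```

How much smaller (and how much slower) LZMA turns out depends on the data and the settings, so it is worth measuring on your own files before picking a format.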

(if you want to see a religious battle on the subject of archive file formats, take a short trip into what is left of Usenet or pirate (sorry "scene") territory and dare to suggest that something might be more suitable than .rar archives - it is almost as incendiary as suggesting an emacs user try vim or vice-versa)

David Spillett

Posted 2010-03-14T12:37:15.070

Reputation: 22 424

good answer, David. I guess I just find it to be a very non-DRY solution to repeat three file formats for every single thing we download on the Net. – Dan Rosenstark – 2010-03-14T12:59:19.797

I actually read somewhere about a guy who uses vim sometimes and emacs sometimes. I was shocked! – Dan Rosenstark – 2010-03-14T13:25:30.777

No, .rar files are brilliant! They should be used for everything, including splitting TV shows up into 30 9mb files, and compressing albums! Is there anything this glorious format CAN'T do? (Or so I've heard. I buy all of my TV shows, obviously) – Phoshi – 2010-03-14T13:55:57.980

i would note though that if you run windows, 7zip supports all of the 4 formats mentioned ;p – Journeyman Geek – 2010-03-14T14:32:02.833

Good explanation. As for VI and Emacs.... yeah I do that. I use Emacs for a lot of stuff while programming, but started with VI, so I have the habit of jumping into VI to fix compiler errors. – spowers – 2010-03-14T15:00:59.330

@journeyman: yes, if the user has 7zip installed they can read all the above formats. It can read (but not create) rar archives too. But this question is more about what users commonly have installed (which dictates the formats sites make their stuff available in) rather than what they could potentially have installed. – David Spillett – 2010-03-14T15:58:08.463

@Phoshi: Many other archive formats have native support for multi-volume archives. Zip and 7-zip to name but two. Any other format can still be split and rejoined more manually too. IIRC one thing that the rar format includes that not all others do is checksum data, so you know you have received an uncorrupted file without the need for extra external checksums (like a sha1 hash included with the file). – David Spillett – 2010-03-14T16:01:46.753

@David; Sorry, I was being a little sarcastic - people seeing .rar as some sort of magical space-shrinker is annoying :( Splitting a tv show up into 30 files doesn't reduce the size, it just increases annoyance! – Phoshi – 2010-03-14T16:50:59.990

@spowers, I was kind of sarcastic too. I should've put in the sarcastic tags :) – Dan Rosenstark – 2010-03-14T19:50:57.327