Pros and cons of bzip vs gzip?

124

30

I've known gzip for years, and recently I saw bzip being used at work. Are they basically equivalent, or are there significant pros and cons to one of them over the other?

ripper234

Posted 2010-10-30T17:01:03.143

Reputation: 9 293

2

While this is an old question with a valid and correct answer, I would like to point people to this google result: http://tukaani.org/lzma/benchmarks.html as it does break it down further

– Angry 84 – 2016-01-07T09:12:36.713

Isn't bzip for compression and gzip for archival? – juniorRubyist – 2016-12-29T20:09:32.590

@juniorRubyist source? – ripper234 – 2016-12-30T16:55:42.437

I just heard that. I forgot where. – juniorRubyist – 2016-12-30T17:41:48.073

No mention of random access? https://stackoverflow.com/questions/14225751/random-access-to-gzipped-files

– neverMind9 – 2019-02-02T03:40:53.440

Answers

149

Gzip and bzip2 are functionally equivalent. (There once was a bzip, but it seems to have completely vanished off the face of the world.) Other common compression formats are zip, rar and 7z; these three do both compression and archiving (packing multiple files into one). Here are some typical ratings in terms of speed, availability and typical compression ratio (note that these ratings are somewhat subjective, don't take them as gospel):

decompression speed (fast > slow): gzip, zip > 7z > rar > bzip2
compression speed (fast > slow): gzip, zip > bzip2 > 7z > rar
compression ratio (better > worse): 7z > rar, bzip2 > gzip > zip
availability (unix): gzip > bzip2 > zip > 7z > rar
availability (windows): zip > rar > 7z > gzip, bzip2

As you can see, there isn't a clear winner. If you want to rely on programs that are likely to be installed already, use zip on Windows (or if possible, self-extracting archives, as Windows doesn't ship with any of these) and gzip on unix. If you want maximum compression, use 7z.

Rar also has the downside that, as far as I know, there is no free software that creates rar archives or that can unpack all rar archives. The other formats have free implementations and no (serious) patent claims.

Gilles 'SO- stop being evil'

Posted 2010-10-30T17:01:03.143

Reputation: 58 319

@Gilles, And What about pbzip? – shgnInc – 2015-01-20T05:40:45.857

@shgnInc Less commonly available than bzip2. As for speed, it depends how many processors you have. Hmm, I should add xz. – Gilles 'SO- stop being evil' – 2015-01-20T08:31:50.357

[citation needed] – mlainz – 2016-01-23T06:24:58.717

11 @mlainz Original research. This isn't Wikipedia. – Gilles 'SO- stop being evil' – 2016-01-23T10:09:25.340

unrar is the open source rar unpacking utility. – stommestack – 2016-09-13T16:29:02.393

@JopV. Last I looked, there were some options of the rar format that the open-source unrar didn't support. I don't remember what options these are but I have had rar archives in my hand that only worked with the closed-source version. – Gilles 'SO- stop being evil' – 2016-09-13T17:37:02.487

2 As far as I can tell, all versions of Windows since XP can open zip files natively using the file explorer. – Lie Ryan – 2010-11-02T15:00:25.393

3 "it seems to have completely vanished" - Plain old bzip vanished because it used patented arithmetic coding. Because of the patent, it was re-designed to use Huffman coding instead. During this re-design, new features and improvements were added. The fundamental thing that makes it a unique compression algorithm though, the Burrows–Wheeler transform, stayed the same in both versions. – forest – 2019-01-01T03:23:26.587

This is a major difference between gzip and bzip2 for those working with data processing tools like Apache Spark: bzip2 is splittable and gzip is not. This means that Spark can read a single bzip2 file using multiple concurrent tasks, whereas a gzipped file can only be read with a single task.

– Nick Chammas – 2019-09-16T19:06:29.123

1 bzip2 is less available than gzip? What UNIX systems don't come with bzip2? – new123456 – 2011-07-03T14:19:52.473

22 @new123456 On OpenBSD, gzip is in the base system but bzip2 has to be installed from a package. Many *WRT routers include gzip but not bzip2. – Gilles 'SO- stop being evil' – 2011-07-03T17:53:00.163

2 @Gilles I can confirm that my DD-WRT Release: 08/12/10 (SVN revision: 14929) does not have bzip2, but does have gzip. – Urda – 2012-03-31T16:10:28.057

24

As far as I can tell, gzip is overall faster, while bzip overall produces better (smaller) compression.
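A rough way to check this on your own data is to time both compressors from Python's standard library. This is only a sketch: the `measure` and `compare` helpers and the sample input are made up for illustration, the default compression levels are used, and results will vary a lot with the kind of data and the machine.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Return (name, compressed size, compress seconds, decompress seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    mid = time.perf_counter()
    restored = decompress(packed)
    end = time.perf_counter()
    if restored != data:
        raise ValueError(f"{name} round trip failed")
    return name, len(packed), mid - start, end - mid


def compare(data):
    """Run gzip and bzip2 over the same bytes and collect the measurements."""
    return [
        measure("gzip", gzip.compress, gzip.decompress, data),
        measure("bzip2", bz2.compress, bz2.decompress, data),
    ]


if __name__ == "__main__":
    # Hypothetical sample input; substitute a file you actually care about.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    for name, size, c_time, d_time in compare(sample):
        print(f"{name:6} {size:8d} bytes  "
              f"compress {c_time:.4f}s  decompress {d_time:.4f}s")
```

A single run on synthetic text like this is not a benchmark; repeat it on representative files before drawing conclusions.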

Lie Ryan

Posted 2010-10-30T17:01:03.143

Reputation: 4 101

Do you have any statistics or sources to back that up? – IQAndreas – 2016-02-09T12:20:51.190

1

@IQAndreas: some benchmarks: 1, 2, 3

– Lie Ryan – 2016-02-09T12:45:58.373

Also, gzip seems to be slightly better supported, especially on Windows. – Dentrasi – 2010-10-30T17:32:53.893

5 @Dentrasi: winrar/7zip support both, what's the problem? – whitequark – 2010-10-31T04:26:05.860

Although bzip2 is often better, gzip usually pulls ahead for text compression. – forest – 2019-01-01T03:25:20.913

@whitequark: being widely supported is mostly important for unix since users may not have root access and must work with what is already installed. Also applies to Windows environments where the user does not have admin access (schools/libraries/etc). – Matthew – 2012-11-26T19:23:14.057

4 @Matthew, you don't need admin rights to use a lot of ported free software, including 7zip. – whitequark – 2012-11-28T00:26:28.710

5

The algorithms have different time, memory, space tradeoffs. Bear in mind these algorithms were written quite a while back and your smartphone has many times more CPU than desktops of those days.

Your pick is between universality (.gz) and a bit more compression (.bz2). Only you can say which you care about more.

One advantage of .gz is that it can compress a stream, a sequence where you can't look behind. This makes it the official compressor of HTTP streams. I needed to use gzip once because of that, but it's unlikely you'll need to think about it.
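To make the streaming point concrete, here is a minimal sketch of compressing data chunk by chunk into gzip format with Python's `zlib` module, without ever holding the whole input in memory. The `gzip_stream` name is invented for this example; `wbits=31` is the zlib setting that wraps the output in a gzip header and trailer.

```python
import gzip
import zlib


def gzip_stream(chunks):
    """Yield gzip-format output while consuming an iterable of byte chunks."""
    # wbits=31 asks zlib for a gzip header and trailer around the deflate data.
    compressor = zlib.compressobj(wbits=31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:
            # The compressor may buffer input, so output arrives irregularly.
            yield out
    # flush() emits whatever is still buffered plus the gzip trailer.
    yield compressor.flush()


if __name__ == "__main__":
    pieces = [b"first part, ", b"second part, ", b"third part"]
    packed = b"".join(gzip_stream(pieces))
    print(gzip.decompress(packed))
```

For what it's worth, Python's `bz2.BZ2Compressor` offers a similar incremental interface, though bzip2 works on large blocks, so its output tends to arrive in bigger bursts.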

Rich Homolka

Posted 2010-10-30T17:01:03.143

Reputation: 27 121

4

Here is a list of sites that test compression algorithms. To find just bzip and gzip you will have to do some digging, but most sites list the characteristics of each algorithm. This way you can compare what is important to you: size (compression ratio), time, memory, CPU.
http://www.maximumcompression.com/benchmarks/benchmarks.php

Scott McClenning

Posted 2010-10-30T17:01:03.143

Reputation: 3 519

1

Per http://tukaani.org/lzma/benchmarks.html , gzip compresses twice as fast as bzip2, and decompresses ten times as fast.

E.g., for use with S3 caching on Travis, etc., where you want fast compression and decompression, not just small sizes, gzip might be a good trade-off.

Hugh Perkins

Posted 2010-10-30T17:01:03.143

Reputation: 531

1

In my experience, bzip has offered consistently better compression ratios than gzip. Plus, with 7zip as the manager and the bzip algorithm, 7zip can make use of multi-core processors.
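One reason bzip2 lends itself to multiple cores is that independent chunks can be compressed separately and the resulting streams concatenated. The sketch below shows that general idea in Python; it is not a description of how 7zip does it internally, and the `parallel_bz2` helper, the chunk size and the worker count are all assumptions chosen for illustration.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bz2(data, chunk_size=900_000, workers=4):
    """Compress fixed-size chunks concurrently and concatenate the bzip2 streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams line up with the chunks.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = bytes(range(256)) * 20000
    packed = parallel_bz2(payload)
    # bz2.decompress accepts several concatenated streams (Python 3.3+).
    print(bz2.decompress(packed) == payload)
```

Splitting into separate streams usually costs a little compression ratio compared with a single stream, so this trades size for wall-clock time.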

Sathyajith Bhat

Posted 2010-10-30T17:01:03.143

Reputation: 58 436