Between xz, gzip, and bzip2, which compression algorithm is the most efficient?


Between xz, gzip, and bzip2, which compression algorithm gives the smallest file size and fastest speed when compressing fairly large tarballs?

Nathan2055

Posted 2013-04-10T18:42:37.877

Reputation: 792

'The best' as in 'resulting in the smallest file size'? – Hennes – 2013-04-10T18:53:11.620

I don't know; I was trying to find some way to word the question so I could add my test as an answer. I also have no idea why this thing was closed. @Karan – Nathan2055 – 2013-04-10T22:45:20.193

Oh, why it was closed is easy. "Best" is highly subjective and usually leads to discussions or non-constructive answers. Best compression can be smallest file size, fastest compression, least power used to compress (e.g. on a laptop), least influence on the system while compressing (e.g. ancient single-threaded programs using only one of the cores), ... or a combination of all of those. – Hennes – 2013-04-11T05:10:35.817

An interesting article to read is http://www.tomshardware.com/reviews/winrar-winzip-7-zip-magicrar,3436.html (Windows-based, and focusing on 7-Zip, MagicRAR, WinRAR, and WinZip rather than xz, gz, or bz, but still interesting and providing background information).

– Hennes – 2013-04-11T05:13:01.967

@Hennes - I cleaned up the post to replace "best" with exactly what I was researching. Also, thanks for the article you mentioned; I will read it later today. – Nathan2055 – 2013-04-11T18:45:53.350

Answers


In my stress test, I compressed 464 megabytes of data using the three formats listed. Gzip returned a 364 MB file. Bzip2 returned a 315 MB file. Xz returned a 254 MB file. I also did a simple speed test:

Compression:

1: Gzip

2: Xz

3: Bzip2 (my fan was blowing quite a bit while this was running, indicating that my Athlon II was fairly strained)

Decompression:

1: Xz

2: Gzip

3: Bzip2

Please note that all of these tests were done with the latest version of 7-Zip.

Xz is the best format for well-rounded compression, while Gzip is very good for speed. Bzip2 is decent for its compression ratio, although xz should probably be used in its place.
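
For anyone who wants to reproduce this, here is a minimal sketch using the standalone command-line tools rather than 7-Zip ("data.tar" is a placeholder for your own test file, and the numbers will of course vary with your data and hardware):

    # Compare max-level compression size and speed for each tool on the same tarball.
    for tool in gzip bzip2 xz; do
        cp data.tar test.tar
        /usr/bin/time -p $tool -9 test.tar       # compress, print wall time
        ls -l test.tar.*                         # compressed size
        /usr/bin/time -p $tool -d test.tar.*     # decompress, print wall time
        rm -f test.tar
    done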

Nathan2055

Posted 2013-04-10T18:42:37.877

Reputation: 792

Note that different data types will result in different compressed sizes. See here for examples.

– Ploni – 2018-06-27T17:13:30.563

Good research. Have you tried the various compression level options offered by (at least) bzip2, e.g. bzip2 -9 <file>? – Aaron Miller – 2013-04-10T20:54:39.533

@AaronMiller - No, is it possible to use those via 7-Zip? – Nathan2055 – 2013-04-15T21:35:04.003

It appears so, though I'm not sure to what extent: see http://www.dotnetperls.com/7-zip-examples, section "Switch m".

– Aaron Miller – 2013-04-16T13:35:03.723
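
For reference, the 7-Zip command line exposes the compression level through its -mx switch; a sketch, with placeholder file names:

    # Select bzip2 as the archive type and maximum compression level (9).
    7z a -tbzip2 -mx=9 archive.bz2 bigfile
    # The equivalent with the standalone tool:
    bzip2 -9 bigfile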

Out of curiosity, what sort of data was the test file? – GeminiDomino – 2013-12-31T23:35:22.710


I did my own benchmark on a 1.1 GB Linux installation vmdk image:

rar    =260MB   comp= 85s   decomp= 5s
7z(p7z)=269MB   comp= 98s   decomp=15s
tar.xz =288MB   comp=400s   decomp=30s
tar.bz2=382MB   comp= 91s   decomp=70s
tar.gz =421MB   comp=181s   decomp= 5s

All compression levels on max; CPU: Intel i7-3740QM; memory: 32 GB 1600 MHz; source and destination on a RAM disk.

I generally use rar or 7z for archiving normal files like documents, and for archiving system files I use .tar.gz or .tar.xz via file-roller, or tar with the -z or -J options along with --preserve, to compress natively with tar and preserve permissions (alternatively, .tar.7z or .tar.rar can be used).
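
A minimal sketch of those tar invocations (paths are placeholders; note that tar records permissions when creating the archive, and -p/--preserve-permissions applies them when extracting):

    # Create: -z selects gzip, -J selects xz; mode bits are stored either way.
    tar -czf backup.tar.gz /path/to/files
    tar -cJf backup.tar.xz /path/to/files

    # Extract: -p (--preserve-permissions) restores the recorded permissions.
    tar -xJpf backup.tar.xz -C /restore/target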

Update: since tar only preserves normal permissions and not ACLs anyway, plain .7z plus backing up and restoring permissions and ACLs manually via getfacl and setfacl can also be used. This seems to be the best option for both file archiving and system-file backups, because it fully preserves permissions and ACLs, and has checksum, integrity-test, and encryption capabilities; the only downside is that p7zip is not available everywhere.
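
A sketch of that workflow, assuming p7zip and the acl utilities are installed (paths are placeholders):

    # Back up: record ACLs recursively, then archive with 7z at maximum level.
    cd /srv && getfacl -R data > data.acl
    7z a -mx=9 data.7z data
    7z t data.7z                       # integrity test

    # Restore: unpack, then re-apply the ACLs from the matching base directory
    # (getfacl -R writes paths relative to where it was run).
    7z x data.7z -o/srv/restore
    cd /srv/restore && setfacl --restore=/srv/data.acl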

Sudoer

Posted 2013-04-10T18:42:37.877

Reputation: 141

Student, what were the rar options? Why not try lrzip by kolivas? It should work well for virtual disk images.

– osgx – 2015-03-14T06:39:42.430

I'm migrating from RAR to Git and tarballs for my text files and btrfs for everything else; my reason for using RAR is not performance. I'm using it because of features such as the recovery record, a separate file-level 256-bit checksum for every file, and ... . – Sudoer – 2015-03-14T13:21:30.633


I think that this article provides very interesting results.

http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

The most size-efficient formats are xz and lzma, both with the -e parameter passed.

The fastest algorithms are by far lzop and lz4, which can produce a compression level not very far from gzip's in 1.3 seconds, while gzip took 8.1 seconds. The compression ratio is 2.8 for lz4 and 3.7 for gzip.

Here are a few results I extracted from this article:

  • Gzip: 8.1s @ 3.7
  • lz4: 1.3s @ 2.8
  • xz: 32.2s @ 5.43
  • xz -e: 6m40s @ 7.063
  • xz: 4m51s @ 7.063

So if you really desperately need speed, lz4 is awesome and still provides a 2.8 compression ratio.

If you desperately need to spare every byte, xz at the maximum compression level (9) does the best job for text files like the kernel source. However, it takes a very long time and uses a lot of memory.

A good compromise when you need to minimize the impact on both time AND space is gzip. This is the one I would use to make manual daily backups of a production environment.
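
To make the trade-offs concrete, here is a sketch of the corresponding invocations (file names and paths are placeholders):

    # Fastest, modest ratio: lz4 writes file.lz4 and keeps the input file.
    lz4 file file.lz4

    # Smallest output: xz at level 9 with the extreme flag; slow and memory-hungry.
    xz -9e file                        # replaces file with file.xz

    # The middle ground, e.g. for daily backups: gzip through tar.
    tar -czf backup-$(date +%F).tar.gz /path/to/production/data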

Johnride

Posted 2013-04-10T18:42:37.877

Reputation: 156