Fastest GZIP utility


I'm looking for the fastest gzip (or zip) utility. I have an LVM volume that is about 95% blank zeros, so compressing it is very easy. I'm looking for the fastest solution, and don't really care about the compression ratio, except for the zeros.

I'm aware of gzip -1 (same as gzip --fast) but was wondering if there's any faster method.
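For intuition on why level 1 is enough here, a quick Python sketch using zlib (the same DEFLATE implementation gzip uses) shows that level 1 already collapses runs of zeros almost completely, so higher levels mostly cost CPU time:

```python
import time
import zlib

# 16 MiB of zeros, standing in for the mostly-blank volume
data = b"\x00" * (16 << 20)

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(compressed)} bytes in {elapsed:.3f}s")
```

On all-zero input every level reaches a tiny output, so the extra work of higher levels buys essentially nothing for this kind of data.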

Thanks.

Edit: after some tests, I compared gzip -1, lzop -1 and pigz -1 with each other and came to the following results:

PIGZ:

time dd if=/dev/VPS/snap | pigz -1 | ssh backup-server "dd of=/home/backupvps/snap.pigz"

104857600+0 records in
104857600+0 records out
53687091200 bytes (54 GB) copied, 2086.87 seconds, 25.7 MB/s
7093985+266013 records in
7163950+1 records out
3667942715 bytes (3.7 GB) copied, 2085.75 seconds, 1.8 MB/s

real    34m47.147s

LZOP:

time dd if=/dev/VPS/snap | lzop -1 | ssh backup-server "dd of=/home/backupvps/snap.lzop"

104857600+0 records in
104857600+0 records out
53687091200 bytes (54 GB) copied, 1829.31 seconds, 29.3 MB/s
7914243+311979 records in
7937728+1 records out
4064117245 bytes (4.1 GB) copied, 1828.08 seconds, 2.2 MB/s

real    30m29.430s

GZIP:

time dd if=/dev/VPS/snap | gzip -1 | ssh backup-server "dd of=/home/backupvps/snap_gzip.img.gz"

104857600+0 records in
104857600+0 records out
53687091200 bytes (54 GB) copied, 1843.61 seconds, 29.1 MB/s
7176193+42 records in
7176214+1 records out
3674221747 bytes (3.7 GB) copied, 1842.09 seconds, 2.0 MB/s

real    30m43.846s

Edit 2:

This is somewhat unrelated to my initial question, but using time dd if=/dev/VPS/snap bs=16M | lzop -1 | ssh backup-server "dd of=/home/backupvps/snap.lzop" (block size changed to 16M), the time is reduced to real 18m22.442s!
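The block-size effect can be checked locally without the ssh leg, using /dev/zero as a stand-in for the mostly-blank volume and gzip -1 in place of lzop (which may not be installed everywhere); wrap each pipeline in `time` to compare:

```shell
# Feed the compressor 128 MiB of zeros via dd's default-sized 512-byte blocks
# versus 16M blocks. Paths and sizes here are illustrative, not from the
# original setup; the compressed output is identical, only throughput differs.
dd if=/dev/zero bs=16M count=8 status=none | gzip -1 > /tmp/zeros_16M.gz
dd if=/dev/zero bs=512 count=262144 status=none | gzip -1 > /tmp/zeros_512.gz
ls -l /tmp/zeros_16M.gz /tmp/zeros_512.gz
```

Larger blocks mean fewer read()/write() syscalls per byte moved, which is where the savings come from.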

Devator

Posted 2012-03-14T13:45:51.280

Reputation: 962

Be careful: it's somewhat unfair to use time in such a manner. The throughput of the dd used for pigz is lower than the other two. – Henk – 2012-03-14T16:15:43.247

@Devator: looking at the timings, one might conclude that pushing bytes through the encrypted ssh tunnel is now the bottleneck. Did you try using ssh with the -C (compression) flag and leaving the pre-compressor out of the equation? You could also switch to a faster encryption algorithm. Aside from that: re-benchmark without the ssh tunnel (e.g., using /dev/null as the output sink). – akira – 2012-03-14T17:40:23.197

As a side note, could you use a sparse file? Then the zeros would take up no space on disk. Your compression would also be faster, because the zeros would be interpolated by the filesystem driver (and wouldn't have to be read from disk).

– Li-aung Yip – 2012-03-15T10:57:56.100

@Li-aungYip I don't think so, as the "files" are LVM volumes. – Devator – 2012-03-15T11:03:22.047

Ah, I see. Carry on! – Li-aung Yip – 2012-03-15T11:07:28.867
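For file-based images (not LVM volumes, as noted above), the sparse-file idea looks like this: dd's conv=sparse seeks over all-zero input blocks instead of writing them, so they consume no space on disk:

```shell
# Write 16 MiB of real zeros, then copy it sparsely. Paths are illustrative.
dd if=/dev/zero of=/tmp/zeros.img bs=1M count=16 status=none
dd if=/tmp/zeros.img of=/tmp/sparse.img conv=sparse bs=1M status=none
ls -l /tmp/sparse.img   # apparent size: 16 MiB
du -h /tmp/sparse.img   # allocated blocks: close to zero
```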

Also relevant: gzip cannot max out your hard drive in many situations because of unchangeable buffer sizes chosen in the last millennium. lzop can.

– nh2 – 2013-09-27T19:09:49.160

Answers

If you don't mind stepping away from DEFLATE, lzop is an implementation of LZO that favors speed over compression ratio.

Ignacio Vazquez-Abrams

Posted 2012-03-14T13:45:51.280

Reputation: 100 516

Or... snappy: http://code.google.com/p/snappy/

– akira – 2012-03-14T17:35:09.037

Thanks, I've found lzop to be the fastest in my scenario. It's faster than pigz somehow (probably due to all the zeros). – Devator – 2012-03-15T10:27:16.413

Although I personally have not yet used it, I think using parallel gzip could speed things up a bit:

pigz, which stands for parallel implementation of gzip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data.
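The idea behind pigz (split the input into chunks and compress them on several cores) can be sketched with Python's standard library. This is a simplified illustration of the principle, not pigz's actual output format; real pigz produces a single valid gzip stream:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunk(chunk: bytes) -> bytes:
    # zlib releases the GIL while compressing, so threads genuinely overlap here
    return zlib.compress(chunk, 1)

def parallel_compress(data: bytes, chunk_size: int = 1 << 20,
                      workers: int = 4) -> list[bytes]:
    # Slice the input into fixed-size chunks and compress them concurrently,
    # preserving chunk order in the result.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, chunks))

if __name__ == "__main__":
    blob = b"\x00" * (8 << 20)  # 8 MiB of zeros
    parts = parallel_compress(blob)
    print(f"{len(parts)} chunks, {sum(len(p) for p in parts)} bytes total")
```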

Pascal

Posted 2012-03-14T13:45:51.280

Reputation: 331

Looks like the project is abandoned at this point. – AlexLordThorsen – 2017-11-28T21:31:55.327

I prefer to think of it as "stable". It doesn't update often, but it does update. – Alan De Smet – 2018-02-06T19:38:20.083

I use it routinely, and absolutely recommend pigz if multiple cores are available. Other than changing the compression level, this is by far the most accessible and straightforward means of speeding up compression. – jgrundstad – 2012-03-14T14:55:23.850

The site looks a bit odd. But don't be fooled by that: pigz is written by one of the developers of gzip and zlib, Mark Adler. – so_mv – 2013-08-12T05:58:16.943

You can try Parallel Gzip (Pascal linked to it), or Parallel BZIP.
In theory, bzip2 is much better for text, so you may want to try pbzip2.
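The tradeoff shows up directly in Python's standard library: on long runs of zeros, bzip2's run-length and BWT stages compress far more tightly than DEFLATE, at the cost of speed on general data (the exact byte counts here are illustrative):

```python
import bz2
import zlib

data = b"\x00" * (4 << 20)  # 4 MiB of zeros

gz = zlib.compress(data, 1)  # DEFLATE at level 1, as used by gzip -1
bz = bz2.compress(data, 1)   # bzip2 at level 1

print(f"zlib -1: {len(gz)} bytes, bz2 -1: {len(bz)} bytes")
```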

Apache

Posted 2012-03-14T13:45:51.280

Reputation: 14 755

Your disk is limited to 30 MB/s

All compressors do well enough. You can even reduce network transfer using the slightly slower but omnipresent bzip2.

$ dd if=/dev/zero bs=2M count=512 | pigz -1 | dd > /dev/null
512+0 records in
512+0 records out
1073741824 bytes (1.1 GB) copied, 9.12679 s, 118 MB/s
8192+7909 records in
9488+1 records out
4857870 bytes (4.9 MB) copied, 9.13024 s, 532 kB/s
$ dd if=/dev/zero bs=2M count=512 | bzip2 -1 | dd > /dev/null
512+0 records in
512+0 records out
1073741824 bytes (1.1 GB) copied, 37.4471 s, 28.7 MB/s
12+1 records in
12+1 records out
6533 bytes (6.5 kB) copied, 37.4981 s, 0.2 kB/s
$ dd if=/dev/zero bs=2M count=512 | gzip -1 | dd > /dev/null
512+0 records in
512+0 records out
1073741824 bytes (1.1 GB) copied, 14.305 s, 75.1 MB/s
9147+1 records in
9147+1 records out
4683762 bytes (4.7 MB) copied, 14.3048 s, 327 kB/s

Have you considered rsync? It checksums the data and then gzips only the differences.

ZaB

Posted 2012-03-14T13:45:51.280

Reputation: 2 365

My disk is not limited to 30 MB/s. I just ran your test: pigz -1: 1073741824 bytes (1.1 GB) copied, 8.6779 seconds, 124 MB/s, and gzip -1: 1073741824 bytes (1.1 GB) copied, 11.6724 seconds, 92.0 MB/s. I've thought about rsync, but that would check the file differences and probably not help, as most of the time a lot has changed. – Devator – 2012-03-14T22:03:01.933

If you are transferring zeros, look how impressive bzip2 encoding is in comparison. It just depends on which side you measure speed... 4 Mbit/s of pigz output might be too much for a common DSL line... It gets even worse if your disk is that fast. – ZaB – 2012-03-18T18:51:01.437

Re: lzop, it is slower in its standard configuration... Tweaking can halve the time. But there is an even faster replacement called blosc:

https://github.com/FrancescAlted/blosc

Hmm... The time it took to post this and get replies is probably at least double any time savings you'll get though... Now excuse me while I recompile my kernel to shave off another .1s from my 2s boot time.

technosaurus

Posted 2012-03-14T13:45:51.280

Reputation: 996