17
2
In my application I need to compress logs that are text files.
It seems that bzip2 and gzip have the same compression ratio.
Is that correct?
5
Normally, bz2 has a better compression ratio, combined with better recoverability features.
OTOH, gz is faster.
xz is said to be even better than bz2, but I don't know the timing behaviour.
xz is not just slower, but much slower: a 300 MB file took about 30 seconds for bzip2 to compress. I killed xz after it had been compressing for longer than 5 minutes. – Tebe – 2017-01-07T01:29:47.447
@Копать_Шо_я_нашел I think it depends heavily on the compression level you choose. With -1, it is not so very slow, but with the default settings, it tends to be quite slow. – glglgl – 2017-01-09T09:59:16.140
xz is slower than bzip2. – osgx – 2011-12-01T14:46:47.993
7
The last update of maximumcompression.com was in June 2011 (this answer was updated in Oct 2015).
Therefore this website does not mention the current champion text compressor worldwide: cmix
Competitions/Benchmarks:
(cmix is not the winner because it requires too much RAM, more than 20 GB)
Details:
Byron Knoll has been actively developing cmix as libre software (GPL) since 2013, based on the book Data Compression Explained by Matt Mahoney. Matt Mahoney also maintains some of the above benchmarks and proposes ZPAQ (WP), a command-line incremental archiver.
If you prefer a more standard tool (requiring less RAM), I recommend: lrzip
lrzip is an evolution of rzip by Con Kolivas.
lrzip stands for two names: Long Range ZIP and Lzma RZIP.
lrzip is often better than xz (another popular compression tool).
Alexander Riccio also recommends lrzip.
My favorite is: zpaq
The "archiver expert", Matt Mahoney, has worked intensively on PAQ algorithms for ten years, and they provide the best compromise between CPU/memory resources and compression level.
However, the latest zpaq version is not often packaged/available on recent distros :-(
I always compile it from source when I have a new machine and need a very good compressor: https://github.com/zpaq/zpaq
git clone https://github.com/zpaq/zpaq
cd zpaq
g++ -O3 -march=native -Dunix zpaq.cpp libzpaq.cpp -pthread -o zpaq
4
Maybe you could have a look at those benchmarks, especially the part testing log file compression.
Link does not work. – Rumplin – 2018-03-29T15:29:47.263
1
I ran a benchmark compressing the following:
a 204 MB folder (with 1,600 HTML files)
Results:
7zip => 2.38 MB
winrar => 49.5 MB
zip => 50.8 MB
gzip => 51.9 MB
So 7zip is the best among them.
You can get it from here:
http://www.7-zip.org/
0
bz2 has tighter compression; the algorithm has more options to look for redundancy to compress away.
gzip is supported by many more tools and is more cross-platform. More Windows tools can deal with .gz files. It's part of HTTP, so even web browsers can understand it.
On Linux, there are tools that let you work on compressed files directly: zgrep and bzgrep can search in compressed files.
If I were only on Linux, I'd use bzip2 for the slightly better compression ratios.
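If the logs are processed from a script rather than a shell, the same idea as zgrep and bzgrep can be sketched with Python's standard library. This is only an illustration, not how those tools are implemented: the function name grep_compressed and the suffix-to-opener table are made up for this example, and it assumes the logs are UTF-8 text.

```python
import bz2
import gzip
import lzma
import re
from pathlib import Path

# Hypothetical mapping from file suffix to the stdlib function that opens it.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def grep_compressed(path, pattern):
    """Yield (line_number, line) for each line matching `pattern`.

    The file is decompressed as a stream, so it is never fully
    unpacked on disk. Unknown suffixes are read as plain text.
    """
    path = Path(path)
    opener = OPENERS.get(path.suffix, open)
    regex = re.compile(pattern)
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                yield number, line.rstrip("\n")
```

For example, list(grep_compressed("app.log.gz", "ERROR")) would return the matching lines of a (hypothetical) app.log.gz together with their line numbers.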
0
xz compresses much better than bz2, but takes more time. So, if maximum compression is your goal and space on your hard drive is at a premium (which is my case with one drive at 98% full - while I reorganize my file systems), and you can fire off a script to do the work - take a break and come back in 5 minutes.
unxz is very fast to uncompress in my experience - which is a good thing for me on a daily basis.
bz2 is faster to compress than xz, but does not appear to achieve the compression results of xz.
The only way to make these assessments is to run benchmarks against a mix of common files you normally would compress/decompress, and vary the parameters to see which comes out on top.
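As a starting point for such a benchmark, here is a minimal sketch using the gzip, bz2 and lzma modules from Python's standard library (lzma produces the .xz format). The helper names benchmark and make_sample_log are invented for this example, and the synthetic log is only a stand-in: real numbers will depend on your own files, so feed it your actual logs before drawing conclusions.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Compress `data` with several codecs and levels; report size, ratio, time."""
    codecs = {
        "gzip -6": lambda d: gzip.compress(d, compresslevel=6),
        "gzip -9": lambda d: gzip.compress(d, compresslevel=9),
        "bzip2 -9": lambda d: bz2.compress(d, compresslevel=9),
        "xz -1": lambda d: lzma.compress(d, preset=1),
        "xz -6": lambda d: lzma.compress(d, preset=6),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {
            "bytes": len(packed),
            "ratio": len(data) / len(packed),  # higher means smaller output
            "seconds": elapsed,
        }
    return results


def make_sample_log(lines: int = 20000) -> bytes:
    """Build a synthetic log (an assumption for the demo, not real data)."""
    rows = [
        f"2024-01-01 00:00:{i % 60:02d} INFO worker-{i % 8} "
        f"handled request {i} in {i % 97} ms"
        for i in range(lines)
    ]
    return "\n".join(rows).encode("utf-8")


if __name__ == "__main__":
    for name, stats in benchmark(make_sample_log()).items():
        print(f"{name:9s} {stats['bytes']:>9d} bytes  "
              f"ratio {stats['ratio']:6.2f}  {stats['seconds']:.3f} s")
```

To test real data, replace make_sample_log() with something like open("your.log", "rb").read(), and add or remove levels in the codecs table to vary the parameters.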
xz (from xz-utils, or 7z from p7zip; it is very similar to lzma) is the best. bzip2 is better than gzip. – osgx – 2011-12-01T12:36:39.677