Highest compression for files (for web transfer)?

15

12

I have seen some highly compressed files around, such as 700MB of data compressed to around 30-50MB.

But how do you get such compressed files? I have tried using software like WinRAR and 7Zip but have never achieved such high compression.

What are the techniques/software that allow you to compress files so well?

(P.S. I'm using Windows XP)

rzlines

Posted 2009-07-17T10:28:05.840

Reputation: 7 006

igrimpe: Many compression algorithms index patterns. A billion A's is an A a billion times. You can compress that to [A]{1, 1000000000}. If you have a billion random numbers, it becomes difficult to do pattern matching since each consecutive number in a given subset decreases the probability of a matching subset exponentially. – AaronF – 2016-09-20T22:46:25.017

Nice idea ... but where do you get such files from anyways? – Robinicks – 2009-08-22T11:42:14.583

I've seen 7zip compress server log files (mainly text) down to about 1% of their original size. – Umber Ferrule – 2009-10-20T13:07:31.137

Open Notepad. Type "A" a billion times. Save, then compress. WOW! Now create an app that writes a billion (true) random numbers to a file. Compress that. HUH? – igrimpe – 2012-12-28T08:51:00.357

Answers

11

If time taken to compress the data is not an issue, then you can optimize compressed size by using several different tools together.

Compress the data several times using different tools like 7-Zip, WinRAR (in zip mode) and BJWFlate.

(Note that this does not mean compress the zip file over and over, but rather create a number of alternative zip files using different tools)

Next, run deflopt on each archive to reduce each archive a little more.

Finally, run zipmix on the collection of archives. Since different zip tools do better on different files, zipmix picks the best-compressed version of each file from across the archives and produces an output smaller than anything any one of the zip tools could have produced on its own.
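
A rough sketch of that workflow on the Windows command line might look like the following. The folder and archive names are just placeholders, and the exact arguments for DeflOpt and ZipMix vary between versions, so treat those two invocations as assumptions and check each tool's own usage text:

rem Build alternative zip archives of the same files with different tools
7z a -tzip -mx=9 candidate_7zip.zip data\*
winrar a -afzip -m5 candidate_winrar.zip data\*

rem Shave a little more off each candidate
deflopt candidate_7zip.zip
deflopt candidate_winrar.zip

rem Merge, keeping the smallest compressed copy of each file (assumed argument order)
zipmix candidate_7zip.zip candidate_winrar.zip merged.zip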

You should note, however, that this is not guaranteed to work any kind of magic on your files. Certain types of data simply do not compress very well, such as JPEGs and MP3s; these files are already compressed internally.

izb

Posted 2009-07-17T10:28:05.840

Reputation: 1 134

This is incredibly poor advice; trying to use multiple compression algorithms sequentially is a very bad idea. Each algorithm creates a compressed file plus overhead, so by using multiple algorithms you're actually adding data to the data you're trying to compress. It's like trying to dig a hole in the sand: the deeper you go, the more sand pours in on you. You're far better off using a single good algorithm at maximum compression settings. – Tacroy – 2012-05-05T00:02:10.627

I think you misunderstand... the same data is not being recompressed repeatedly. Rather, you are simply choosing the best single algorithm on a per-file basis rather than per archive. – izb – 2012-05-05T06:54:32.747

"Compress the data several times" is pretty misleading. – ta.speot.is – 2013-03-11T04:24:56.747

JPEGs and MP3s aren't zipped. They are compressed but not zipped. – KovBal – 2009-07-21T18:27:51.183

12

This depends entirely on the data being compressed.

Text compresses very well, binary formats not so well, and already-compressed data (MP3, JPG, MPEG) not at all.

Here is a good compression comparison table from Wikipedia.
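
You can see the difference yourself by compressing a plain-text file and an already-compressed file with the same settings (the file names here are only examples); the text will typically shrink to a small fraction of its size while the JPEG barely changes:

7z a -tzip -mx=9 log.zip server.log
7z a -tzip -mx=9 photo.zip photo.jpg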

Nifle

Posted 2009-07-17T10:28:05.840

Reputation: 31 337

Text can easily be compressed by up to 90%. – Georg Schölly – 2010-06-13T07:32:53.840

@GeorgSchölly: That's excellent. Because I can convert any data into text, e.g. convert each binary byte into two hexadecimal digits represented as text. That would double my size, but then saving 90% of the doubled size results in an overall savings of 80%. (Or I could use base64 for a bit more efficiency in the binary-to-text conversion.) This is astoundingly great news! :) – TOOGAM – 2017-09-09T16:28:27.860

I am aware that compression depends upon the type of data, but are there any specific techniques that help you compress files further? – rzlines – 2009-07-17T10:35:26.990

Once you have compressed something, it's usually impossible to get it measurably smaller. You just have to select the appropriate compression method for your data. – Nifle – 2009-07-17T10:39:40.290

9

Previous answers are wrong by an order of magnitude!

The best compression algorithm that I have personal experience with is paq8o10t (see zpaq page and PDF).

Hint: the command to compress files_or_folders would be like:

paq8o10t -5 archive files_or_folders
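
To get the files back out, the paq8 family conventionally uses the same program with a -d switch; both the switch and the archive extension below are assumptions, so run paq8o10t without arguments to see its actual usage text:

paq8o10t -d archive.paq8o10t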

Archive size vs. time to compress and extract 10 GB (79,431 files) to an external USB hard drive at default and maximum settings on a Dell Latitude E6510 laptop (Core i7 M620, 2+2 hyperthreads, 2.66 GHz, 4 GB, Ubuntu Linux, Wine 1.6). Data from 10 GB Benchmark (system 4).

Source: Incremental Journaling Backup Utility and Archiver

You can find a mirror of the source code on GitHub.


A slightly better compression algorithm, and winner of the Hutter Prize, is decomp8 (see link on prize page). However, there is no compressor program that you can actually use.


For really large files, lrzip can achieve compression ratios that are simply comical.

An example from README.benchmarks:


Let's take six kernel trees one version apart as a tarball, linux-2.6.31 to linux-2.6.36. These will show lots of redundant information, but hundreds of megabytes apart, which lrzip will be very good at compressing. For simplicity, only 7z will be compared since that's by far the best general purpose compressor at the moment:

These are benchmarks performed on a 2.53 GHz dual-core Intel Core 2 with 4 GB RAM using lrzip v0.5.1. Note that it was running with a 32-bit userspace, so only 2 GB of addressing was possible. However, the benchmark was run with the -U option, allowing the whole file to be treated as one large compression window.

Tarball of 6 consecutive kernel trees.

Compression    Size                 Percentage      Compress    Decompress
None           2373713920           100             [n/a]       [n/a]
7z             344088002            14.5            17m26s      1m22s
lrzip          104874109            4.4             11m37s      56s
lrzip -l       223130711            9.4             05m21s      1m01s
lrzip -U       73356070             3.1             08m53s      43s
lrzip -Ul      158851141            6.7             04m31s      35s
lrzip -Uz      62614573             2.6             24m42s      25m30s
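
To reproduce something like the lrzip -Uz row, you would first tar the six trees into a single file and then compress that tarball with the unlimited-window (-U) and ZPAQ (-z) options from the table; lrunzip restores the original tarball. The directory names assume the kernel trees are unpacked in the current directory:

tar cf kernels.tar linux-2.6.3[1-6]
lrzip -Uz kernels.tar
lrunzip kernels.tar.lrz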

Alexander Riccio

Posted 2009-07-17T10:28:05.840

Reputation: 191

It's optimized to provide maximum compression ratio, but is enormously slower than near-contenders. – Eric J. – 2013-03-15T12:57:17.860

@Eric J. yes, but the question didn't specify speed of compression/decompression ;) – Alexander Riccio – 2014-01-08T07:33:24.500

3

Just check the Summary of the multiple file compression benchmark tests, which lists the best-compressing programs across the complete benchmark.

Top 30


Top performers (based on compression) in this test are PAQ8 and WinRK (PWCM). They are able to compress the 300+ MB test set to under 62 MB (an 80% reduction in size) but take a minimum of 8.5 hours to complete the test. The number one program (PAQ8P) takes almost 12 hours and number four (PAQAR) even 17 hours to complete the test. WinRK, the program with the second-best compression (79.7%), takes about 8.5 hours. Not surprisingly, all of the mentioned programs use a PAQ(-like) engine for compression. If you have files with embedded images (e.g. Word DOC files), use PAQ8: it will recognize them and compress them separately, boosting compression significantly. All of the mentioned programs (except WinRK) are free of charge.

LifeH2O

Posted 2009-07-17T10:28:05.840

Reputation: 1 073

3

Squeezechart.com contains comparisons of various compression rates. However, as stated in Nifle's answer, you're unlikely to get such high compression rates for binary formats.

idan315

Posted 2009-07-17T10:28:05.840

Reputation: 205

2

Most compression tools have settings to allow you to achieve a higher compression rate at a compromise of slower compression/decompression times and more RAM usage.

For 7-Zip, search for "Add to Archive Dialog Box" in the built-in help for more detail.
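
The same trade-off is available from the command line through the compression level and dictionary size switches. For example (the archive and folder names are placeholders), a larger -md dictionary usually improves the ratio but needs correspondingly more RAM:

7z a -t7z -mx=9 -md=64m archive.7z myfolder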

Tom Robinson

Posted 2009-07-17T10:28:05.840

Reputation: 2 350

2

You may try 7-Zip with the following ultra settings:

7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on big_file.mysql.7z big_file.mysql
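
Those switches select the 7z container (-t7z), the LZMA method (-m0=lzma), the maximum compression level (-mx=9), a 64-byte word size (-mfb=64), a 32 MB dictionary (-md=32m) and solid-archive mode (-ms=on). To unpack the result again:

7z x big_file.mysql.7z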

kenorb

Posted 2009-07-17T10:28:05.840

Reputation: 16 795

1

Your best bet here seems to be trial and error. Try all your available compression techniques on each file and pick the best to put on your website. Luckily computers do this sort of thing pretty fast and don't get bored. You could write a simple script to automate the process so it would be "relatively painless".
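
A minimal sketch of such a script, as a Windows batch file (assuming 7-Zip and WinRAR are installed and on the PATH; the input path and archive names are placeholders), would build one archive per tool and print the sizes so you can keep the smallest:

@echo off
rem Try each compressor on the same input and report the resulting archive sizes
set SRC=C:\data\bigfile.dat
7z a -t7z -mx=9 try.7z "%SRC%"
7z a -tzip -mx=9 try.zip "%SRC%"
rar a -m5 try.rar "%SRC%"
for %%F in (try.7z try.zip try.rar) do echo %%F: %%~zF bytes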

Just don't expect miracles - 700 MB down to 30 MB just doesn't happen that often. Log files as mentioned above - yes. "Your average file" - no way.

hotei

Posted 2009-07-17T10:28:05.840

Reputation: 3 645

1

NanoZip, together with FreeArc, seems to offer some of the highest compression, although it is not in a final version yet. NanoZip achieves very high compression without taking too much time; check the Summary of the multiple file compression benchmark tests. FreeArc, however, is faster.

user712092

Posted 2009-07-17T10:28:05.840

Reputation: 687

PAQ8 compresses to a higher ratio than NanoZip. Still +1 because NanoZip has a much better time spent / compression ratio trade-off. – Gaspa79 – 2020-02-14T14:44:24.100