How does 'dictionary size' affect compression?


I know that a larger dictionary size may lead to a better compression ratio, and vice versa. But is there a better way to decide, since 7-Zip offers so many choices?


So far I've noticed that a dictionary size ≈ file size yields optimum compression.
Here the ∼8 MB file test.avi has the same compression ratio for all dictionary sizes greater than 8 MB. For smaller dictionaries, the ratio starts to fall.
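The plateau described above can be reproduced in miniature. The sketch below does not drive 7-Zip itself; it assumes Python's standard `lzma` module (LZMA2 in an .xz container, the same algorithm family as 7-Zip's default .7z method) and sets the LZMA2 `dict_size` option directly. The test data is made up: a random 64 KiB block repeated once, so the only redundancy sits 64 KiB back and can be used only when the dictionary reaches that far.

```python
import lzma
import os


def xz_size(data: bytes, dict_size: int) -> int:
    """Compressed size of `data` using LZMA2 with the given dictionary size (bytes)."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 9, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


# 64 KiB of incompressible random bytes, repeated once: the second copy
# starts 64 KiB after the first, so it can only be matched if the
# dictionary (the window of past input) is at least that large.
block = os.urandom(64 * 1024)
data = block + block  # a 128 KiB "file"

for kib in (16, 32, 128, 256, 1024):
    print(f"dict {kib:>5} KiB -> {xz_size(data, kib * 1024):>7} bytes")
```

With the small dictionaries the output should stay close to the full 128 KiB, while every dictionary at or above the file size should land near 64 KiB and stay roughly flat, mirroring the test.avi observation. Exact byte counts will vary by liblzma version.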

laggingreflex

Posted 2013-07-08T03:44:28.240

Reputation: 3 498

Yes, that is because the whole file then fits in the dictionary in memory. However, this may not be possible when dealing with multi-gigabyte files. The returns diminish the higher you go; if you need that last 1%, set the dictionary size to the file size. Note: with a much larger data set, a dictionary size of 128 MB or more will significantly increase the time it takes to compress. – cybernard – 2013-07-08T04:40:58.803

Answers


Repeatable items are stored in a dictionary and a code is assigned as a substitute.

THIS IS AN OVERSIMPLIFICATION

aaaaaaaaaaaaaaaaaaaaaaaa  0001
bbbbbbbbbbbbbbbbbbbbbbbb  0002
alsdjl;asjdfkl;asdfjkljj  0003

Instead of storing the whole line, it just puts the code in its place. The larger the dictionary, the more codes it can hold. Normally, when a dictionary becomes full, a new one is started on the fly. The new one starts out blank, and new codes are assigned to newly detected patterns.
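The "table of codes that is restarted when full" model above matches the LZW family of compressors (as used by Unix `compress` and GIF) more closely than LZMA, whose "dictionary" is a sliding window of recent input. As a minimal sketch of the simplified model only, here is LZW with a capped table that is thrown away and restarted once it fills; `lzw_encode`, `lzw_decode` and `max_entries` are illustrative names, not part of any 7-Zip API.

```python
def _fresh_table() -> dict[bytes, int]:
    # Codes 0..255 stand for single bytes; learned phrases get 256, 257, ...
    return {bytes([i]): i for i in range(256)}


def lzw_encode(data: bytes, max_entries: int) -> list[int]:
    """LZW with a capped dictionary that is discarded and restarted when full."""
    if max_entries <= 256:
        raise ValueError("max_entries must exceed the 256 single-byte codes")
    table = _fresh_table()
    codes: list[int] = []
    w = b""
    for value in data:
        wc = w + bytes([value])
        if wc in table:
            w = wc  # keep extending the current match
            continue
        codes.append(table[w])  # emit the code for the longest known phrase
        if len(table) < max_entries:
            table[wc] = len(table)  # learn the new phrase
        else:
            table = _fresh_table()  # full: start a new, blank dictionary
        w = bytes([value])
    if w:
        codes.append(table[w])
    return codes


def lzw_decode(codes: list[int], max_entries: int) -> bytes:
    """Inverse of lzw_encode; rebuilds the same table and mirrors its reset rule."""
    table = [bytes([i]) for i in range(256)]
    out = bytearray()
    prev = None
    for code in codes:
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]  # phrase that refers to itself (cScSc case)
        else:
            raise ValueError(f"invalid code {code}")
        out += entry
        if prev is not None:
            table.append(prev + entry[:1])  # same phrase the encoder learned
        if len(table) >= max_entries:
            table = [bytes([i]) for i in range(256)]
            prev = None  # the encoder restarted its dictionary here
        else:
            prev = entry
    return bytes(out)


sample = b"the quick brown fox jumps over the lazy dog. " * 200
for limit in (512, 4096, 65536):
    codes = lzw_encode(sample, limit)
    assert lzw_decode(codes, limit) == sample
    print(f"limit {limit:>5}: {len(codes)} codes for {len(sample)} bytes")
```

A small limit forces frequent restarts, so previously learned patterns are forgotten and more codes are emitted. Once the table never fills on a given input, raising the limit further should change nothing, which is the same plateau the question describes.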

Generally, the larger the better, up to a point. The entire dictionary is held in memory, so you need more RAM than the dictionary size.

The right dictionary size depends on the compressibility of your data, the number of files, their individual sizes, and the overall size.

Generally, 32 MB is more than enough, but if you're compressing numerous multi-gigabyte files, a much higher value can be used. Larger dictionaries often make compression slower, but result in a smaller file.
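The advice in this thread can be condensed into a rule of thumb: a dictionary larger than the total input gains nothing, and beyond that it is limited by RAM and time. The helper below is a hypothetical sketch of that rule, not something 7-Zip provides. The power-of-two rounding, the 64 KiB floor and the 32 MiB default cap are assumptions chosen for illustration; raise `cap_bytes` for multi-gigabyte inputs if you have the memory.

```python
KIB = 1024
MIB = 1024 * KIB


def pick_dict_size(total_input_bytes: int,
                   cap_bytes: int = 32 * MIB,
                   floor_bytes: int = 64 * KIB) -> int:
    """Hypothetical heuristic: smallest power of two that covers the whole
    input, clamped to the range [floor_bytes, cap_bytes]."""
    if total_input_bytes <= 0:
        return floor_bytes
    # Next power of two >= total_input_bytes.
    size = 1 << (total_input_bytes - 1).bit_length()
    return max(floor_bytes, min(size, cap_bytes))


print(pick_dict_size(8 * MIB) // MIB, "MiB")         # an ~8 MB file like test.avi
print(pick_dict_size(5 * 1024 * MIB) // MIB, "MiB")  # a 5 GiB data set, default cap
```

The result could then be passed to 7-Zip's `-md` switch (for example `-md=32m`); check the memory 7-Zip reports for your chosen settings, since the compressor needs considerably more RAM than the dictionary itself.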

cybernard

Posted 2013-07-08T03:44:28.240

Reputation: 11 200

Is the size that you set a limit for the dictionary size, or the actual size it will be? Do programs (7-zip in particular) normally determine intelligently whether they really need to fill the whole dictionary that you've allowed? – Stan – 2016-02-24T09:27:45.500

Yes, it is a limit. When it is full, the compressor either starts a new dictionary or intelligently pushes out old data. If the data to compress is larger than the dictionary, the dictionary will get filled. – cybernard – 2016-02-24T12:55:49.820

@cybernard "it will get filled"? To be clear, does the dictionary size remain less than the limit when it is not filled? – LonnieBest – 2019-09-06T03:09:12.723

@LonnieBest Yes, the dictionary starts out completely empty. Every so many bits/bytes makes a new dictionary entry until it gets full. – cybernard – 2019-09-09T17:02:23.547
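For LZ77-style compressors such as LZMA, the "push out old data" behavior in this exchange can be modeled as a history window: it starts empty, grows with the input until it reaches the limit, and from then on the oldest bytes drop out. The toy class below (`HistoryWindow` is a hypothetical name) only models what the dictionary *holds*. How much memory a real compressor allocates up front is implementation-specific and not shown here.

```python
from collections import deque


class HistoryWindow:
    """Toy LZ77-style history buffer holding at most `limit` recent bytes."""

    def __init__(self, limit: int):
        self.limit = limit
        self.buf: deque[int] = deque(maxlen=limit)  # starts out empty

    def feed(self, chunk: bytes) -> None:
        # Once the buffer is full, deque(maxlen=...) discards the oldest bytes.
        self.buf.extend(chunk)

    def __len__(self) -> int:
        return len(self.buf)

    def can_reference(self, distance: int) -> bool:
        """Could a match point `distance` bytes back from the current position?"""
        return 1 <= distance <= len(self.buf)


w = HistoryWindow(limit=16)
w.feed(b"0123456789")
print(len(w))  # still below the limit: only 10 bytes seen so far
w.feed(b"abcdefghijklmnopqrst")
print(len(w), bytes(w.buf))  # capped at 16, holding only the newest bytes
```

This is why data repeated from further back than the dictionary size cannot be exploited, and why a dictionary larger than the input is never completely filled.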