I frequently need to compress files that are very similar to each other.
Currently I use 7Zip, which compresses a 16GB file down to 1.2GB in about 35 minutes using 8 cores with Ultra settings.
It seems to me that much of that time is spent computing the dictionary to use for compression. Since the files are highly similar, the dictionary actually used is likely also similar.
Is there a Windows-based compression tool (7Zip with an option I'm not aware of, or a different tool) that can save the dictionary and reuse that saved dictionary for subsequent files?
Is there a better way to approach the problem of maintaining a compression ratio similar to what I have, while compressing significantly faster?
With Huffman and a predefined dictionary, would just one pass be required? Are there any off-the-shelf Huffman-based tools that support saved dictionaries? – Eric J. – 2013-03-15T13:11:15.430
@EricJ. yes, with a predefined dictionary it would be single-pass encoding. I don't know of any software off-hand that can do this, although I have personally written programs that do. While I haven't tried it, this tool looks like it can do just that. Note, however, that (again, unlike LZW) to decode a Huffman-encoded bitstream, you still need the original dictionary. – Breakthrough – 2013-03-15T13:12:44.413

Based on the age of that tool, I'm guessing it is single-threaded. I would guess using 1 core rather than 8 would offset any benefit of a fixed dictionary :-( Having the dictionary available on the other end is feasible in my scenario (transferring large files between data centers). – Eric J. – 2013-03-15T13:28:33.173
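The idea from the comments above can be sketched in a few lines of Python: build the Huffman code table once from a representative sample (the expensive step), then reuse that fixed table for single-pass encoding of subsequent similar files. This is a minimal illustration, not production code — it assumes the sample contains every byte value that later files will use, and as Breakthrough notes, the decoder needs the same table.

```python
import heapq
from collections import Counter

def build_code_table(sample: bytes) -> dict[int, str]:
    """Build a Huffman code table from a representative sample.

    This is the costly step; saving and reusing the resulting table
    is the 'saved dictionary' idea discussed in the thread.
    """
    freq = Counter(sample)
    # Heap entries are (weight, tiebreak, tree); tree is a byte value
    # or a (left, right) pair. The tiebreak keeps comparisons on ints.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, i, (t1, t2)))
        i += 1
    table: dict[int, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            table[tree] = prefix or "0"  # degenerate single-symbol case

    walk(heap[0][2], "")
    return table

def encode(data: bytes, table: dict[int, str]) -> str:
    # Single pass: every byte is just a lookup in the fixed table.
    # Raises KeyError if data contains a byte absent from the sample.
    return "".join(table[b] for b in data)
```

Because the table is prefix-free, the receiver can decode greedily bit by bit — but only if it has the same table, which is why the dictionary must travel with (or already exist at) the other data center.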