7-Zip seems unable to compress zero-filled virtual disk image after 1.8 GB

I am trying to compress a virtual disk image that is mostly empty. About 2 GB of its 13.5 GB is used, but even after zero-filling the free space, I can only compress it to around 11.9 GB.

I have been using 7-Zip and tested LZMA, LZMA2, and PPMd at a variety of dictionary sizes and word sizes.

Every time, no matter the settings, compression seems to stop working after about 1.8 GB has been processed. The compression ratio stays at around 3% from the start up to that point, then climbs to a final 85%.

I know it should be able to compress further, because someone previously compressed an older 17 GB image down to 1.5 GB, but they are no longer around for me to ask. The only difference is that this new image has seven partitions instead of the two in the old image.

Is there something I am missing that would improve compression of virtual disk images?

Tom

Posted 2019-10-24T21:40:26.657

Reputation: 811

Scroll through your 13.5 GB file in a hex editor and see how much of it is really made up of contiguous zeros – Nayuki – 2019-10-25T19:58:03.347
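The check suggested above can be approximated without a hex editor. Below is a minimal Python sketch that reads the image in fixed-size blocks and reports what fraction of them contain only zero bytes. The 1 MiB block size and the script name in the comment are assumptions, and the image is assumed to be readable as a plain file, so the result is only a rough indicator.

```python
import sys

BLOCK_SIZE = 1024 * 1024  # assumption: scan the image in 1 MiB blocks


def zero_block_fraction(path: str, block_size: int = BLOCK_SIZE) -> float:
    """Return the fraction of fixed-size blocks in `path` that contain only zero bytes."""
    total = zeros = 0
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += 1
            # count(0) == len(chunk) means every byte in this block is 0x00
            if chunk.count(0) == len(chunk):
                zeros += 1
    return zeros / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python zero_scan.py disk.img   (the script name is hypothetical)
    image_path = sys.argv[1]
    print(f"{zero_block_fraction(image_path):.1%} of blocks are all zeros")
```

On a mostly empty, zero-filled image this fraction should be high; a low value suggests that something, such as encryption, has replaced the zeros with other data.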

Answers

OK, wow, I found the issue. While adding security measures (like separating /tmp, /var, and others into their own partitions), I also added encryption to each partition. Naturally, encrypted data looks random, and random data will not compress well at all.
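The effect can be reproduced with a small Python sketch using the standard `lzma` module (the same algorithm family as 7-Zip's LZMA, though not 7-Zip itself). It assumes that random bytes from `os.urandom` are a fair stand-in for properly encrypted data, since good ciphertext is meant to be indistinguishable from random; the 4 MiB sample size is arbitrary.

```python
import lzma
import os

SAMPLE_SIZE = 4 * 1024 * 1024  # assumption: a 4 MiB sample keeps the demo fast


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(lzma.compress(data)) / len(data)


# Zero-filled free space on an unencrypted partition
zero_filled = bytes(SAMPLE_SIZE)
# Random bytes stand in for the ciphertext of a properly encrypted partition
ciphertext_like = os.urandom(SAMPLE_SIZE)

if __name__ == "__main__":
    print(f"zero-filled:     {compression_ratio(zero_filled):.2%}")
    print(f"ciphertext-like: {compression_ratio(ciphertext_like):.2%}")
```

The zero-filled sample shrinks to well under 1% of its size, while the random sample stays at roughly 100% or slightly above, because there is no redundancy for any compressor to exploit.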

Tom

You should use the hypervisor's encryption feature instead. Its compact feature can remove zero clusters and then encrypt the remaining blocks – phuclv – 2019-10-25T07:14:21.110

Note that this is a function of proper encryption. Badly made encryption can show zero-filled space, and you don't want to use those. – Nelson – 2019-10-25T08:48:45.303
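One way encryption can be "badly made" in this sense is by encrypting every sector with the same keystream (or, with a block cipher, ECB-style without a per-sector tweak), so identical plaintext sectors produce identical ciphertext. Proper disk encryption modes such as XTS mix the sector number in as a tweak. The Python sketch below is a toy: SHA-256 in counter mode stands in for a real cipher, and `KEY`, the 512-byte sector size, and the helper names are hypothetical choices for illustration only.

```python
import hashlib
import lzma

KEY = b"hypothetical demo key"  # hypothetical key for this toy example only
SECTOR_SIZE = 512               # assumption: 512-byte sectors
DISK_SIZE = 1024 * 1024         # 1 MiB of zero-filled "free space"


def toy_keystream(tweak: int, length: int) -> bytes:
    """SHA-256 in counter mode as a toy stream cipher. NOT real disk encryption."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = KEY + tweak.to_bytes(8, "little") + counter.to_bytes(8, "little")
        out += hashlib.sha256(block).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(data: bytes, per_sector_tweak: bool) -> bytes:
    """XOR each sector with a keystream; optionally vary the keystream per sector."""
    out = bytearray()
    for offset in range(0, len(data), SECTOR_SIZE):
        sector = data[offset:offset + SECTOR_SIZE]
        # Badly made: every sector reuses tweak 0, so equal plaintext -> equal ciphertext.
        tweak = offset // SECTOR_SIZE if per_sector_tweak else 0
        stream = toy_keystream(tweak, len(sector))
        out += bytes(p ^ k for p, k in zip(sector, stream))
    return bytes(out)


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(lzma.compress(data)) / len(data)


free_space = bytes(DISK_SIZE)
weak = toy_encrypt(free_space, per_sector_tweak=False)
tweaked = toy_encrypt(free_space, per_sector_tweak=True)

if __name__ == "__main__":
    print(f"no per-sector tweak:   {compression_ratio(weak):.2%}")
    print(f"with per-sector tweak: {compression_ratio(tweaked):.2%}")
```

Without the tweak, the encrypted free space is one 512-byte pattern repeated, which both reveals where the zeros are and compresses extremely well; with the tweak, every sector looks different and the output is essentially incompressible.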

@Nelson it seems that his drive started zero-filled. When encryption was turned on, apparently the zeroes were not encrypted but left unchanged. So when anyone reads the data, the encryption software thinks the zeroes on the disk are actually the encrypted data, and "decrypts" them, giving random looking data. Interesting effect. – gnasher729 – 2019-10-27T22:25:17.690