19
2
A 398MB directory was only compressed to 393MB using 7Z and Normal ZIP compression. Is this normal? If so, why do people continue to use ZIP on Windows?
70
If you're compressing things that are already compressed (AVI, JPEG, MP3), you won't gain much other than packing everything in a single file.
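A quick way to see this for yourself is to compare how well a general-purpose compressor does on repetitive text versus pattern-free data. This is only a sketch: it uses Python's `zlib` (the same DEFLATE family that ZIP normally uses), and the random bytes are an assumed stand-in for already-compressed media such as JPEG or MP3, not real media files.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive text: plenty of patterns for the compressor to find.
text = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Random bytes of the same length. Assumption: this is only a stand-in for
# already-compressed media, which looks similarly pattern-free to a compressor.
noise = os.urandom(len(text))

text_ratio = ratio(text)
noise_ratio = ratio(noise)

print(f"repetitive text: {text_ratio:.3f}")
print(f"random bytes:    {noise_ratio:.3f}")
```

The text should shrink to a small fraction of its size, while the random bytes should stay at roughly their original size (or grow by a few bytes of overhead).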
32
Compression works by looking for repetitive patterns in the data to be compressed. And because you do not want to lose any data while compressing your files, the compression must be lossless(*).
Now, with that in the back of your head, think about the way files (items) are stored on a computer. At the lowest level, they are all just a bunch of 0's and 1's.
The question can thus be transformed to: "How can I represent a bunch of 1's and 0's in a more compact way than the original representation?"
So let's start from the beginning: how can you compact the normal representation of a single bit (a single 1 or a single 0)?
The answer is really easy: you can't! A single bit is already represented in the most compact manner possible.
Fair enough, let us take a bigger example: how would you compress a binary string like 0111 0111 0100 0111?
Well, because we already know that looking at the individual bits won't help us at all, we know that we have to look at a bigger scale. For example, let's take 4 bits at a time.
We now see that the binary string "0111" occurs 3 times in the example, so why don't we represent that with a single bit: 0? But this still leaves 0100 in the dark, so let us represent that with "1".
We have now compressed the original to: "0010"
That's really good! However, this is just the most basic idea behind the "Huffman encoding algorithm", and in the real world it is a little more complicated than that (you would also need to store a table with the encoding information, but that goes a bit too far for answering this question).
Now to really answer your question: why can't all data be compressed that well? Let's take another example: "0001 0110 1000 1111". If we used the same technique as above, we would not be able to compress the data (no 4-bit group repeats), and thus would not benefit from compression.
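The two examples above can be replayed with a few lines of Python. This is only a sketch of the toy scheme described here (one output bit per 4-bit block), not a real Huffman coder; the function name `encode_blocks` is made up for illustration.

```python
from collections import Counter


def encode_blocks(bits: str, block_size: int = 4):
    """Toy coder: replace every block with a single bit.

    Returns (encoded_bits, table), or None when more than two distinct
    blocks occur, since one bit can only tell two blocks apart.
    """
    bits = bits.replace(" ", "")
    blocks = [bits[i:i + block_size] for i in range(0, len(bits), block_size)]

    # Rank the distinct blocks, most frequent first.
    ranked = [block for block, _ in Counter(blocks).most_common()]
    if len(ranked) > 2:
        return None

    # The most frequent block gets "0", the other one gets "1".
    table = {block: str(code) for code, block in enumerate(ranked)}
    return "".join(table[block] for block in blocks), table


print(encode_blocks("0111 0111 0100 0111"))  # ('0010', {'0111': '0', '0100': '1'})
print(encode_blocks("0001 0110 1000 1111"))  # None
```

Note that the table has to travel with the encoded bits, which is exactly the overhead mentioned above.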
(*) There are of course exceptions to this. The best-known example is the compression used for MP3 files: here some information about the sound gets lost while converting the raw, original file to the MP3 format, so this compression is lossy. Another example is the .JPG format for images.
6
The process of compressing takes repeating patterns and replaces them with shorter tokens. The output then contains hardly any repeating patterns and therefore cannot be compressed by much, if at all.
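You can check this by compressing something twice. The following is a rough sketch using Python's `zlib`; the input is synthetic text built from a made-up word list, so the exact numbers will differ for real files.

```python
import random
import zlib

# Synthetic "document": random words from a small, made-up vocabulary.
rng = random.Random(0)
vocabulary = [
    "zip", "archive", "file", "folder", "compress", "extract",
    "windows", "pattern", "data", "size", "normal", "directory",
]
original = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

once = zlib.compress(original, 9)   # first pass removes the repetition
twice = zlib.compress(once, 9)      # second pass has very little left to find

print("original:", len(original))
print("once:    ", len(once))
print("twice:   ", len(twice))
```

The first pass should shrink the data considerably, while the second pass should leave the size about the same.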
6
From the Limitations section of the Wikipedia article on Lossless Compression:
Lossless data compression algorithms cannot guarantee compression for all input data sets. In other words, for any (lossless) data compression algorithm, there will be an input data set that does not get smaller when processed by the algorithm. This is easily proven with elementary mathematics using a counting argument. ...
Basically, it's theoretically impossible to compress all possible input data losslessly.
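The counting argument from the quote can be checked with a little arithmetic. As a sketch (the helper names are made up): there are 2^n different inputs of exactly n bits, but only 2^n - 1 different outputs that are shorter than n bits, so at least two inputs would have to share an output and could not both be restored.

```python
def count_strings(n: int) -> int:
    """Number of distinct bit strings that are exactly n bits long."""
    return 2 ** n


def count_shorter(n: int) -> int:
    """Number of distinct bit strings shorter than n bits (lengths 0 to n-1)."""
    return sum(2 ** k for k in range(n))


for n in (1, 8, 16):
    inputs = count_strings(n)
    outputs = count_shorter(n)
    # There is always exactly one shorter output too few.
    print(f"n={n}: {inputs} inputs, {outputs} possible shorter outputs")
```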
It's harder to compress data that was already compressed. Images and videos are usually stored compressed, since the original size would be very large – phuclv – 2014-03-05T06:44:08.977
4
Is this normal?
No. Not with "normal" files. What kind of files were you compressing? If they were already compressed, e.g. they are JPGs, GIFs, PNGs, videos or even other zip files, then they won't be compressed much by any algorithm. If you try compressing text, XML, uncompressed BMP, or source code files, zip will provide good compression, but probably not the absolute best.
Why do people continue to use ZIP on Windows?
One reason is that there is nice zip handling built into the system - you can right click anywhere and create a new zip file, then drop stuff into it. You can just double click a zip file and it opens like a folder. You can copy stuff out of it and sometimes even use it in place. You don't need to install WinZip or 7z or any other program. I usually recommend people don't.
2
In a zip archive containing many files, each file is compressed independently. If there is a great deal of similarity between the files, then a different tool might give much better compression.
For example, tar.gz joins the files together, then compresses the results. Likewise a "solid" rar file makes use of similarities between files.
The downside of tar.gz or a solid rar is that you can no longer extract a single file from a large archive without decompressing the archive up to where the file you want is.
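The difference between per-file and "solid" compression can be imitated with `zlib` alone. This is a simplified sketch, not how zip or tar.gz are actually implemented: the "files" are synthetic byte strings that share a large common part, and that part is deliberately kept below DEFLATE's 32 KiB window so the compressor can still see the previous copy.

```python
import os
import zlib

# Content shared by every file. It is random, so it does not compress on its
# own, and it is smaller than DEFLATE's 32 KiB window (an intentional choice).
shared = os.urandom(20000)

# Ten synthetic "files": the shared part plus a small unique tail.
files = [shared + b"unique part of file %d\n" % i for i in range(10)]

# zip-like: every file is compressed independently.
per_file = sum(len(zlib.compress(f, 9)) for f in files)

# tar.gz-like: join the files first, then compress the result once.
solid = len(zlib.compress(b"".join(files), 9))

print("compressed separately:", per_file)
print("compressed together:  ", solid)
```

Compressed together, the shared part only has to be stored once, so the result should be several times smaller than the sum of the separately compressed files.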
And I've even seen it operate in reverse, causing the compressed archive to be larger than the individual compressed files. – Fiasco Labs – 2016-02-02T04:54:14.140