As @kinokijuf said, there is a file header. But to expand on that, there are a few other things to understand about file compression.
The zip header contains all the information needed to identify the file type (the magic number), the zip version, and a listing of all the files included in the archive.
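You can see both of those for yourself (assuming head, xxd and unzip are available, and using example.zip as in the command further down):
head -c 4 example.zip | xxd
unzip -l example.zip
The first command dumps the opening bytes, which begin with the local file header signature PK\x03\x04 (the magic number); the second prints the listing of files stored in the archive.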
Your file probably wasn't compressed anyway. If you run unzip -l example.zip
you will probably see that the file size is unchanged. At 19 bytes, the archive overhead would exceed anything DEFLATE (the main compression method used by zip) could save, even if the data were compressible at all.
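A quick sketch to see that overhead first-hand (file names here are just placeholders):
printf 'hello, compression!' > tiny.log
zip tiny.zip tiny.log
ls -l tiny.log tiny.zip
The resulting tiny.zip will be several times larger than the 19-byte original, because the local file header, central directory entry and end-of-central-directory record have to be written regardless of the payload.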
Other files, PNG images for example, are already compressed, so zip will simply store them; running DEFLATE over already-compressed data gains nothing.
If, on the other hand, you had a lot of text files, each more than a few kilobytes in size, you would get great savings by putting them all into a single zip archive.
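For instance (hypothetical file names), packing a directory of logs in one go:
zip -9 logs.zip *.log
unzip -v logs.zip
The unzip -v listing includes a per-file compression column, so you can see how much each log shrank.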
You will get your best savings when compressing very regular, formatted data, like a text file containing a SQL dump. For example, I once had a dump of a small SQL database at around 13MB. I ran zip -9 dump.zip dump.sql
on it and ended up with an archive of around 1MB.
Another factor is your compression level. Many archivers default to a mid-level setting, favouring speed over size reduction. When compressing with zip, try the -9
flag for maximum compression (I think the 3.x manual says that compression levels are only supported by DEFLATE at this time).
TL;DR
The overhead of the archive exceeded any gains you may have gotten from compressing the file. Try putting larger text files in there and see what you get. Use the -v
flag when zipping to see your savings as you go.
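Putting those two flags together on the original file (file names as used elsewhere in this thread):
zip -9 -v sample.zip sample.log
This compresses at the maximum level and reports how much the entry was reduced as it is added.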
And that is a plain text 1GB log file? – CyberSkull – 2012-08-29T17:50:06.867
@CyberSkull - Yes it is. – PeanutsMonkey – 2012-08-29T19:21:02.120
Can you please tell us what your zip parameters were? I would have done something like
zip -9T "example.zip" sample.log
(-T is just to test the integrity of the archive.) – CyberSkull – 2012-08-29T19:31:12.087
@CyberSkull - I only ran the standard command i.e.
zip sample.zip sample.log
however when I ran 7zip I defined the maximum compression i.e. 7zr a -mx=9 sample.7z sample.log – PeanutsMonkey – 2012-08-29T19:44:42.050
Random data from /dev/urandom does not generate a true text file; it will not compress well at all. Text bytes are limited in range, with many spaces and repeating patterns (e.g. "th" and "sp") and words. You have in fact generated a random binary file. – Ken – 2012-08-29T19:49:30.607
@Ken - I had no idea that it would create a random binary file. How would you create a random true text file? – PeanutsMonkey – 2012-08-29T20:05:49.113
One option is to just cat all your logs into a single file. Another is to download a collection of text files (like from Gutenberg) and try compressing them or joining them into a single large file to experiment on. – CyberSkull – 2012-08-29T20:21:30.880
@CyberSkull - So there is no other way to create a true text file using commands such as dd? – PeanutsMonkey – 2012-08-30T01:18:59.077
Open your favorite text editor. Now get your cat or small child and induce them to play with the keyboard for 5 minutes or so. You now have a large random text file! ;) – CyberSkull – 2012-08-30T05:58:18.963
@CyberSkull: No, you have a random stream of ASCII characters. Which is a bit more compressible than random binary, but still nowhere near as structured as text. – Ben Voigt – 2013-05-28T19:24:46.277
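For anyone who wants to repeat the experiment without real logs, a rough sketch (GNU coreutils assumed, file names are placeholders) that turns /dev/urandom into ASCII-only data, which sits somewhere between random binary and real text in compressibility:
base64 /dev/urandom | head -c 10000000 > random-text.txt
tr -cd 'a-zA-Z0-9 \n' < /dev/urandom | head -c 10000000 > random-chars.txt
Neither has the word structure of English, so expect modest savings compared with a genuine log file or an SQL dump.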