Splitting into many .ZIP files using 7-Zip

If I have a 100 GB folder and split it into ZIP volumes, is there a difference in how much disk space is consumed if I split it into 100 .ZIP files of 1 GB each versus 10 .ZIP files of 10 GB each?

Do 100 .ZIP files at 1 GB each take up more space than 10 .ZIP files at 10 GB each?

Kong

Posted 2018-10-28T13:24:05.957

Reputation: 423

And you can't find out because? – Dave – 2018-10-28T17:41:48.260

Why can't you just try it? – Peter Mortensen – 2018-10-28T19:02:08.290

Each standalone ZIP file has some overhead. However, you can chop up a ZIP file into pieces that can be reassembled. Those pieces don't have the ZIP overhead in each one, and if you split at sector or block boundaries, they don't contain wasted space. – fixer1234 – 2018-11-01T02:54:44.333

Answers

18

Let's find out!

100 MB files (27 pieces):

$ 7z a -tzip -v100M ./100m/archive ./kali-linux-xfce-2018.2-amd64.iso

$ du ./100m/
2677884 ./100m/

10 MB files (262 pieces):

$ 7z a -tzip -v10M ./10m/archive ./kali-linux-xfce-2018.2-amd64.iso

$ du ./10m/
2677908 ./10m

Results: The 10 MB split archive takes up an extra 24 KB on disk compared with the 100 MB split archive. So yes, there is a difference: by the same reasoning, 100 files of 1 GB each will take up slightly more space than 10 files of 10 GB each.

The difference seems to be negligible though. I would go for whichever is more convenient for you.
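
To put the two du figures above in proportion, here is a rough sketch of the arithmetic. It assumes du reported 1 KiB blocks (as the comments under this answer conclude) and uses awk only for the floating-point division; the variable names are made up for illustration.

```shell
# du figures from the two runs above, assumed to be in 1 KiB blocks
size_100m=2677884   # 27 pieces of 100 MB
size_10m=2677908    # 262 pieces of 10 MB

# Absolute difference and number of additional pieces
diff_kib=$(( size_10m - size_100m ))
extra_files=$(( 262 - 27 ))

# Relative difference as a percentage of the smaller total
pct=$(awk -v d="$diff_kib" -v t="$size_100m" 'BEGIN { printf "%.4f", d / t * 100 }')

echo "difference: ${diff_kib} KiB across ${extra_files} extra files (${pct}%)"
```

On these numbers the extra cost works out to well under a thousandth of a percent, and it depends on the number of pieces rather than on the amount of data, so the figure will differ for other split sizes and file systems.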

Layne Bernardo

Posted 2018-10-28T13:24:05.957

Reputation: 781

du doesn't output the size in bytes by default (unless your 270M of files turned into 2,677,908 bytes). It does display the on-disk size of files, which may be different than the actual data size (maybe applicable for uploading or storing on other filesystems) – Xen2050 – 2018-10-28T15:42:44.720

You are correct, it's actually outputting in KB. I've edited the answer to correct this discrepancy. The original file is a Kali Linux ISO, it is ~2.6GB. You have a good point about the on-disk size vs actual data size, I was specifically thinking about on-disk size because it accounts for the overhead of having additional files but you're right that it would be different depending on what you're actually doing with the archives. – Layne Bernardo – 2018-10-28T15:50:32.043

Sorry, I crossed with your largely similar answer while I was double-checking the run strings. – AFH – 2018-10-28T15:50:56.503

Zip file max size is 4GB. – pbies – 2018-10-28T17:38:28.053

Re "The difference seems to be negligible": What is it in %? – Peter Mortensen – 2018-10-28T19:03:26.690

@PeterMortensen See the other answer. The only difference is how much extra space the file system needs to store another file. The file itself is not any larger. – Alexander O'Mara – 2018-10-28T20:52:48.190

Yeah, that's why I didn't bother calculating a percent. I don't think it works out to a flat percentage of original file size, especially considering differences in file systems. – Layne Bernardo – 2018-10-29T10:47:45.090

@PeterMortensen It's not a proportional loss. It's a fixed overhead per file. So, the resulting "percentage" overall will depend on the number of files in each scenario, and is trivial to calculate. – Lightness Races with Monica – 2018-10-29T10:52:08.500

15

Every file has a file system overhead of unused logical sector space after the end-of-file, but this is eliminated if the split size is a multiple of the logical sector size (not necessarily true of my example below).
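
As a rough illustration of that slack space, the sketch below computes how many bytes of the last allocation unit go unused for a given file size. The 4096-byte unit is an assumption (a common default, not something measured here), and slack is a hypothetical helper name; real file systems may behave differently, for example by storing very small files inline.

```shell
# slack SIZE UNIT: bytes left unused after end-of-file in the last allocation unit
slack() {
    echo $(( ($2 - $1 % $2) % $2 ))
}

slack 1000000 4096   # a 1,000,000-byte volume: not a multiple of 4096
slack 1048576 4096   # a 1 MiB volume: an exact multiple, so no slack
```

With these assumed values, each 1,000,000-byte piece wastes 3520 bytes, while a 1 MiB piece wastes none.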

There may be extra bytes used by the extra directory entries, but these will not matter unless the directory now occupies an extra logical sector.

The split files are identical in content to those created by a binary splitter program with the same split size.

I verified this on Linux by using the GUI version on a 7+ MB file, which gave 8 split files of 1 MB each with 7-Zip (File.7z.00?). I then created a single, full archive (Full.7z) and split it with split. The equivalent commands are:

7z a -v1000000 File.7z File                                  # Create split volumes File.7z.00?
7z a Full.7z File                                            # Create full archive Full.7z
split -b 1000000 -a 3 --numeric-suffixes=1 Full.7z Full.7z.  # Split full archive into Full.7z.00?
for f in {001..008}; do cmp Full.7z.$f File.7z.$f; done      # Compare splits with 7z volumes

To test on another OS you may need to download or write an appropriate splitter program.
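
Because the pieces are plain byte ranges of the original, concatenating them in order should give back the unsplit file. The sketch below checks that round trip without 7-Zip, using random data as a stand-in for an archive. It assumes GNU coreutils (the --numeric-suffixes=1 option is GNU-specific), and the file names are hypothetical.

```shell
# Scratch directory so nothing existing is overwritten
work=$(mktemp -d)

# 3,500,000 bytes of random data stand in for a real archive
head -c 3500000 /dev/urandom > "$work/Full.bin"

# Cut into 1,000,000-byte pieces: Full.bin.001 ... Full.bin.004
split -b 1000000 -a 3 --numeric-suffixes=1 "$work/Full.bin" "$work/Full.bin."

# Concatenate the pieces in suffix order and compare with the original
cat "$work"/Full.bin.0?? > "$work/Joined.bin"
cmp "$work/Full.bin" "$work/Joined.bin" && echo "identical"

# Count the pieces that were produced
pieces=$(ls "$work"/Full.bin.0?? | wc -l | tr -d ' ')
echo "pieces: $pieces"
```

The same cat step should apply to the Full.7z.00? pieces produced above, since they are ordinary slices of Full.7z.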

AFH

Posted 2018-10-28T13:24:05.957

Reputation: 15 470