Are zip files vulnerable to corruption?

6

One problem with backup files is that they are vulnerable to corruption. A single flaw in the wrong place can make the entire contents of the file unrecoverable, and that can mean thousands of files.

Are zip files vulnerable in the same way? Or, if a zip file gets corrupted, would I lose only the contained files directly affected by the corruption and still be able to extract the other files correctly?

NewSites

Posted 2019-10-08T22:32:15.703

Reputation: 335

3Depends on the corruption, but yes you lose everything in most cases. – Moab – 2019-10-08T23:24:00.383

Answers

6

Are zip files vulnerable to corruption?

Yes, which is why a good backup scheme verifies that each newly created backup file matches the content of the source files, and also makes multiple copies on different media, each of them verified.

A good backup includes verification and redundancy. That's why most backup schemes recommend multiple copies, with at least one copy offsite, whether in the cloud or physically transported offsite. That mitigates the small chance of bit rot.
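As a rough illustration of the verification step, here is a minimal sketch using Python's standard `zipfile` module. The function name `verify_backup` and the assumption that members are stored under paths relative to the source folder are mine, not part of any particular backup tool.

```python
import hashlib
import zipfile
import zlib
from pathlib import Path


def sha256_of(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_backup(zip_path, source_dir):
    """Return the member names that are missing, unreadable, or differ from the source.

    Assumes (hypothetically) that each file was archived under its path
    relative to source_dir, with forward slashes.
    """
    problems = []
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            member = path.relative_to(source_dir).as_posix()
            if member not in names:
                problems.append(member)
                continue
            try:
                with archive.open(member) as packed, path.open("rb") as original:
                    if sha256_of(packed) != sha256_of(original):
                        problems.append(member)
            except (zipfile.BadZipFile, zlib.error):
                # Damaged compressed data or a CRC mismatch inside the archive.
                problems.append(member)
    return problems
```

An empty list means every source file was found in the archive with identical content. Running the same check against each copy (local, external disk, offsite) is what "each verified" amounts to in practice.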

The open-source 7-Zip package, one of the many programs that can create and open ZIP files, includes recovery instructions, but you will notice that their language about your chances of recovery is guarded.

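Your chance of recovery also depends on where the corruption is. If it's in the central directory (the index at the end of the ZIP file), ordinary tools may be unable to open anything in the archive. That is why the ZIP format also stores a local header in front of each file's data, repeating most of the information held in the central directory.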

ZIP and 7Z files should not be relied on to back up Linux and UNIX files, because (unlike on Windows) the ownership and group data for each individual file stored within the archive is not reliably preserved when the archive is created on Linux or UNIX. That's why Linux and UNIX backups archive to a TAR file first, which preserves that data, and then compress the TAR file.
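To make the ownership point concrete, here is a small sketch using Python's standard `tarfile` and `zipfile` modules. It only shows what these two Python modules expose, and other ZIP tools may behave differently. The file name, user name, group name, and numeric IDs are made up for illustration.

```python
import io
import tarfile
import zipfile


def roundtrip_tar_owner():
    """Write one file into an in-memory TAR with owner data, then read it back."""
    payload = b"hello\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name="notes.txt")
        info.size = len(payload)
        info.uid, info.gid = 1000, 1000            # illustrative numeric IDs
        info.uname, info.gname = "alice", "staff"  # illustrative names
        info.mode = 0o640
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as archive:
        member = archive.getmember("notes.txt")
        return member.uid, member.gid, member.uname, member.gname, oct(member.mode)


def zipinfo_has_owner_fields():
    """Check whether Python's ZipInfo exposes any owner/group attributes."""
    info = zipfile.ZipInfo("notes.txt")
    return any(hasattr(info, field) for field in ("uid", "gid", "uname", "gname"))


print(roundtrip_tar_owner())
print(zipinfo_has_owner_fields())
```

The TAR header carries the owner and group alongside the permission bits, so they survive the round trip. Python's `ZipInfo` has no equivalent fields.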

K7AAY

Posted 2019-10-08T22:32:15.703

Reputation: 6 962

3Any compressed archive data is vulnerable to data corruption. Due to various mathematical reasons, you have to have all the compressed data, in order to extract the data that was compressed. There are solutions like PAR and PAR2 but the tools that create those files are no longer developed. – Ramhound – 2019-10-08T23:30:54.200

1@K7AAY, thank you. Could you expand on what you mean about ownership. I'm currently working in Windows, but am considering a switch to Linux in the future, and I don't understand the problem you're referring to. – NewSites – 2019-10-09T01:18:07.617

1Also, for long-term archiving of data files, could this vulnerability be a reason to store simple copies of the files instead of using backup software or zip files? Obviously, there would be a cost in larger storage space, but are there any other reasons for not doing this? – NewSites – 2019-10-09T01:24:33.870

The recovery instructions you have linked seem to relate to 7z files, not zip files. – plugwash – 2019-10-09T15:40:53.827

@plugwash Didn't find a ZIP-specific example that succinct; if you have one, I would welcome it so I may substitute it. – K7AAY – 2019-10-09T15:42:14.030

1I never thought about the order of tarring and then zipping before. I just did it. Your explanation of why the order matters is totally obvious once it has been pointed out. – Joe – 2019-10-16T14:04:15.383

3

In general, if a compressed data stream is corrupted, it is not possible for the decompressor to recover, so all data after the point of corruption is likely to be lost.

zip compresses each file individually, so chances are that if a zip file is corrupted, only one file will be affected. A zip file also has a central directory; if this is corrupted, it may not be possible to extract the files using normal unzip tools. However, it should still be possible to recover them using zip file recovery tools that search for the individual file headers (traditionally on DOS this was done with a program called pkzipfix; I'm not sure if there are more modern alternatives).
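The per-file behaviour can be tried out with a small experiment. This is a sketch using Python's standard `zipfile` module, and the helper names are mine. It builds a zip in memory, flips one byte in the middle of one member's data, and then lists which members can no longer be read.

```python
import io
import struct
import zipfile
import zlib


def build_archive(files, compression=zipfile.ZIP_DEFLATED):
    """Return an in-memory zip (as a mutable bytearray) holding the given files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return bytearray(buffer.getvalue())


def corrupt_member(raw, name):
    """Flip one byte in the middle of the named member's data, in place."""
    with zipfile.ZipFile(io.BytesIO(bytes(raw))) as archive:
        info = archive.getinfo(name)
    # A local header is 30 fixed bytes followed by the file name and extra field;
    # their lengths are stored at offset 26 of the header.
    name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
    data_start = info.header_offset + 30 + name_len + extra_len
    raw[data_start + info.compress_size // 2] ^= 0xFF


def unreadable_members(raw):
    """Try to read every member; return the names that fail."""
    bad = []
    with zipfile.ZipFile(io.BytesIO(bytes(raw))) as archive:
        for info in archive.infolist():
            try:
                archive.read(info)
            except (zipfile.BadZipFile, zlib.error):
                bad.append(info.filename)
    return bad
```

For example, after `raw = build_archive({"a.txt": ..., "b.txt": ..., "c.txt": ...})` and `corrupt_member(raw, "b.txt")`, calling `unreadable_members(raw)` should report the damaged member while `a.txt` and `c.txt` still extract, because their data and the central directory were not touched.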

Note that many other archive formats use "solid" compression (either all the time or as an option). In a solid archive the files are combined into a single data stream before compression, and therefore in such an archive format any corruption will likely destroy all files after the file that is directly affected.
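For contrast, here is a rough sketch of the solid case. It is a toy model, not any real archive format: it concatenates all payloads and compresses them as one `zlib` stream, and the helper names and filler data are made up for the illustration.

```python
import hashlib
import zlib


def sample(name, blocks=64):
    """Deterministic, moderately compressible filler data (hex digests)."""
    return b"".join(
        hashlib.sha256(f"{name}:{i}".encode()).hexdigest().encode()
        for i in range(blocks)
    )


def solid_pack(files):
    """Concatenate every payload, then compress the result as ONE stream."""
    return bytearray(zlib.compress(b"".join(files.values())))


def try_unpack(stream):
    """Decompress the whole stream in one go; return None if that fails."""
    try:
        return zlib.decompress(bytes(stream))
    except zlib.error:
        return None


files = {name: sample(name) for name in ("a.txt", "b.txt", "c.txt")}
stream = solid_pack(files)
intact = try_unpack(stream) is not None

stream[len(stream) // 10] ^= 0xFF  # damage one byte early in the stream
damaged_ok = try_unpack(stream) is not None
print(intact, damaged_ok)
```

With one byte damaged, an ordinary one-shot decompressor gives back nothing at all, even though the damage sits inside the first file only. A recovery tool that decompresses incrementally may still salvage the data that precedes the damage, but nothing in the stream marks where later files could be picked up again.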

plugwash

Posted 2019-10-08T22:32:15.703

Reputation: 4 587