We have a secure environment where we sometimes transfer large quantities of unclassified data between machines certified for classified operations. We zip these files to ease the process of data transfer. A concern has been raised about these zip files: specifically, do zip files use a form of block compression that could potentially grab portions of blocks on the hard drive (outside of the files of interest) that might contain classified information?

To illustrate further, let's say that Zip file A contains files B, C and D. Let's assume for purposes of this discussion that the logical block size of the NTFS file system we're working with is 16K bytes, and that the zip algorithm uses a similar block size of 16K.

Given those conditions, files B, C and D will almost certainly have some slack space at the end of them (their size not being exact multiples of 16K). Is there any possibility of that slack space containing information other than what is contained in the files of interest, and if so, is there any possibility of that information making its way into a zip file (not catalogued by the Zip file system itself)?

Robert Harvey
  • Most sane compression programs will open the file for read access using standard system calls. In ordinary circumstances this won’t retrieve any additional data outside the file contents itself. – David May 21 '18 at 18:09
  • If this data is actually _classified_ not just _confidential_, then you probably shouldn't be asking for a solution from an online Q&A site and instead should be hiring an expert. – forest May 22 '18 at 04:59

1 Answer

No. ZIP utilities (and standard archival utilities in general) operate entirely at the filesystem layer. That is, they use regular filesystem calls to read the data they compress and archive. The utility does not access the block device itself, and it could not do so even if it wanted to without very high permissions. These utilities work by issuing filesystem syscalls that read directory entries. When an entry matches the specified path, a syscall reads the contents of that file into a buffer. The data is then compressed in that buffer and written, using another filesystem syscall, to the target file. This repeats until there are no more files to archive. It is up to the kernel's filesystem driver to put only the data in the requested file into the memory buffer. Because the filesystem records each file's exact length, a read stops at end-of-file and never returns the slack space in the file's last cluster.
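As a rough illustration of the point above, here is a minimal Python sketch using the standard `zipfile` module. The function name `archive_and_measure`, the file name `B.dat`, and the 16 KB cluster size (taken from the question) are assumptions for the example, not anything a particular ZIP tool mandates. It archives a file whose length is not a multiple of the cluster size and checks that the archive records exactly the file's logical length and content.

```python
import os
import tempfile
import zipfile

CLUSTER = 16 * 1024  # assumed cluster size, taken from the question


def archive_and_measure(payload):
    """Write payload to a file, zip it, and report what the archive recorded.

    Returns (logical size on disk, size recorded in the archive, extracted bytes).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "B.dat")  # hypothetical file name
        with open(src, "wb") as f:
            f.write(payload)
        logical_size = os.path.getsize(src)

        archive = os.path.join(tmp, "A.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            # zipfile opens the file with ordinary read calls, which stop at EOF
            zf.write(src, arcname="B.dat")

        with zipfile.ZipFile(archive) as zf:
            recorded_size = zf.getinfo("B.dat").file_size
            extracted = zf.read("B.dat")

    return logical_size, recorded_size, extracted


if __name__ == "__main__":
    data = b"unclassified " * 1000 + b"x"  # length is not a multiple of CLUSTER
    logical, recorded, extracted = archive_and_measure(data)
    slack = (-logical) % CLUSTER  # unused bytes in the last cluster
    print("logical size:", logical, "slack in last cluster:", slack)
    print("size recorded in archive:", recorded)
    print("round trip identical:", extracted == data)
```

The slack figure is only arithmetic on the assumed cluster size; the script never reads those bytes, and the same is true of the archiver.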

You mentioned that the data you are working with is classified. If it is truly classified, rather than merely confidential, you should absolutely not be taking advice from a Q&A site. You should have a qualified expert with appropriate clearance do this for you, or you will be taking a big risk. For example, while the ZIP format itself does not take anything from the block device, it is not impossible for simple memory corruption to cause classified data in memory to overwrite unclassified data that is about to be put in the archive. Likewise, a single flipped bit can mean the difference between accessing the inode for your grocery list and accessing an inode pointing to a file that requires explicit security clearance to view. You should absolutely never keep classified data next to unclassified data on the same machine unless that machine is certified for it.

The PKZip on-disk format: https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html
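To make the on-disk layout a little more concrete, the following sketch decodes the 30-byte local file header at the start of a small archive built in memory. The helper names `build_sample` and `first_local_header` and the entry name `B.txt` are made up for this example. It assumes the first entry starts at offset 0, which holds for an archive freshly written by Python's `zipfile` but not necessarily for every ZIP file.

```python
import io
import struct
import zipfile

# Local file header: signature, version needed, flags, method, mod time,
# mod date, CRC-32, compressed size, uncompressed size, name length,
# extra field length (all little-endian, 30 bytes in total).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def build_sample():
    """Build a one-entry archive in memory (hypothetical entry name)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("B.txt", b"hello")
    return buf.getvalue()


def first_local_header(archive_bytes):
    """Decode the local file header assumed to sit at offset 0."""
    (signature, version, flags, method, mod_time, mod_date, crc32,
     compressed_size, uncompressed_size, name_len, extra_len) = (
        LOCAL_HEADER.unpack_from(archive_bytes, 0))
    name_start = LOCAL_HEADER.size
    name = archive_bytes[name_start:name_start + name_len]
    return {
        "signature": signature,
        "method": method,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "name": name,
        "extra_len": extra_len,
    }


if __name__ == "__main__":
    header = first_local_header(build_sample())
    for key, value in header.items():
        print(key, "=", value)
```

Each entry carries its own name and explicit size fields, so what the archive stores is bounded by the file length the archiver was given, not by any cluster or block size.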

forest