How to combine / merge zip files?

21

5

For the last several months I have copied several data folders to zip files at weekly intervals. Now I'd like to combine those zip files into one zip file, because most of the contents of the existing zip files are just different versions of the same data files.

So if a file appears in more than one of the existing zip files, I'd like the newest version to be in the new zip file being created. Of course if a file appears in only one existing zip file, then I want it in the final zip file also.

I'm trying to avoid having to unzip them one by one to a working folder, overwriting data from older zip files with data from newer zip files, and then rezipping everything into a new zip file.

From what I understand, PKZIP would combine the zip files themselves, but is there a dependable, fast, and free method anyone can tell me about?

CChriss

Posted 2010-01-08T03:40:26.927

Reputation: 1 193

zipmerge for the win – Codebling – 2015-02-18T00:19:20.687

Answers

7

You won't like it, but unzipping everything into a working folder in the right order and then zipping the result is the most effective way.

Otherwise, you will end up with a lot of wasted CPU cycles:

  • assume your result goes to 'first.zip'
  • every file from '2.zip', '3.zip', etc. has to be unzipped and then zipped again into 'first.zip'
  • in '2.zip' there is a file 'foobar.txt' and in '3.zip' there is another file 'foobar.txt'; merging them the way you want leads to 'compress it X times'
  • the TOC (table of contents) of a .zip is at the end of the file: if you add more content (to the middle of the
    .zip, by updating a file in the middle), the whole file has to be rewritten

So, IMHO, just use 'unzip' wisely:

% mkdir all
% for x in *.zip ; do unzip -d all -o -u "$x" ; done
% zip -r all.zip all

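The order of the unzipping is important. I don't know the naming pattern of your zip files, but I would extract the newest zip file first: the '-u' option of unzip only overwrites a file if the archived copy is newer, and creates files that are not already there. As a result, you will unzip only the newest version of each file and zip the result only once. (Note that *.zip expands in sorted name order, so list the archives explicitly if their names don't sort newest first.)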
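The same "newest copy wins" rule can also be sketched with Python's standard zipfile module instead of a shell loop. This is a minimal sketch, assuming the timestamps stored inside the archives are trustworthy; merge_newest is a hypothetical helper name. Note that it decompresses and re-compresses every member it keeps, so it does not avoid the CPU cost discussed in the comments below.

```python
import zipfile
from collections import defaultdict


def merge_newest(sources, target):
    """Write `target` with the newest copy of every file found in `sources`.

    "Newest" is judged by the timestamp stored in each archive entry
    (ZipInfo.date_time), not by the position of the archive in `sources`.
    """
    # Pass 1: read only the central directories and pick one winner per name.
    newest = {}  # member name -> (date_time tuple, archive path)
    for src in sources:
        with zipfile.ZipFile(src) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # directory entries are skipped (empty folders are dropped)
                current = newest.get(info.filename)
                # On a tie, the archive seen first keeps the entry.
                if current is None or info.date_time > current[0]:
                    newest[info.filename] = (info.date_time, src)

    # Group the winners by archive so each source is opened only once more.
    wanted = defaultdict(list)
    for name, (_, src) in newest.items():
        wanted[src].append(name)

    # Pass 2: copy the winning members (decompressed, then deflated again).
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as out:
        for src, names in wanted.items():
            with zipfile.ZipFile(src) as zf:
                for name in sorted(names):
                    old = zf.getinfo(name)
                    new = zipfile.ZipInfo(name, date_time=old.date_time)
                    new.external_attr = old.external_attr  # keep permission bits
                    out.writestr(new, zf.read(name),
                                 compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be something like merge_newest(["week01.zip", "week02.zip"], "all.zip") (hypothetical file names). Because the winner is picked by timestamp, the order of the source list only matters for ties.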

akira

Reputation: 52 754

This might be most effective from a user perspective - but it's not accurate. The file table in a zip is in fact at the end, but you can arbitrarily write files to the end of any zip and then write a new file table. The last record always wins. Those with a little familiarity with scripting or programming could do this entire process without uncompressing or compressing any files at all just by moving the binary chunks around and updating the zip table. – caesay – 2018-01-02T21:34:02.207
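To see where that file table lives, the End of Central Directory (EOCD) record at the tail of a zip can be read directly. This is a simplified sketch, assuming a single-disk, non-ZIP64 archive; it searches backwards for the signature bytes PK\x05\x06 and could be fooled if the archive comment happens to contain them. read_eocd is a hypothetical helper name.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
EOCD_FMT = "<4s4H2LH"  # fixed 22-byte part of the EOCD record
EOCD_SIZE = struct.calcsize(EOCD_FMT)


def read_eocd(path):
    """Return (entry_count, cd_size, cd_offset) from a zip's End of Central
    Directory record, which sits in the last 22..65557 bytes of the file."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        file_size = f.tell()
        # The EOCD is followed only by an optional comment of at most 65535 bytes.
        tail_len = min(file_size, EOCD_SIZE + 0xFFFF)
        f.seek(file_size - tail_len)
        tail = f.read()
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        raise ValueError("no End of Central Directory record found")
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, tail, pos)
    return entries_total, cd_size, cd_offset
```

For an ordinary archive, cd_offset + cd_size is exactly where the EOCD record begins, which is why appending entries means writing a fresh central directory and EOCD after the new data.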

In fact, you could just concatenate all the zips into a single file (in the order you want) and then write a new file record at the end that only includes the latest versions of the files. This has the added benefit that the zip still contains all the previous versions of the files, which can be recovered if necessary – caesay – 2018-01-03T10:25:28.540

"every file from '2.zip', '3.zip' etc has to be unzipped and then zipped again into 'first.zip'" is not correct. The zipmerge utility merges ZIP archives without decompressing and re-compressing, for example. – ZachB – 2018-02-02T00:40:55.603

I wrote unzipping/zipping, not compress/decompress. Yes, obviously one can take one entry of 2.zip (the compressed blob) and transfer it into first.zip, and thus no "compression" has to take place. But you have to extract the blob from 2.zip and look up whether it exists in the TOC of first.zip; if it's there, either replace the existing entry (which basically means rewriting the whole file) or append it at the end, and after that you need to append the TOC of the zip. I don't see how zipmerge can merge zip entries in a different way (compression aside) – akira – 2018-02-07T11:14:42.470

-1 because there are far more efficient ways to do this task, and none of the justifications for this being "the most effective way" makes the slightest bit of sense. every file [...] has to be unzipped and then zipped again - no, that's what your solution does. in '2.zip' exists a file 'foobar.txt' and in '3.zip' exists another file 'foobar.txt'. merging it the way you want to merge it leads to 'compress it X times' - no it doesn't. Why would it? you add more content [...] and the whole file has to be rewritten - no, you write the output in one pass. Why did anyone upvote this? – benrg – 2019-10-09T02:56:35.600

@benrg: I don't see your answer anywhere here… – akira – 2019-10-20T21:14:12.770

4

Just use the -g option of ZIP, where you can append any number of ZIP files into one (without extracting the old ones). This will save you significant time.

Also have a look at zipmerge

Christos

Reputation: 49

-g adds files to an existing zip; it does not merge them. E.g. zip -g result.zip other.zip will add the file other.zip into result.zip. From the man page: "--grow: Grow (append to) the specified zip archive, instead of creating a new one. If this operation fails, zip attempts to restore the archive to its original state. If the restoration fails, the archive might become corrupted. This option is ignored when there's no existing archive or when at least one archive member must be updated or deleted." – akira – 2014-03-02T08:40:51.413

4

It may not be what you're looking for, but the free Ant build tool does include the ability to merge Zipfiles.

https://ant.apache.org/manual/Tasks/zip.html

CarlF

Reputation: 8 576

2

https://linux.die.net/man/1/zipmerge:

zipmerge merges the source zip archives source-zip into the target zip archive target-zip. By default, files in the source zip archives overwrite existing files of the same name in the target zip archive.
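zipmerge's default rule (a file from a later source overwrites an earlier entry with the same name) can be approximated in Python as below. This is a rough stand-in under stated assumptions, not zipmerge itself: it writes a fresh output archive instead of updating the target in place, and it decompresses and re-compresses every member, whereas zipmerge copies entries without recompressing them. merge_last_wins is a hypothetical helper name; if the weekly archives are listed oldest first, the last-listed copy is also the newest.

```python
import zipfile


def merge_last_wins(sources, target):
    """Combine the archives in `sources` (a list) into a new archive `target`;
    when a name occurs in several archives, the copy from the archive listed
    LAST is kept."""
    owner = {}  # member name -> index of the archive that supplies it
    for idx, src in enumerate(sources):
        with zipfile.ZipFile(src) as zf:
            for name in zf.namelist():
                owner[name] = idx  # a later archive overwrites an earlier one

    # Invert the mapping: archive index -> names it must contribute.
    by_archive = [[] for _ in sources]
    for name, idx in owner.items():
        by_archive[idx].append(name)

    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as out:
        for src, names in zip(sources, by_archive):
            if not names:
                continue  # every entry of this archive was superseded
            with zipfile.ZipFile(src) as zf:
                for name in sorted(names):
                    old = zf.getinfo(name)
                    new = zipfile.ZipInfo(name, date_time=old.date_time)
                    new.external_attr = old.external_attr  # keep permission bits
                    out.writestr(new, zf.read(name),
                                 compress_type=zipfile.ZIP_DEFLATED)
```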

imz -- Ivan Zakharyaschev

Reputation: 443

1

I was thinking you could script extracting the files into a temp directory.

There is a problem with this command line: I couldn't find a way to order the unzipping of the archives, so files from an older archive may overwrite those from a newer one. This could be overcome by using an unzipper that has a command-line switch to overwrite only if newer. I mainly use 7-Zip, which doesn't have such a command-line option.

Also, this command needs all the zip files to be in the same directory. That's not a problem if all the zips have unique names. That said, the command can be changed to fit your situation.

for /f "delims=" %f in ('dir /b *.zip') do "c:\program files\7-zip\7z" x "%f" -oc:\testdir -r -aoa

To use another unzipping program, just replace "c:\program files\7-zip\7z" x "%f" -oc:\testdir -r -aoa with whatever command you would execute on each file. Use %f as a placeholder for the name of the file you want to unzip (inside a .bat file, write it as %%f). The "delims=" part keeps file names that contain spaces intact.
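Since that loop can't control the order, one workaround is to sort the archives by their file modification time and extract them oldest first with overwriting, so the newest copy of each file is written last (on Windows, dir /b /o:d *.zip also lists files oldest first). A minimal Python sketch, assuming each archive's modification time on disk reflects the week it was created; extract_in_date_order and zip_directory are hypothetical helper names.

```python
import glob
import os
import zipfile


def extract_in_date_order(pattern, dest):
    """Extract every archive matching `pattern` into `dest`, oldest archive
    first, so files from newer archives overwrite older copies."""
    archives = sorted(glob.glob(pattern), key=os.path.getmtime)
    for path in archives:
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)  # existing files with the same name are overwritten
    return archives


def zip_directory(src_dir, target):
    """Re-zip the merged folder into a single archive, with paths relative
    to `src_dir`."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as out:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                out.write(full, os.path.relpath(full, src_dir))
```

The output archive should not be placed inside the folder being zipped, or zip_directory would try to include it in itself.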

I tried looking for a polished app, free or otherwise and didn't really find one.

Hopefully this gives you a good start, and WinZip or something similar can take care of the overwrite problem.

Good luck.

Scott McClenning

Reputation: 3 519

0

If I remember correctly, pkzip was a command-line program.

There's still a command-line version of ZIP which claims to be compatible with pkzip.

It's called Info-ZIP and there should be a version for your OS.

pavium

Reputation: 5 956

Does it have the functionality I'm asking about? I can't find where it lists this ability. – CChriss – 2010-01-08T06:18:37.027

The Info-Zip suite makes files compatible with PKZip, but the programs themselves are different and don't seem to include a merge option. – CarlF – 2010-01-08T08:27:58.827

OK, sorry, I was able to compile and run Info-Zip on an Apollo workstation under DOMAIN/OS many years ago. I recall it provided different features on DOS/VMS/Unix and a few others, even then. I suppose it may have evolved further. – pavium – 2010-01-08T10:12:29.847

0

Look for the WinZip command line on the net. WinZip has several versions of its command-line tools to fit whatever version of WinZip you may have installed. The command-line tool WZZIP has a -f ("freshen") option that will only add files that are newer than the files of the same name already in the combined output zip file.

Use WZUNZIP wrapped in a FOR statement, as shown above, to unzip one file to a directory, then WZZIP -f to add those files to the combined output zip file. The FOR loop then repeats for the next input file, always writing to the same single output file. The order of the input files does not matter, since WZZIP -f will only add to the output file if the input data is newer than what is already in it. All files that do not yet exist in the output file will also be added. Afterwards, you may unzip the result to a folder and zip it up again to obtain an efficiently packed result file. You can even do this automatically after the FOR loop at the end of the batch file.

eewiz

Reputation: 1