UPDATE: Many people are viewing this thread, which makes me believe that this situation is not so rare after all. I also asked a similar, related question on SO here, which has pretty decent solutions that might solve the problem in a better way.
On my Windows 7 machine, I have a directory full of downloaded dumps in ZIP archives. Each archive contains a few text files, PDFs and, rarely, XML files. I want to extract all the contents of each ZIP archive into its respective folder (which must be created during the process) while skipping the PDFs. After the required files are extracted from an archive, the processed ZIP must not be deleted (or I would like to know how I can control this in different situations).
If it helps to know, the number of archives in the directory is in the range of 60k-70k. Also, I need separate output directories because files in one archive may have the same names as files in another.
For example,
- I have all my archives like one.zip, two.zip, .. in, say, D:\data
- I create a new folder for processed data, say, D:\extracted
- Now the data from D:\data\one.zip should go to D:\extracted\one. Here, D:\extracted\one should be created automatically.
- During this complete uncompression process, all the encountered PDFs should not be extracted (be ignored). There's no point in extracting and then deleting.
- (Optional) A log file should be maintained at, say, D:\extracted. Idea is to use this file to resume processing from where it was left in case of an error.
- (Optional) Script should let me decide whether I want to keep source archives or delete them after processing.
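To make the requirements above concrete, here is a minimal sketch of the workflow in Python, using only the standard library's zipfile module. Python is an assumption here, since the question does not name a language. The function name extract_archives, its parameters, and the log file name processed.log are hypothetical and chosen only for illustration. It has not been tried on 60k-70k archives, so treat it as a starting point rather than a finished tool.

```python
import zipfile
from pathlib import Path


def extract_archives(src_dir, dest_dir, skip_exts=(".pdf",),
                     delete_source=False, log_name="processed.log"):
    """Extract each ZIP in src_dir into dest_dir/<archive name>.

    Members whose names end with one of skip_exts (lower-case) are skipped.
    All names and defaults here are illustrative assumptions.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    log_path = dest / log_name

    # Archives already listed in the log are skipped, so a rerun resumes.
    done = set()
    if log_path.exists():
        done = set(log_path.read_text(encoding="utf-8").splitlines())

    with log_path.open("a", encoding="utf-8") as log:
        for archive in sorted(src.glob("*.zip")):
            if archive.name in done:
                continue
            target = dest / archive.stem  # e.g. D:\extracted\one for one.zip
            try:
                with zipfile.ZipFile(archive) as zf:
                    # Choose members by name so PDFs are never written to disk.
                    wanted = [m for m in zf.infolist()
                              if not m.filename.lower().endswith(tuple(skip_exts))]
                    target.mkdir(parents=True, exist_ok=True)
                    zf.extractall(target, members=wanted)
            except (zipfile.BadZipFile, OSError) as exc:
                # Failed archives are not logged, so the next run retries them.
                print(f"FAILED {archive.name}: {exc}")
                continue
            log.write(archive.name + "\n")
            log.flush()
            if delete_source:
                archive.unlink()
```

With the example paths, the call would be extract_archives(r"D:\data", r"D:\extracted"), and passing delete_source=True removes each archive only after it has been extracted and logged. Filtering the member list before extraction means the PDFs are never decompressed, which avoids the extract-then-delete round trip.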
I already searched for a solution but couldn't find one. I came across a few questions like these
- Recursively unzip files where they reside, then delete the archives
- 7 zip extract recursively
- Is it possible to recursively list zip file contents with 7 zip without extracting
but they were not of much help (I'm not a pro with Windows, by the way). I'm open to installing safe, ad-free third-party software (open source) like 7-Zip.
EDIT: Is there a tool readily available to do what I need? I already tried Multi Unpacker. It doesn't create new directories and it can't ignore *.pdf files. It's also slow to start; I think it first reads all the archives at the source before starting to process them.
Thanks in advance!
related: http://superuser.com/q/321829/243637 – Fr0zenFyr – 2017-01-25T10:01:59.913
I don't see any way around this without a batch or powershell script, as far as I know there is no out-of-the-box solution for something like this. – private_meta – 2014-06-18T06:54:02.687
@private_meta thanks for your response. I had already guessed as much, but it's good to be sure. Can you point me in the right direction for writing a PowerShell script for this? I also understand that ignoring PDFs during extraction is a huge challenge, so I'm ready to let the script extract everything and then delete the PDFs. – Fr0zenFyr – 2014-06-18T07:34:12.977