How to "unextract" a zip file?

52

20

I extracted a zip file into a non-empty folder. The zip file has lots of files and a deep hierarchy, which merged with the existing tree of the target directory. How can I remove the files and directories that were created by unzipping without destroying the files and directories that were already there? Of course, I still have the zip file that I merged in, so the information is there.

mafp

Posted 2013-02-13T23:24:08.983

Reputation: 513

Umm thanks for the accept, but it was really @jjlin's idea. I was not aware of the lq options for unzip, I just added some classic *nix tricks around his main answer. – terdon – 2013-02-14T00:54:08.753

That's okay, I don't really care that much. I added my own different version of whitespace-handling anyway. – jjlin – 2013-02-14T00:57:23.573

@terdon Yeah... I upvoted jjlin's answer, too, but I can only accept one answer. – mafp – 2013-02-14T01:02:40.793

For future reference, always do one of the following with an unfamiliar archive of any format: 1) Extract it to an empty directory or 2) List it first (unzip -l) before extracting it so you can see if it's nasty like this. Archives made without a top level directory with everything under that are bad form. When done with tar, they are actually called tar bombs, so I guess this could be called a zip bomb. – Joe – 2013-02-19T09:34:59.593

@Joe It has its uses. LaTeX packages, e.g., can come in a foo.tds.zip form. These zips merge into a TEXMF tree, which is very convenient. But if you ever want to remove such a package you are faced with the problem I described. – mafp – 2013-02-19T09:40:54.273

@mafp I'm sure it does. That's why I also mentioned 2) above - so you can see what an archive will do before it's too late and choose to accept that if it will do what you desire. Still, being able to remove it later is a big plus. Of course, you could simply restore from a backup no matter what an install or other action has done. – Joe – 2013-02-20T16:37:24.503

Answers

28

jjlin's answer is the way to go. I just want to add a few choices for directories:

  • Delete all extracted files, no directories:

    unzip -lqq file.zip | gawk -F"  " '{print $NF;}' |
      while IFS= read -r n; do rm "$n"; done
    
  • Delete extracted files and empty directories only

    unzip -lqq file.zip | gawk -F"  " '{print $NF;}' |
      while IFS= read -r n; do rm "$n"; done; rmdir *
    

    With no options, rmdir deletes only empty directories. It leaves files and non-empty folders alone, so you can safely run it on *.

  • Delete everything extracted, but prompt for a confirmation before each deletion:

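    unzip -lqq file.zip | gawk -F"  " '{print $NF;}' |
      while IFS= read -r n; do rm -ri "$n" < /dev/tty; done; rmdir *
    

    The -i flag will cause rm to prompt before every removal; you can choose Yes or No. The < /dev/tty redirection makes rm read your answer from the terminal; without it, rm would read its answer from the pipe and swallow the next file name in the list.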

  • Delete everything extracted, directories included:

    unzip -lqq file.zip | gawk -F"  " '{print $NF;}' |
      while IFS= read -r n; do rm -rf "$n"; done
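
For the leftover directories, here is a rough sketch of a more targeted alternative to rmdir *. It only touches directories that are named in the archive, deepest first, and only if they are empty. The function name prune_listed_dirs is made up for illustration, and it assumes a listing with one entry per line in which directory entries end in a slash (as in the unzip -Z1 output mentioned in another answer).

```shell
# Hypothetical helper: reads archive entry names on stdin, one per line,
# and removes the listed directories only if they are empty.
prune_listed_dirs() (
    # Keep only directory entries (names ending in "/") and sort them in
    # reverse byte order, so that "a/b/" is handled before its parent "a/".
    grep '/$' | LC_ALL=C sort -r |
    while IFS= read -r dir; do
        # rmdir refuses to remove non-empty directories; ignore those failures.
        rmdir -- "$dir" 2>/dev/null || true
    done
)

# Possible usage, run from the directory the archive was extracted into:
#   unzip -Z1 file.zip | prune_listed_dirs
```

Directories that still hold pre-existing files stay in place, and so do empty directories that are not part of the archive.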
    

terdon

Posted 2013-02-13T23:24:08.983

Reputation: 45 216

Deleting empty directories is easily done with find: find * -depth -type d -exec rmdir {} + and ignore all the Directory not empty messages. It might be legal to shorten this to find * -type d -delete as the -delete option switches on -depth but I haven't verified that -delete won't delete a non-empty directory. – Adrian Pronk – 2013-02-14T08:43:03.357

@AdrianPronk it doesn't: find: cannot delete './foo': Directory not empty – terdon – 2013-05-30T14:36:32.653

28

You can use unzip -lqq <filename.zip> to list the contents of the zip file; this will include some extraneous info that you'll need to filter out, though. Here's a command that works for me:

unzip -lqq file.zip | awk '{print $4;}' | xargs rm -rf

The awk command extracts just the names of the files and directories. Then the result gets passed to xargs to delete everything. I suggest doing a dry-run of the command (i.e., by omitting the xargs rm -rf part) first to make sure the results are correct.

The above command will have issues dealing with paths that have whitespace. This (more complicated) version should fix that:

unzip -lqq file.zip | awk '{$1=$2=$3=""; sub(/ */, "", $0); printf "%s%s", $0, "\0"}' | xargs -0 rm -rf
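
For the dry run, something along these lines might be more informative than just printing the names. It is only a sketch: preview_listed is a made-up helper name, and it assumes the names arrive one per line (so it will not cope with names containing newlines). It deletes nothing; it only reports what each name currently is on disk.

```shell
# Hypothetical dry-run helper: reads path names on stdin, one per line,
# and reports what each one currently is on disk. Nothing is deleted.
preview_listed() {
    while IFS= read -r path; do
        if [ -d "$path" ]; then
            printf 'dir      %s\n' "$path"
        elif [ -e "$path" ] || [ -L "$path" ]; then
            printf 'file     %s\n' "$path"
        else
            printf 'missing  %s\n' "$path"
        fi
    done
}

# Possible usage, run from the directory the archive was extracted into:
#   unzip -Z1 file.zip | preview_listed
```

A line marked "missing" usually means the name was mangled somewhere in the pipeline (or the file is already gone), which is worth checking before anything is passed to rm.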

jjlin

Posted 2013-02-13T23:24:08.983

Reputation: 12 964

This is already quite close to what I had in mind, but unzip -lqq lists also the directories contained in the zip. For now, I would let all directories alone. How to delete all empty directories in a tree might be a follow-up question. – mafp – 2013-02-14T00:24:17.440

@mafp That's a good point about the directories. You can add grep -v '/$' into the pipeline to skip deleting the directories (which all have a trailing slash, AFAICT). – jjlin – 2013-02-14T00:40:19.727

@terdon Actually I think the problem begins at the awk, since printing just $4 won't print the full path. – jjlin – 2013-02-14T00:44:28.450

I don't think you should be using the -r option of rm: that seems to be asking for trouble, especially when combined with the -f option. I wouldn't use the -f option at all in this scenario. – Adrian Pronk – 2013-02-14T08:34:58.417

@AdrianPronk Those options are needed if you want to avoid error messages. However, if you use the directory-skipping variant (with grep -v '/$'), then I think you can omit both. – jjlin – 2013-02-14T16:07:02.590

@jjlin: grep -v '/$' will only omit directory entries in the ZIP file. They will still include entries that were plain files in the ZIP file but were pre-existing directories in the target folder. For this reason, it would be wise to omit -r – Adrian Pronk – 2013-02-14T19:54:35.037

11

With the switch -Z1, unzip will list exactly one file per line (and nothing else).

This way, you can use

unzip -Z1 file.zip | xargs -I {} rm '{}'

to delete all files extracted from the zip file.

The command

unzip -Z1 file.zip | xargs -I {} rm -rf '{}'

will delete directories as well, but you have to be careful. If the directories already existed before extracting the zip file, all pre-existing files in those directories will be deleted as well.
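
A more cautious middle ground could look like the sketch below. The function name remove_listed_files is invented for this example. It never uses -r, skips directory entries, and also skips any listed name that turns out to be a directory on disk, so pre-existing directories and their contents are left alone. It assumes one name per line, as produced by unzip -Z1.

```shell
# Hypothetical helper: reads archive entry names on stdin, one per line,
# and removes only those that are non-directories on disk.
remove_listed_files() {
    while IFS= read -r path; do
        # Skip directory entries from the archive (names ending in "/").
        case $path in */) continue ;; esac
        # Skip names that are directories on disk, whatever the archive says.
        [ -d "$path" ] && continue
        # Remove regular files and symlinks; silently skip missing names.
        if [ -e "$path" ] || [ -L "$path" ]; then
            rm -- "$path"
        fi
    done
}

# Possible usage, run from the directory the archive was extracted into:
#   unzip -Z1 file.zip | remove_listed_files
```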


If you're going to re-extract the zip file anyway, there's another approach that is guaranteed to deal with strange file names.

First extract the zip file where you originally meant to extract it:

unzip file.zip -d elsewhere

Now, change into the directory where you extracted the files by mistake and execute the following command:

find elsewhere -type f -printf "%P\0" | xargs -0 -I {} rm '{}'

  • -type f only finds files (no directories).

  • %P\0 is the relative path (without elsewhere/), followed by a null character.

  • -0 makes xargs split its input on null characters instead of whitespace or newlines. This is more reliable, since – in theory – file names can contain newline characters.
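
Since -printf is specific to GNU find, the same idea can be sketched without it by running find from inside the clean copy, so the printed paths are already relative. The function name remove_mirrored_files and its two arguments (the clean extraction and the polluted directory) are assumptions made for this example, not part of the commands above.

```shell
# Hypothetical helper. $1: directory holding a clean extraction of the
# archive; $2: directory the archive was merged into by mistake.
# Removes from $2 every file that also exists (by relative path) in $1.
remove_mirrored_files() (
    # Resolve the reference directory to an absolute path before moving.
    ref_dir=$(cd "$1" && pwd) || exit 1
    cd "$2" || exit 1
    # Paths are printed as ./relative/name, separated by null characters;
    # rm then runs in the polluted directory. -f ignores missing files.
    (cd "$ref_dir" && find . -type f -print0) | xargs -0 rm -f
)

# Possible usage:
#   unzip file.zip -d elsewhere
#   remove_mirrored_files elsewhere /path/to/polluted/dir
```

Do not pass the same directory for both arguments, or the clean copy itself gets emptied.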


To deal with leftover directories, you can execute the command:

find -type d -exec rmdir -p {} \; 2> /dev/null

  • -type d only finds directories.

  • -exec rmdir -p {} \; executes rmdir -p {} for every directory that has been found.

    {} is the directory that has been found, and the -p switch makes rmdir remove its empty parent directories as well.

  • 2> /dev/null suppresses the error messages that will arise from trying to delete non-empty or previously deleted directories.
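
Note that the command above also removes empty directories that were there before the extraction. If that matters, a possible variation (a sketch only, with an invented function name prune_mirrored_dirs and the same two assumed arguments: clean extraction, polluted directory) is to try rmdir only on directories that exist in the clean copy:

```shell
# Hypothetical helper. $1: directory holding a clean extraction of the
# archive; $2: directory the archive was merged into by mistake.
# Removes from $2 the directories that exist in $1, if they are empty.
prune_mirrored_dirs() (
    ref_dir=$(cd "$1" && pwd) || exit 1
    cd "$2" || exit 1
    # -depth lists children before their parents, so nested empty
    # directories are removed bottom-up. "! -name ." skips the top level.
    (cd "$ref_dir" && find . -depth -type d ! -name . -print0) |
        xargs -0 rmdir 2>/dev/null
    # Failures on non-empty directories are expected; report success.
    exit 0
)

# Possible usage, after the extracted files themselves have been removed:
#   prune_mirrored_dirs elsewhere /path/to/polluted/dir
```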



Dennis

Posted 2013-02-13T23:24:08.983

Reputation: 42 934

+1 for making me read zipinfo's man page. – terdon – 2013-02-14T01:04:50.337

Well, gee, that makes it a little easier. :) – jjlin – 2013-02-14T01:11:35.417

2

Here is an even easier and safer (I think) solution

zip -m getmeoutofhere.zip `unzip -lqq myoriginalzipfile.zip`
rm getmeoutofhere.zip

What this is doing: The backquoted unzip command will produce a list of what was in your original file.

zip -m will then use that list to add each of those files to getmeoutofhere.zip and remove it from the original directory (so theoretically the new archive should be identical to myoriginalzipfile.zip).

The downside is that unzip -lqq will produce some extra text: dates, times, file sizes, etc. These will cause zip -m to produce error messages, but this should have no effect (unless, in the unlikely case, you have a file whose name matches one of those extra words).

Please note that this will not remove any directories that were created during the original unzip.

David E.

Posted 2013-02-13T23:24:08.983

Reputation: 51

Interesting approach, will explore further. – mafp – 2013-02-19T22:24:27.723

1

If you extracted the files such that the modification timestamp in the archive is not preserved in the extracted copies (but rather the extracted files have their usual modification time) then the right way to attack this is via modification time. All the extracted files have a newer modification timestamp than the most recently modified existing file in that directory.

Here is a simple situation.

Suppose that none of the existing files in the current directory were touched for at least 24 hours. Anything that was modified in the last 24 hours is therefore junk from the zipfile.

$ find . -mtime -1 -print0 | xargs -0 rm

This will find some directories too, but rm will leave them alone. They can be dealt with in a second pass:

$ find . -depth -mtime -1 -type d -print0 | xargs -0 rmdir

Any directories which were recently modified were modified by the zip. If rmdir successfully removes them, that means they are empty. Empty directories that were touched by zip were probably created by it: i.e. came from the archive. We can't be 100% sure. It's possible that the unzip job put some files into an existing directory which was empty.
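
If a whole day is too coarse, find can also compare against the modification time of a reference file with -newer. The following is only a sketch: list_newer_than is a made-up name, and it assumes you have (or can create with touch -t) some file whose timestamp lies between the last legitimate change and the extraction. It only lists candidates; nothing is removed.

```shell
# Hypothetical helper. $1: reference file whose modification time marks
# the cut-off; $2: directory to search (defaults to the current one).
# Prints regular files modified more recently than the reference file.
list_newer_than() {
    find "${2:-.}" -type f -newer "$1"
}

# Possible usage: create a marker dated just before the extraction
# (format is [[CC]YY]MMDDhhmm), then review the list before deleting.
#   touch -t 201302132300 /tmp/marker
#   list_newer_than /tmp/marker
```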

If find's 24 hour granularity isn't good enough for the job, because files in the tree were modified too recently, then I'd next consider something simple: suppose that the unzip job did not put anything into existing subdirectories. That is to say, everything that was unzipped is either a file at the top level, or a new subdirectory which was not there before, which therefore contains nothing but material from the zip. Then:

# list directory in descending order of modification time
$ ls -1t > filelist

Now we open filelist in a text editor, and determine the first entry in the list which did not come from the zip. We delete that entry and everything else after it. What remains are the files and directories which came from the zip. First we visually inspect for issues like spaces in the names, and occurrences of quotes that need to be escaped. We can then add quotes around everything, if necessary. The following assumes you use Vim:

:%s/.*/"&"/

Then join it all into a big line:

:%j

Now insert rm -rf in front of it:

Irm -rf <ESC>

Run the line under the cursor as a shell command:

!!sh<Enter>

Definitely, I would not automate the steps of this task, due to the risk of erasing files which were already there, or screwing up due to file name issues.

If you're going to go the obvious route of obtaining a list of the paths in the zip, then capture it to a file, look over it very carefully and transform it to a removal after doing any necessary editing.

Kaz

Posted 2013-02-13T23:24:08.983

Reputation: 2 277