If you extracted the files such that the modification timestamp in the archive is not preserved in the extracted copies (but rather the extracted files have their usual modification time) then the right way to attack this is via modification time. All the extracted files have a newer modification timestamp than the most recently modified existing file in that directory.
Here is a simple situation.
Suppose that none of the existing files in the current directory were touched for at least 24 hours. Anything that was modified in the last 24 hours is therefore junk from the zipfile.
$ find . -mtime -1 -print0 | xargs -0 rm
This will find some directories too, but rm will leave them alone. They can be dealt with in a second pass:
$ find . -mtime -1 -type d -print0 | xargs -0 rmdir
Any directories which were recently modified were modified by the zip. If rmdir successfully removes them, that means they are empty. Empty directories that were touched by zip were probably created by it: i.e. came from the archive. We can't be 100% sure. It's possible that the unzip job put some files into an existing directory which was empty.
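Since both passes above delete without asking, it may be worth looking at what they would match first. The following is only a sketch: the function name preview_recent is made up for illustration, and it assumes a POSIX shell and a find that supports -mtime.

```shell
# preview_recent DIR: print what the two removal passes would match in DIR
# (default: the current directory). Nothing is deleted.
# preview_recent is a hypothetical helper name, not part of any tool.
preview_recent() {
    dir=${1:-.}
    echo "non-directories modified in the last 24 hours:"
    find "$dir" -mtime -1 ! -type d -print
    echo "directories modified in the last 24 hours:"
    # Note: DIR itself is listed here if it was modified recently.
    find "$dir" -mtime -1 -type d -print
}
```

Running preview_recent . in the affected directory and reading the output before switching to rm and rmdir gives a chance to spot files that were there before the unzip.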
If find's 24 hour granularity isn't good enough for the job, because files in the tree were modified too recently, then I'd next consider something simple: suppose that the unzip job did not put anything into existing subdirectories. That is to say, everything that was unzipped is either a file at the top level, or a new subdirectory which was not there before, which therefore contains nothing but material from the zip. Then:
# list directory in descending order of modification time
$ ls -1t > filelist
Now we open filelist in a text editor, and determine the first entry in the list which did not come from the zip. We delete that entry and everything else after it. What remains are the files and directories which came from the zip. First we visually inspect for issues like spaces in the names, and occurrences of quotes that need to be escaped. We can then add quotes around everything, if necessary. The following assumes you use Vim:
:%s/.*/"&"/
Then join it all into a big line:
:%j
Now insert rm -rf in front of it:
Irm -rf <ESC>
Run the line under the cursor as a shell command:
!!sh<Enter>
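The cut point found by eye in filelist can also be cross-checked from the shell. This is a sketch under assumptions: newer_than is a hypothetical helper name, REF stands for the newest entry you are sure was there before the unzip, and find is assumed to support -mindepth and -maxdepth (GNU and BSD find both do). It only prints; it removes nothing.

```shell
# newer_than REF: list top-level entries of the current directory whose
# modification time is later than that of the existing entry REF.
# Prints only. newer_than is a hypothetical name used for illustration.
newer_than() {
    find . -mindepth 1 -maxdepth 1 -newer "$1" -print
}
```

If the names it prints match what is left in the edited filelist, that is some reassurance that the list was cut in the right place. Entries with the same timestamp as REF are not reported, so this is a check, not a replacement for reading the list.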
Definitely, I would not automate the steps of this task, due to the risk of erasing files which were already there, or screwing up due to file name issues.
If you're going to go the obvious route of obtaining a list of the paths in the zip, then capture it to a file, look over it very carefully and transform it to a removal after doing any necessary editing.
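As one way to do the "look over it very carefully" step, here is a sketch that reads a captured list and reports what each entry currently is on disk, without deleting anything. The name check_list and the file name ziplist.txt are made up for illustration; the list is assumed to hold one path per line, relative to the current directory.

```shell
# check_list LISTFILE: for each path in LISTFILE (one per line), report
# whether it currently is a directory, some other existing entry, or missing.
# Nothing is deleted. check_list is a hypothetical helper name.
check_list() {
    # The "|| [ -n ... ]" part handles a last line with no trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue          # skip blank lines
        if [ -d "$path" ]; then
            printf 'dir      %s\n' "$path"
        elif [ -e "$path" ] || [ -L "$path" ]; then
            printf 'file     %s\n' "$path"
        else
            printf 'missing  %s\n' "$path"
        fi
    done < "$1"
}
```

With Info-ZIP's unzip, the list could be captured with something like unzip -Z1 archive.zip > ziplist.txt (archive.zip being a placeholder), then checked with check_list ziplist.txt. Paths containing newlines are not handled by a line-based list like this, which is one more reason to read it over before turning it into a removal.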
Umm thanks for the accept, but it was really @jjlin's idea. I was not aware of the lq options for unzip, I just added some classic *nix tricks around his main answer. – terdon – 2013-02-14T00:54:08.753
That's okay, I don't really care that much. I added my own different version of whitespace-handling anyway. – jjlin – 2013-02-14T00:57:23.573
@terdon Yeah... I upvoted jjlin's answer, too, but I can only accept one answer. – mafp – 2013-02-14T01:02:40.793
For future reference, always do one of the following with an unfamiliar archive of any format: 1) Extract it to an empty directory or 2) List it first (unzip -l) before extracting it so you can see if it's nasty like this. Archives made without a top level directory with everything under that are bad form. When done with tar, they are actually called tar bombs, so I guess this could be called a zip bomb. – Joe – 2013-02-19T09:34:59.593
@Joe It has its uses. LaTeX packages, e.g., can come in a foo.tds.zip form. These zips merge into a TEXMF tree, which is very convenient. But if you ever want to remove such a package you are faced with the problem I described. – mafp – 2013-02-19T09:40:54.273
@mafp I'm sure it does. That's why I also mentioned 2) above - so you can see what an archive will do before it's too late and choose to accept that if it will do what you desire. Still, being able to remove it later is a big plus. Of course, you could simply restore from a backup no matter what an install or other action has done. – Joe – 2013-02-20T16:37:24.503