OS X 10.11, Python 3.5 or the AWS CLI (or another tool?)
I have roughly 5,000 subdirectories within an Amazon S3 bucket; each subdirectory contains a single .tar. Each .tar contains only one .zip, under ~1 MB in size.
What I would like to do is run a script that accesses each subdirectory within the S3 bucket and copies the .zip found within each .tar to either a given S3 location or a local destination.
Each .tar is ~10-15 GB uncompressed, so extracting the full contents is neither feasible nor wanted. I believe the .tar headers can instead be read in order to locate the .zip and copy it out.
Can you tell me a way I can achieve this?
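One possible approach, sketched below under a few assumptions: it uses `boto3` (not mentioned in the question, but the standard Python S3 SDK), a hypothetical bucket name passed in by the caller, and `tarfile`'s sequential stream mode (`mode='r|'`), which works on non-seekable sources such as an S3 response body. Because the question says each tar holds exactly one .zip, the scan can stop at the first match, so only the bytes up to that member are actually downloaded rather than the full 10-15 GB.

```python
import io
import tarfile


def extract_zip_from_tar_stream(tar_stream, dest):
    """Scan a tar read as a sequential stream and copy the first .zip
    member found into the writable file object ``dest``.

    Returns the member's name, or None if no .zip was found.
    """
    # mode='r|' reads the archive strictly front-to-back, so it needs no
    # seek() on the source and stops pulling bytes as soon as we return.
    with tarfile.open(fileobj=tar_stream, mode='r|') as tar:
        for member in tar:
            if member.isfile() and member.name.endswith('.zip'):
                extracted = tar.extractfile(member)
                dest.write(extracted.read())
                return member.name
    return None


def copy_zips_locally(bucket_name, prefix=''):
    """Walk every .tar key under ``prefix`` in the (hypothetical) bucket
    and save its inner .zip to the current directory, named after the key.
    """
    import boto3  # imported here so the helper above works without boto3

    bucket = boto3.resource('s3').Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        if not obj.key.endswith('.tar'):
            continue
        body = obj.get()['Body']  # file-like streaming response body
        out_name = obj.key.replace('/', '_') + '.zip'
        with open(out_name, 'wb') as dest:
            if extract_zip_from_tar_stream(body, dest) is None:
                print('no .zip found in', obj.key)
```

Usage would be something like `copy_zips_locally('my-bucket')` with your real bucket name; copying the .zip back to another S3 location instead of disk would just mean writing `dest` to an in-memory buffer and calling `put_object` on it.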
Tar files don't include an index of file positions -- they are streams, and have to be scanned. In fact, the same path can appear more than once within a given tar file, so technically they have to be scanned all the way to the end so that you get the last file of that path+name, which is usually what you want. The answer below will get the file you want, but the entire tar will still be read, even if it isn't extracted. – Michael - sqlbot – 2016-01-15T02:20:33.667
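To make the comment's duplicate-entry point concrete, here is a small self-contained demonstration using only the standard library: when the same path appears twice in an archive, a sequential scan must continue to the end, because the later entry shadows the earlier one.

```python
import io
import tarfile

# Build an in-memory tar containing the same path twice.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as t:
    for payload in (b'first', b'second'):
        info = tarfile.TarInfo('dup.txt')
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))
buf.seek(0)

# Scanning the whole stream, the last entry for the path wins.
# Note: in 'r|' stream mode, each member must be read before advancing.
last = None
with tarfile.open(fileobj=buf, mode='r|') as t:
    for member in t:
        if member.name == 'dup.txt':
            last = t.extractfile(member).read()
print(last)  # b'second'
```

Since each of your tars contains only one .zip, you can safely stop at the first match; this caveat matters only for archives with repeated paths.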