OS X 10.11, Python 3.5 or the AWS CLI (or another tool?)
I have roughly 5,000 subdirectories within an Amazon S3 bucket; each subdirectory contains a single .tar. Each .tar contains only one .zip, under ~1 MB in size.
What I would like to do is run a script that accesses each subdirectory within the S3 bucket and copies the .zip found within each .tar to either a given S3 location or a local destination.
Each .tar is ~10-15 GB uncompressed, so extracting the full contents is neither feasible nor wanted. I believe the .tar headers can instead be read in order to locate the .zip and copy it out.
Can you tell me a way I can achieve this?
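One possible approach, sketched below under a few assumptions: it uses `boto3` (not mentioned in the question, but the standard Python S3 SDK), a hypothetical bucket name passed in by the caller, and `tarfile`'s sequential stream mode (`mode='r|'`), which works on non-seekable sources such as an S3 response body. Because the question says each tar holds exactly one .zip, the scan can stop at the first match, so only the bytes up to that member are actually downloaded rather than the full 10-15 GB.

```python
import io
import tarfile


def extract_zip_from_tar_stream(tar_stream, dest):
    """Scan a tar read as a sequential stream and copy the first .zip
    member found into the writable file object ``dest``.

    Returns the member's name, or None if no .zip was found.
    """
    # mode='r|' reads the archive strictly front-to-back, so it needs no
    # seek() on the source and stops pulling bytes as soon as we return.
    with tarfile.open(fileobj=tar_stream, mode='r|') as tar:
        for member in tar:
            if member.isfile() and member.name.endswith('.zip'):
                extracted = tar.extractfile(member)
                dest.write(extracted.read())
                return member.name
    return None


def copy_zips_locally(bucket_name, prefix=''):
    """Walk every .tar key under ``prefix`` in the (hypothetical) bucket
    and save its inner .zip to the current directory, named after the key.
    """
    import boto3  # imported here so the helper above works without boto3

    bucket = boto3.resource('s3').Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        if not obj.key.endswith('.tar'):
            continue
        body = obj.get()['Body']  # file-like streaming response body
        out_name = obj.key.replace('/', '_') + '.zip'
        with open(out_name, 'wb') as dest:
            if extract_zip_from_tar_stream(body, dest) is None:
                print('no .zip found in', obj.key)
```

Usage would be something like `copy_zips_locally('my-bucket')` with your real bucket name; copying the .zip back to another S3 location instead of disk would just mean writing `dest` to an in-memory buffer and calling `put_object` on it.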
Tar files don't include an index of file positions -- they are streams, and have to be scanned. In fact, the same path can appear more than once within a given tar file, so technically they have to be scanned all the way to the end so that you get the last file of that path+name, which is usually what you want. The answer below will get the file you want, but the entire tar will still be read, even if it isn't extracted. – Michael - sqlbot – 2016-01-15T02:20:33.667
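To make the comment's duplicate-entry point concrete, here is a small self-contained demonstration using only the standard library: when the same path appears twice in an archive, a sequential scan must continue to the end, because the later entry shadows the earlier one.

```python
import io
import tarfile

# Build an in-memory tar containing the same path twice.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as t:
    for payload in (b'first', b'second'):
        info = tarfile.TarInfo('dup.txt')
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))
buf.seek(0)

# Scanning the whole stream, the last entry for the path wins.
# Note: in 'r|' stream mode, each member must be read before advancing.
last = None
with tarfile.open(fileobj=buf, mode='r|') as t:
    for member in t:
        if member.name == 'dup.txt':
            last = t.extractfile(member).read()
print(last)  # b'second'
```

Since each of your tars contains only one .zip, you can safely stop at the first match; this caveat matters only for archives with repeated paths.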