As stated, I'm trying to download this dataset of zip archives containing images: https://data.broadinstitute.org/bbbc/BBBC006/ and store them in an S3 bucket, so I can later unzip them in the bucket, reorganize them, and pull them in smaller chunks into a VM for some computation. The problem is that I don't know how to get the data from, for example, https://data.broadinstitute.org/bbbc/BBBC006/BBBC006_v1_images_z_00.zip (or any of the other archives) and then send it to S3. This is my first time using AWS or really any cloud platform, so please bear with me :]
1 Answer
You can't manipulate data in place in S3: you can only upload objects to S3, download them from S3, and delete them from S3. You can't, for example, "unzip" inside S3. If you want to upload the contents of your archives, you will have to unzip them locally on your laptop and then upload the result to S3.
If you are on Windows, you can use an S3 client like S3 Browser to work with your S3 storage.
Hope that helps :)
MLu
Is there really no way to do this without first storing and unzipping locally? – Yufa Jul 17 '19 at 00:54
S3 is an object store; it has no capability to download files from a URL and store them itself. You need to download the file to your PC and then upload it. You might be able to map S3 as a drive using S3FS so you can write directly to S3, which removes one intermediate step, but I'm not sure how reliable S3FS is. – Tim Jul 17 '19 at 01:32
You have to unzip it *somewhere* before uploading to S3. Maybe you can create an EC2 instance in AWS, download the ZIPs there, and upload the content to S3 from there. That will probably be faster. – MLu Jul 17 '19 at 01:33
If you use an EC2 spot instance, the cost will be about $0.01 for an instance to do a download and then an upload. – Tim Jul 17 '19 at 03:02
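For anyone wanting to try the "download on a machine, unzip, then upload the contents" route suggested in the comments, here is a minimal Python sketch. It is not from the answer itself and makes several assumptions: boto3 is installed and AWS credentials are already configured, the machine has enough disk space for one archive, and the bucket name `my-example-bucket` and key prefix `BBBC006/z_00/` are hypothetical placeholders. The helper names `download_to_tempfile` and `upload_zip_members` are made up for illustration.

```python
import os
import shutil
import tempfile
import urllib.request
import zipfile


def download_to_tempfile(url):
    """Download `url` to a temporary .zip file on local disk and return its path."""
    with urllib.request.urlopen(url) as response, \
            tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        # Copy in chunks so the whole archive is never held in memory.
        shutil.copyfileobj(response, tmp)
        return tmp.name


def upload_zip_members(zip_path, s3_client, bucket, prefix=""):
    """Upload every file inside the archive as a separate S3 object.

    `s3_client` is expected to behave like boto3.client("s3"), i.e. to
    provide upload_fileobj(fileobj, bucket, key). Returns the keys written.
    """
    keys = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # S3 has no real directories; skip folder entries
            key = prefix + info.filename
            # Each member is decompressed on the fly while it is uploaded.
            with archive.open(info) as member:
                s3_client.upload_fileobj(member, bucket, key)
            keys.append(key)
    return keys


def main():
    import boto3  # third-party; assumed installed with credentials configured

    url = ("https://data.broadinstitute.org/bbbc/BBBC006/"
           "BBBC006_v1_images_z_00.zip")
    zip_path = download_to_tempfile(url)
    try:
        s3 = boto3.client("s3")
        # Bucket name and prefix are hypothetical placeholders.
        upload_zip_members(zip_path, s3, "my-example-bucket",
                           prefix="BBBC006/z_00/")
    finally:
        os.remove(zip_path)  # free the local disk space again
```

Calling `main()` on an EC2 instance in the same region as the bucket should keep the transfer inside AWS for the upload leg. The archive still has to be unzipped on that machine, as the answer says; the sketch only avoids writing the extracted images to disk before uploading them.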