
I have an s3 bucket with more than a million files, and about a thousand files are added each day by various applications into various folders.

I would like to download and keep all the files locally as well, on a Linux server. What would be the best way to download all the files one time and then download only new files, say the ones uploaded during the previous 24 hours?

I understand that Amazon charges for listing each s3 file, so I don't want to list all the files every day and then download the latest ones.

I tried to do it with the following playbook and it works, but I was wondering if there's a better way. It doesn't necessarily have to use Ansible; I just used it because we use it for pretty much everything.

  - name: List s3 objects
    aws_s3:
      bucket: "testbucket"
      prefix: "test"
      mode: list
    register: s3objects

  - name: Download s3objects
    aws_s3:
      bucket: "testbucket"
      object: "{{ item }}"
      mode: get
      dest: "/tmp/{{ item|basename }}"
    with_items: "{{ s3objects.s3_keys }}"
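
For reference, here is the same "list, filter to the last 24 hours, download" idea as a plain boto3 script. This is only a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the bucket name, prefix, and /tmp destination are the same placeholders used in the playbook above.

import os
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(hours=24)

# Paginate through the prefix and fetch only recently modified keys.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="testbucket", Prefix="test"):
    for obj in page.get("Contents", []):
        if obj["LastModified"] >= cutoff:
            dest = os.path.join("/tmp", os.path.basename(obj["Key"]))
            s3.download_file("testbucket", obj["Key"], dest)

Note this still issues the same LIST calls over the whole prefix; it only avoids re-downloading files that haven't changed.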
Debianuser
    *"I understand that Amazon charges for listing each s3 file"* Depending on the region, the cost is $0.005 per 1,000 list-objects *requests*, each of which will return 1000 files, unless you ask for fewer. Listing 1,000,000 files once per day thus costs approximately $0.15/per month, plus the bandwidth necessary to transfer the listing data if you are listing the files from outside the region where the bucket is located. This seems like a reasonable cost. – Michael - sqlbot Mar 09 '18 at 22:16

1 Answer


Use the `aws s3 sync` command:

aws s3 sync s3://bucketname/folder/ /local/copy

aws s3 sync s3://bucketname/folder/ /local/copy --delete

I use the --delete flag at the end of that command to delete local files that have been removed from the bucket. I don't know about the costs of listings and such when you use sync, but read the documentation and you should be able to work it out.

Tim
  • Thanks, I am still confused about the pricing for listing files in s3. Let us say I have 1 million files (1 MB each) in a single folder within an s3 bucket and I want to download/sync 1000 files to my local server. If I use the "aws s3 sync" command, would I still be charged for listing all 1 million files? Or would I only be charged for listing the 1000 files that I need to download? – Debianuser Mar 10 '18 at 10:10
  • `aws s3 sync` will do the same listing with the same price. – Sergey Kovalev Mar 10 '18 at 10:36