I have an S3 bucket with more than a million files, and about a thousand new files are added each day by various applications into various folders.
I would also like to keep a local copy of all the files on a Linux server. What would be the best way to download all the files once, and then download only the new files, say the ones uploaded during the previous 24 hours?
I understand that Amazon charges for S3 list requests, so I don't want to list all the files every day just to pick out the latest ones.
I tried the following playbook and it works, but I was wondering if there's a better way. It doesn't necessarily have to use Ansible; I just used it because we use it for pretty much everything.
- name: List s3 objects
  aws_s3:
    bucket: "testbucket"
    prefix: "test"
    mode: list
  register: s3objects

- name: Download s3objects
  aws_s3:
    bucket: "testbucket"
    object: "{{ item }}"
    mode: get
    dest: "/tmp/{{ item|basename }}"
  with_items: "{{ s3objects.s3_keys }}"
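
For comparison, the only alternative I've come up with so far is shelling out to the AWS CLI from a task like the sketch below (assuming the CLI is installed and has credentials on the server; the bucket, prefix, and local path are just placeholders matching my example above). aws s3 sync only downloads objects that are missing or changed locally, but as far as I can tell it still lists the whole prefix on every run, so I'm not sure it really avoids the listing cost either.

# Assumes the AWS CLI is installed and configured with credentials on the server.
# "aws s3 sync" only downloads objects that are missing or changed locally,
# but it still lists the bucket prefix on each run.
- name: Sync bucket prefix to a local mirror via the AWS CLI
  command: aws s3 sync s3://testbucket/test /tmp/test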