I have a collection of about 1.1 million files (2 TB total) on a NAS on our LAN. We need to replicate it up to AWS to begin processing it. However, changes made on the cloud side also need to sync back to our LAN.
So far, the lowest sync latency we've been able to get is about an hour or two. Mounting the NAS on our EC2 instance and simply enumerating all files with `find [path] &> /dev/null` takes over an hour.
However, the files are organized into directories by order number, and once an order is complete, its files are rarely, if ever, modified. Since the directory names contain the order numbers, they could potentially be used to find the most recent orders. I feel this could be used to our advantage, but I'm not sure how.
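To illustrate the idea (names and layout here are hypothetical, assuming one subdirectory per numeric order under a single root): because order numbers sort numerically, the "recent" set could be selected without walking the whole tree, and only those directories would be re-synced. A minimal sketch:

```shell
#!/bin/sh
# pick_recent DIR N - print the N highest-numbered subdirectories of DIR,
# i.e. the most recently created orders under the numeric-naming assumption.
pick_recent() {
  ls -1 "$1" | sort -n | tail -n "$2"
}

# Hypothetical usage: only re-sync the newest 200 orders each pass,
# instead of enumerating all 1.1 million files.
# for d in $(pick_recent /mnt/nas/orders 200); do
#   aws s3 sync "/mnt/nas/orders/$d" "s3://example-bucket/orders/$d"
# done
```

This only works if completed orders really are immutable; any late edit to an old order would be missed and would need a separate, infrequent full pass to catch.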
Bandwidth is not an issue (around 100 Mbps both ways), and latency from the office to our AWS region of choice is about 35 ms.
Is there a better way to handle this? We have the ability to run VMs locally on our LAN if need be.