We have a growing set of data files (.wav files, image files, etc.) that are uploaded and modified by users, i.e. they are data, not part of the application code. There are thousands of files, and the total size runs to gigabytes.
We have several server clusters in different locations around the world (US, EU, ME). In each cluster it is important that the data is served locally and not from S3 (the data files are not served directly to clients, but are processed by the servers). We want to designate a file server in each location which will serve the files via NFS to the other nodes in the same cluster.
So the bottom line is:
- Files uploaded via the application should end up on S3.
- Each file server node should replicate those files.
We see several options:
- Using an origin file server that replicates to S3 for backup/versioning and to the nodes via rsync (or similar).
- Same as above, but slaves replicate from S3 using something like S3 tool or similar.
- Not using an origin: app code uploads directly to S3, and slaves replicate as above.
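To make the replication step concrete: in all three options, each file server has to compare what the source holds (the origin or S3) with what it holds locally, then copy and delete accordingly. Below is a minimal sketch of that comparison only, assuming both sides can be listed as a mapping from file path to size and checksum. The names `FileInfo` and `plan_sync` are hypothetical, and the actual listing and transfer (rsync, an S3 client, etc.) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileInfo:
    size: int      # size in bytes
    checksum: str  # assumption: some content hash, e.g. an MD5 hex digest


def plan_sync(
    source: Dict[str, FileInfo], replica: Dict[str, FileInfo]
) -> Tuple[List[str], List[str]]:
    """Return (to_copy, to_delete) so that the replica ends up mirroring the source."""
    # Copy anything that is missing on the replica or differs in size/checksum.
    to_copy = sorted(
        path for path, info in source.items() if replica.get(path) != info
    )
    # Delete anything on the replica that no longer exists at the source.
    to_delete = sorted(path for path in replica if path not in source)
    return to_copy, to_delete


# Hypothetical listings: the source (origin or S3) and one regional file server.
source = {
    "a.wav": FileInfo(10, "aa"),
    "b.png": FileInfo(20, "bb"),
    "c.wav": FileInfo(30, "cc"),
}
replica = {
    "a.wav": FileInfo(10, "aa"),      # unchanged
    "b.png": FileInfo(20, "old"),     # modified at the source
    "stale.wav": FileInfo(5, "zz"),   # deleted at the source
}

to_copy, to_delete = plan_sync(source, replica)
print(to_copy)    # ['b.png', 'c.wav']
print(to_delete)  # ['stale.wav']
```

One caveat if the source is S3: the ETag of an object uploaded in multiple parts is not a plain MD5 of its content, so it cannot always be compared directly with a locally computed hash.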
We were wondering which is the recommended solution, and what tools are available for the replication part (i.e. in the filesystem-to-filesystem category and in the filesystem-to-S3 category).