I'm looking into setting up a shared filesystem/file server on AWS (EC2) infrastructure that offers replication and fairly painless failover. The filesystem would host potentially millions of files, each a few MB in size, accessed (read/write) from several client VMs. If the primary file server fails I'd want the clients to be able to fail over to the replica file server without losing any files (i.e. replication has to be synchronous/real-time). I've looked at a few options:
- Use S3 with s3fs (rough mount sketch below, after this list). I'm concerned that S3's per-request latency will be problematic for bulk operations on thousands of files (e.g. copying/moving directory trees around). I've also seen reports that make me question s3fs's stability, though I'm not sure whether that's still the case.
- Set up an NFS server on an EC2 instance, using DRBD to replicate blocks to a second instance (config sketch after this list). Downsides:
- I've had reliability issues with DRBD in the past, especially over high-latency links.
- If the primary NFS server goes down, it hangs the clients with it, and getting them to reconnect to the secondary requires sysadmin intervention and/or reboots. There's no automatic failover.
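
For reference, here's roughly the s3fs mount I was considering. This is just a sketch: the bucket name, mount point, and cache directory are placeholders, and credentials would live in a passwd file the way s3fs-fuse expects.

```
# Hypothetical bucket and paths; credentials go in /root/.passwd-s3fs
# (format: bucketname:ACCESS_KEY:SECRET_KEY, chmod 600).
s3fs my-shared-bucket /mnt/shared \
    -o passwd_file=/root/.passwd-s3fs \
    -o use_cache=/var/cache/s3fs \
    -o allow_other

# Roughly equivalent /etc/fstab entry so the mount comes back after a reboot:
# s3fs#my-shared-bucket /mnt/shared fuse _netdev,allow_other,use_cache=/var/cache/s3fs 0 0
```

Even with use_cache, my understanding is that metadata operations (directory listings, renames, stats) still translate into per-object S3 requests, which is exactly the latency problem I'm worried about for bulk copies/moves.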
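
And here's a minimal sketch of the DRBD resource I had in mind, assuming two instances named nfs-a and nfs-b (placeholder names/IPs), each with a spare EBS volume at /dev/xvdf. Protocol C makes the replication synchronous, which is what "no lost files" requires:

```
# /etc/drbd.d/r0.res -- hostnames, IPs, and devices are hypothetical
resource r0 {
    net {
        protocol C;             # synchronous: a write completes only once both nodes have it
    }
    on nfs-a {
        device    /dev/drbd0;
        disk      /dev/xvdf;    # backing EBS volume
        address   10.0.1.10:7788;
        meta-disk internal;
    }
    on nfs-b {
        device    /dev/drbd0;
        disk      /dev/xvdf;
        address   10.0.2.10:7788;
        meta-disk internal;
    }
}
```

On the client side, NFS mount options like `soft,timeo=50,retrans=2` would at least turn an indefinite hang on a dead server into an I/O error rather than a frozen client, but soft mounts risk failed writes, and it still doesn't get me automatic failover.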
Are there any better solutions?