I have set up Elastic Load Balancing with 5 EC2 instances registered with the load balancer. Users upload their data (images) to our website, and we store these images on network-attached storage (NAS). The NAS is mounted on all the instances.

We are planning to introduce Amazon Auto Scaling and to move away from the network-attached storage.

  1. Is GlusterFS a good solution for sharing data across all the instances in the Auto Scaling group?

  2. Does Gluster ensure there is no loss of data?

  3. What will happen if all the instances in the Auto Scaling group are terminated? Will I lose user data?

  4. What happens if a user uploads an image and the server processing the request goes down?

  5. Is there an impact on IO if clients go down? (What exactly does Gluster do?)

Possibly. The only way you'll get a definitive answer is with your own tests, however. In the past, I've set up a 4-node webserver cluster on Linode instances, using GlusterFS to distribute/share the assets directory of images and so on.
We found 2 main problems with this approach:

  1. GlusterFS is pretty IO intensive, and works really well on hardware with uncontended IO
  2. Occasionally, a Linode server would experience less-than-optimal access to the backend SAN, and IO-wait time would go up dramatically. When this happened, Gluster would copy more data between the remaining nodes, which caused IO performance to suffer on those nodes in turn. The result was that a minor IO blip, caused by suboptimal SAN configuration or timesharing, would mean that the entire webserver cluster would go poot, and the entire shared filesystem might become unavailable.

Purely anecdotal evidence, but I'd not run GlusterFS on a virtual machine with SAN/shared storage ever again.

It can. In Gluster 3.0, there's better recognition of "replication pools", where you can define how many copies of the data exist throughout the cluster. Setting a replication level of 2 means that there are 2 copies of the data across the cluster. This effectively halves your storage capacity, but means that you've got greater resilience to node failure.
Importantly, it also means that you have to add more nodes as multiples of the replication level, in this case, pairs of nodes.
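To make the capacity trade-off concrete, here's a rough back-of-the-envelope sketch. It is a simplified model, not something Gluster ships: the function name `usable_capacity_gb` is made up for illustration, and it assumes bricks are grouped into replica sets in the order listed, with each set limited by its smallest brick.

```python
def usable_capacity_gb(brick_sizes_gb, replica_count):
    """Estimate usable space for a replicated volume (simplified model).

    Assumption: bricks are grouped into replica sets in list order, and
    each set can hold only as much as its smallest brick.
    """
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    if len(brick_sizes_gb) % replica_count != 0:
        # Nodes/bricks have to be added in multiples of the replication level.
        raise ValueError(
            f"{len(brick_sizes_gb)} bricks is not a multiple of replica {replica_count}"
        )
    replica_sets = [
        brick_sizes_gb[i:i + replica_count]
        for i in range(0, len(brick_sizes_gb), replica_count)
    ]
    return sum(min(replica_set) for replica_set in replica_sets)


if __name__ == "__main__":
    # Four 100 GB bricks with 2 copies of everything: half the raw 400 GB.
    print(usable_capacity_gb([100, 100, 100, 100], 2))
```

With four equal bricks and a replication level of 2 you end up with half the raw space, and trying to grow the cluster by a single node is rejected, which is the "pairs of nodes" constraint in practice.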

If the instances are only using ephemeral instance storage, yes. If they're EBS-backed, or using mounted EBS volumes, then no.

That greatly depends on how your application is designed. I strongly suspect that the user would lose their data (almost certainly, in a naively architected solution).

See above. If a client goes down because of backend storage problems, it can easily destroy the performance of the entire cluster.

  • Even if my instances are EBS-based, Auto Scaling will terminate instances when utilization drops, and IO will be impacted because of that, right? Does Gluster store a copy of the data on the Gluster server, or does it replicate only between the clients? – Santhosh S Nov 07 '11 at 03:13
  • When I mentioned IO problems, I meant more that the servers powering your instances may have contended IO. This was certainly the case with Linode, but I expect AWS to be slightly better (ha!). – Tom O'Connor Nov 07 '11 at 16:38
  • I seem to recall GlusterFS can do either/both replication on server and on client level. – Tom O'Connor Nov 07 '11 at 16:39
  • @TomO'Connor You forgot to mention yet another GlusterFS issue: split brain, which is caused by network partitions. Split brain occurs quite often on replicated volumes on EC2, and it can't be healed automatically. Moreover, if a file is in split brain, it's locked and clients are not able to overwrite it. – Roman Newaza Dec 12 '12 at 06:06

GlusterFS seems to require a bit too much configuration when bringing new instances online to make it a good fit for instances that need to autoscale. I am sure it can be done, but it's easier to change the architecture so that the web instances are separate from the GlusterFS instances. The web instances then only need to connect to the GlusterFS layer as clients, and they can be set up to autoscale.

A good rule when dealing with cloud systems is to have a 1:1 mapping of service to instance. Don't try to make an instance do too much. Architecturally this helps when trying to scale things.

  • Pfft. As far as configuration of new nodes is concerned, that's where configuration management such as Puppet or Chef has its heyday. You bring up a new node, provision it through the AWS management API, then run Puppet on it to bring it into the cluster, set up Gluster partitions, mountpoints, etc. – Tom O'Connor Nov 07 '11 at 00:54
  • Also, a 1:1 mapping of service to instance, yes, does make scalability a bit easier, especially if your services are entirely stateless by nature. It however has the side effect of increasing the total cost of the system, but hey, that's just the way the cookie breaks up into smaller constituent parts. – Tom O'Connor Nov 07 '11 at 00:55

You've already got some good answers to your Gluster questions; however, I'd like to mention something that may be of use.

Depending on your use case you may find the following easier to manage & less error prone:

  • The EC2 instances are all identical, with code pulled from a repo to keep it up to date (you can manage this in a number of ways through your deployment process)
  • Any user uploads go straight to S3, via s3fs or API calls integrated into your app (Python, PHP, etc.)
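As a rough idea of what the "API calls integrated into your app" option might look like in Python, here's a minimal sketch. The helper name `store_upload`, the `uploads` prefix and the bucket name are all made up for illustration; the only real API it relies on is the `upload_fileobj(fileobj, bucket, key)` method of a boto3 S3 client, which you pass in yourself.

```python
import uuid
from pathlib import PurePosixPath


def store_upload(s3_client, bucket, fileobj, original_name, prefix="uploads"):
    """Stream an uploaded file to S3 and return the object key.

    s3_client is expected to behave like a boto3 S3 client, i.e. expose
    upload_fileobj(fileobj, bucket, key). The key is random, so any web
    instance can accept the upload without coordinating with the others.
    """
    # Keep only the (lowercased) extension; never trust the client's filename.
    suffix = PurePosixPath(original_name).suffix.lower()
    key = f"{prefix}/{uuid.uuid4().hex}{suffix}"
    s3_client.upload_fileobj(fileobj, bucket, key)
    return key


# Hypothetical usage inside a request handler (requires boto3 and credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   key = store_upload(s3, "my-upload-bucket", request_file, "holiday.jpg")
#   # Only tell the user the upload succeeded after store_upload returns.
```

Because nothing is written to the instance's own disk, it doesn't matter which instance handled the request or whether Auto Scaling terminates it later. An upload that is still in flight when a server dies is still lost, so the app should only confirm success once the call has returned.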

The benefits of S3 are neat:

  • You only pay for what you use (no need to pay for a whole bunch of unused resources on EC2 instances, running costs, replication across multiple machines, etc.; there is also next to no management required)
  • Redundancy is built into S3, so your files are safe the moment they make their way into S3 (safe meaning they're in a managed service that stores them redundantly across multiple facilities within a region; AWS designs S3 for 99.999999999% durability)

If you wanted to go the extra mile, you could configure your (Linux) servers to send all logs to a "logging server" (this keeps all the EC2 instances identical, and as dumbed down as you can get).
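Server-level forwarding is normally done in the syslog daemon's own configuration, but if you also want the application itself to ship its logs, something like the following sketch could work. It uses Python's standard `logging.handlers.SysLogHandler` over UDP; the function name `build_central_logger`, the logger name and the host/port are assumptions for illustration, not anything from the setup above.

```python
import logging
import logging.handlers
import socket


def build_central_logger(host, port, name="webapp"):
    """Return a logger that forwards records to a remote syslog server over UDP.

    host and port identify the hypothetical "logging server". The local
    hostname is included in each line so identical instances can still be
    told apart on the receiving side.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Escape any '%' so the hostname is safe inside a %-style format string.
    hostname = socket.gethostname().replace("%", "%%")
    handler.setFormatter(
        logging.Formatter(f"{hostname} {name}: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Hypothetical usage (the address is a placeholder for your logging server):
#   log = build_central_logger("10.0.0.5", 514)
#   log.info("upload stored")
```

UDP is fire-and-forget, so a web instance never blocks on the logging server, at the cost of possibly dropping messages when the network is unhappy.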

I found this sort of setup has worked quite well in the past for the web servers I've managed.

