
I am thinking about migrating some of our servers and apps to a CoreOS environment. One problem I see is the management of persistent data, as CoreOS does not handle Docker volumes when moving a container to a new machine. After some research I found GlusterFS, which claims to be a cluster file system that could solve all my problems.

My current idea is this: a GlusterFS container runs as a privileged container on each of my CoreOS machines and exposes a storage mount, /mnt/gluster for example. In my Dockerfiles I specify that all my volumes should be mounted on this path.
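One caveat on the Dockerfile part: the `VOLUME` instruction only declares a mount point inside the container; the host directory is chosen at run time, for example with `docker run -v host_path:container_path`. A minimal sketch of composing such a command, assuming the Gluster volume is already mounted at /mnt/gluster on every host (the image name, container name, subdirectory and container path are hypothetical placeholders):

```python
import shlex
from pathlib import PurePosixPath

# Mount point taken from the idea above; everything below it is an assumption.
GLUSTER_ROOT = PurePosixPath("/mnt/gluster")


def docker_run_args(image, name, subdir, container_path):
    """Build a `docker run` argv that bind-mounts a Gluster subdirectory."""
    host_path = GLUSTER_ROOT / subdir
    return [
        "docker", "run", "-d",
        "--name", name,
        "-v", f"{host_path}:{container_path}",  # host_path:container_path
        image,
    ]


if __name__ == "__main__":
    # Hypothetical web container storing uploads on the shared mount.
    args = docker_run_args("example/web", "web-1", "web/uploads", "/var/www/uploads")
    print(shlex.join(args))
```

The function only builds the argument list, so it can be inspected or logged before anything is started.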

The next thing I considered was which containers should get their own volumes and which ones should share one. For example, every MySQL container would get its own volume, as MySQL is able to handle replication by itself and I don't want to mess around with that. Webservers serving the same website would probably use the same volume for things like user-uploaded images, as they are not able to replicate that data themselves.
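To make the split concrete, here is a rough sketch of such a policy. The directory layout, the service names and the `SHARED_SERVICES` set are assumptions for illustration, not an established convention:

```python
from pathlib import PurePosixPath

GLUSTER_ROOT = PurePosixPath("/mnt/gluster")

# Hypothetical policy: services listed here share one directory across all
# instances; every other service gets one directory per instance.
SHARED_SERVICES = {"web"}


def volume_path(service, instance):
    """Return the host directory a given container instance should mount."""
    if service in SHARED_SERVICES:
        return GLUSTER_ROOT / service / "shared"
    return GLUSTER_ROOT / service / f"instance-{instance}"


if __name__ == "__main__":
    for service, instance in [("web", 1), ("web", 2), ("mysql", 1), ("mysql", 2)]:
        print(service, instance, volume_path(service, instance))
```

With this layout both web instances resolve to the same directory, while each MySQL instance gets its own.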

Has anybody tried something like this or is there anything I have missed?

BY0B
Martin
  • I've done a proof of concept with this and can tell you that it works, but before you jump into Gluster make sure you understand its tuning profile. Because Gluster is sensitive to disk latency (similar to etcd), it can make applications artificially slower in the name of guaranteeing replication of files. – Brian Redbeard Mar 23 '15 at 20:00
  • We are working on a tool which manages the volumes attached to Docker containers. It is called "Flocker" and you can see the GitHub repo here: https://github.com/clusterhq/flocker We currently have a storage backend for ZFS, which uses the snapshot feature to make data migration much easier, but we also have plans for other storage backends (such as a generic block device backend). I cannot say if it's a good idea to mount Docker volumes using GlusterFS, but I can vouch for the overall design pattern, i.e. accounting for the state generated by a Docker container using "something". – Bino Carlos Mar 24 '15 at 14:26
  • Thanks for your answer. I already saw Flocker and it looks very promising. Do you have a rough date for CoreOS support or a production-ready version 1.0? – Martin Mar 25 '15 at 15:38
  • I have used glusterfs for volumes with OpenStack before which has a similar setup to what you're doing and it did great. – Ethode Apr 12 '15 at 13:34
  • @Martin We (I work at ClusterHQ) have Flocker working on CoreOS utilizing Amazon EBS. https://coreos.com/blog/Flocker-on-CoreOS-Linux/ – Stephen Nguyen Oct 23 '15 at 17:28

2 Answers


We have deployed a similar setup with Atomic (http://www.projectatomic.io/) instead of CoreOS, backed by a replicated, non-distributed GlusterFS storage system with three replica-2 sets. This works very well.

However, you need to keep a few special characteristics of GlusterFS in mind. As Brian already mentioned, Gluster places consistency and reliability above all. The more frequently changes happen, the more replication takes place. This puts a lot, and I mean A LOT, of pressure on your system.

Make sure your IO subsystem is fast (duh, it's storage), and connect your Gluster nodes with the fastest network connections available. If you have only GBit, aggregate! Last but not least, the storage system must sport serious computation power, because Gluster does a lot of computation to check its state. That being said, even under high load, Gluster delivers.
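A back-of-the-envelope calculation shows why the network matters so much. The sketch below assumes that the client sends every write to all replicas over the same uplink (which is, as far as I know, how the native Gluster client handles replicated volumes) and ignores protocol overhead, so treat the numbers as an optimistic ceiling rather than a measurement:

```python
def replicated_write_ceiling_mb_s(link_gbit_s, replica_count, links=1):
    """Rough upper bound on client write throughput in MB/s.

    Assumption: every write is sent to all replicas over the same
    (possibly aggregated) uplink; protocol overhead is ignored.
    """
    total_mb_s = link_gbit_s * links * 1000 / 8  # Gbit/s -> MB/s
    return total_mb_s / replica_count


if __name__ == "__main__":
    # Single GBit link, replica 2: 125 MB/s shared between two copies.
    print(replicated_write_ceiling_mb_s(1, 2))           # 62.5
    # Two aggregated GBit links, replica 2.
    print(replicated_write_ceiling_mb_s(1, 2, links=2))  # 125.0
```

So on a single GBit link with replica 2, writes top out at roughly 62.5 MB/s in theory, and aggregating two links brings that back to about 125 MB/s.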

Reconsider your MySQL strategy. Gluster does the replication for you and also provides sort-of load-balancing in delivery. It might actually be faster to use Gluster.

bjanssen

The use of glusterfs would depend on the storage backend that you are using. As a cluster file system it is intended to cluster physical storage so it appears as one large continuous volume. This official quick start guide has a good explanation of the process.

In the event that your setup utilizes two or more separate backend storage servers or something similar to store all of the docker volumes, then using glusterfs or some other similar parallel file system may offer significant performance advantages. If this is the case you could also consider using Lustre, which is widely used as a parallel filesystem in the HPC community.

That being said, tuning, debugging and configuring parallel/cluster filesystems can be a time-consuming task which requires a lot of expertise, patience and sometimes a willingness to restart from the beginning. It would be prudent to make sure that the performance benefits a parallel file system offers are worth the effort required to set it up and maintain it.
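One cheap way to get a first impression before committing is to time small synchronous writes on a local disk and on the cluster mount and compare them. The following is only a sketch of such a probe, not a proper benchmark; the write count and block size are arbitrary assumptions, and the target directory is whatever path you pass in:

```python
import os
import statistics
import sys
import tempfile
import time


def fsync_write_latencies(directory, writes=50, size=4096):
    """Return per-write latencies (seconds) for small fsync'd writes."""
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the write to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return latencies


if __name__ == "__main__":
    # Pass the directory to probe as the first argument, e.g. a local path
    # and then the cluster mount; defaults to the system temp directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    samples = fsync_write_latencies(target)
    print(f"{target}: median {statistics.median(samples) * 1000:.2f} ms, "
          f"max {max(samples) * 1000:.2f} ms")
```

Running it once against a local directory and once against the cluster mount gives a rough feel for the latency cost of replication on your hardware.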

Matt