
I've got 2 webservers, with the chance of having to add more along the way. Right now I keep these servers in sync using lsyncd + csync2. Performance-wise it works well, because all files exist on both servers (no network access is required to open files locally), but not so well in other cases.
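For reference, the csync2 half of such a setup is typically declared in a group block roughly like the sketch below (hostnames, key and paths are placeholders; the `auto younger` directive tells csync2 to resolve conflicting changes in favour of the newer copy):

```
# /etc/csync2.cfg -- minimal two-node sketch; hostnames and paths
# are placeholders for the real environment
group websync
{
    host web1;
    host web2;

    key /etc/csync2.key_websync;

    include /var/www;
    exclude *.tmp;

    # on conflicting changes, keep the younger (newer) copy
    auto younger;
}
```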

One example of this: if I delete a file on server 1 and immediately upload a new file with the same name to server 1, the original delete is still propagating to server 2 in the meantime. Server 2 then sends its own delete event back to server 1 to complete the "update circle", which removes the newly uploaded file on server 1.

I can't help thinking that there must be a better way to keep servers in sync. I've been looking at GlusterFS, and I see that a setup where all files are replicated to all servers is discouraged. However, I'm running CMS systems like Drupal on these servers. Such CMS systems often open quite a few files per request, and I'm worried that the network traffic needed to fetch those files will slow down requests.

Would it be a good idea to replace lsyncd + csync2 with GlusterFS set up to replicate all files to all nodes, or is that a bad idea?
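For scale, a fully replicated two-node volume of the kind asked about here would be created roughly as sketched below (hostnames, brick paths and the volume name are placeholders; Gluster 3.x CLI):

```
# run once from web1; web2 must already have glusterd running
gluster peer probe web2
gluster volume create wwwdata replica 2 \
    web1:/data/bricks/wwwdata web2:/data/bricks/wwwdata
gluster volume start wwwdata

# on each webserver, mount the volume through the FUSE client;
# every node holds a complete copy of the volume's data
mount -t glusterfs localhost:/wwwdata /var/www
```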

sbrattla
• I may misunderstand your problem here, but why not use network storage shared among the servers? – mveroone Aug 30 '13 at 14:36
• @Kwaio: I'd like to "mirror" all webservers (both scripts and user-uploaded content), as this prevents downtime if something goes wrong with the server hosting the shared storage. In other words, I do not want a setup where all webservers rely on a single file store. – sbrattla Aug 30 '13 at 14:48
• Then why not a high-availability network storage cluster? – mveroone Aug 30 '13 at 14:50
  • Sure, but what does that imply? Hardware? – sbrattla Aug 30 '13 at 14:51
• I'm no specialist, but that means having your network storage mirrored in a master-slave architecture across two clusters, with a system that fails over if the master goes down. Most network storage vendors have solutions for that, though you could build it yourself with Linux appliances, which would require quite a lot of development. – mveroone Aug 30 '13 at 15:02
• Alright, I was thinking more in terms of (almost) ready-made solutions (such as csync2). – sbrattla Aug 30 '13 at 15:04

4 Answers

Score: 2

BitTorrent Sync may do the job for you. I'm using it to keep files in sync between a few internal servers at my house, and it's doing the job wonderfully. The other thing you'll need to think about is the backend database when your app uses a CMS: make sure there's MySQL replication going on, or something of that sort.
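As a minimal sketch of the MySQL side (server IDs, hostnames, credentials and the binlog coordinates are all placeholders; this is classic asynchronous master/slave replication):

```
# /etc/mysql/my.cnf on the master (use server-id = 2 on the slave)
[mysqld]
server-id = 1
log_bin   = /var/log/mysql/mysql-bin.log

# on the slave, point it at the master and start replicating;
# take the file/position values from SHOW MASTER STATUS:
#   CHANGE MASTER TO MASTER_HOST='web1', MASTER_USER='repl',
#       MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001',
#       MASTER_LOG_POS=107;
#   START SLAVE;
```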

whiskykilo
• That's interesting. How does BitTorrent Sync deal with conflicts? Say I've got 5 web servers, and a file gets updated on server A. Then, 2 seconds later, the same file gets updated on server C before the change on A has reached C. Does the software have algorithms for deciding which server wins in these cases? – sbrattla May 22 '14 at 20:26
• My understanding is that BitTorrent Sync compares the modification date/time on both copies and only keeps the latest, which could be ideal or detrimental depending on the use case. – whiskykilo Aug 11 '14 at 18:40
Score: 1

GlusterFS is hard to deploy. For web data, file-level sync with a tool like Unison is much easier to deploy and maintain.
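A one-shot Unison run between two document roots might look like the sketch below (host and path are placeholders; `-prefer newer`, much like csync2's `auto younger`, breaks conflicts in favour of the most recently modified copy):

```
# unattended two-way sync of /var/www with the same path on web2;
# -batch avoids interactive prompts, -times also syncs mtimes
unison /var/www ssh://web2//var/www -batch -prefer newer -times
```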

DRBD is a perfect solution for keeping data in sync at the block level, but you have to format the device with a cluster filesystem like OCFS2 or something similar.
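A dual-primary DRBD resource for that would look roughly like this sketch (DRBD 8.4 syntax; hostnames, addresses and the backing disk are placeholders), after which both nodes format the DRBD device with OCFS2:

```
# /etc/drbd.d/www.res -- sketch only
resource www {
    device    /dev/drbd0;
    disk      /dev/sdb1;          # backing block device, placeholder
    meta-disk internal;
    net {
        protocol C;               # synchronous replication
        allow-two-primaries yes;  # needed so both nodes can mount OCFS2
    }
    on web1 { address 10.0.0.1:7789; }
    on web2 { address 10.0.0.2:7789; }
}

# once the resource is up and primary on both nodes:
#   mkfs.ocfs2 /dev/drbd0
```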

truongtx
• GlusterFS is not like DRBD - it does not operate at the block level, but at the file level. But I agree that Unison is easier to deploy. – Jure1873 Jul 07 '14 at 13:51
Score: 1

Gluster would solve the problem you have because it can hold locks and propagate changes (deleting the file on all other nodes), but it may add latency that can be a problem for a webserver. The next alternative is DRBD+OCFS2 or GFS, but that's probably more complex: with Gluster you are working with the underlying filesystem rather than the block level, so if the servers get out of sync it's not too hard to fix, files can't get corrupted so easily by split-brain situations, and so on.

We are using it for a mailserver and it is quite slow for directories with a lot of files. You should definitely test everything before deploying. I'm currently testing the NFS mount because it works better for small files.
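The NFS mount mentioned here uses Gluster's built-in NFS server (present in the 3.x series), which speaks NFSv3 over TCP only, so the mount looks roughly like this (hostname and volume name are placeholders):

```
# mount the Gluster volume via its built-in NFS server instead of FUSE;
# the NFS client's attribute caching tends to help small-file workloads
mount -t nfs -o vers=3,mountproto=tcp localhost:/wwwdata /var/www
```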

Jure1873
Score: -3

Why don't you use a tool like Puppet? Write once on a source machine and, once ready, deploy to the targets using "puppet kick" or MCollective. It's well documented, and you can easily add servers later if needed.

You can also rely on tools using inotify, like lsyncd, which works at the kernel level: it watches for changes in a folder and triggers a sync. But if a tool dedicated to synchronising files across a cluster, like csync2, is not enough, I don't know what will be.

Just to be sure: do the modifications happen only on server 1, or also on server 2?

Rosco
• Downvoted - Puppet is **not** a good way to keep files in synchronisation between servers. It **is** however a brilliant tool for managing _server configuration_. – Craig Watson Aug 30 '13 at 14:40
• Modifications may happen at any server, as they all participate in a load-balanced setup. Users may upload files to any server, and that file should consequently be available on all servers, since traffic is spread among them. – sbrattla Aug 30 '13 at 14:50
• Well, in this case Puppet is not what you want (but still, @Craig Watson, a file is a file, configuration or HTML, it is still a file). I doubt that GlusterFS is ready for production, so if you really want something safe, you'd better use an NFS server with high availability. – Rosco Aug 30 '13 at 15:10