8

I was wondering if it's possible to mirror two servers, so that you could upload files to one server and they'd be pushed to the other, etc. I'm mostly interested in file mirroring; it doesn't have to mirror package management and setup (but that'd be cool too!).

Kyle
  • File mirroring: Gluster or DRBD; website mirroring: Varnish or HAProxy; DB mirroring: MySQL circular replication or Postgres replication. Most server packages have a cluster operation mode, or there are reverse proxies that allow you to do that. – Tom O'Connor Nov 17 '11 at 11:34

8 Answers

6

It depends very much on the job at hand.

Why do you need file mirroring? Do you want to update something like a website or content repository where it's usually okay to update periodically, or do you need real-time synchronization of data?

For periodic, asynchronous mirroring of files it is usually sufficient to have a staging area that you upload all your data to and from which you distribute it to the servers. In your case, with two servers, you could create a staging file share on srv1 to which you transfer the data (via FTP, NFS, DAV, SFTP, etc.) and then have a cron job rsync the files to the "live" directories of srv1 and srv2. The easiest way to use rsync in that case is to generate an SSH key pair that you use for the data transfers and that is authorized on all servers in your cluster.

Example:

srv1:/data/staging/  <= is where you upload your data
srv1:/data/production/ <= is where your servers get their production data from
srv2:/data/production/

srv1$ cat /etc/cron.d/syncdata.cron
=====
*/5 * * * * syncuser rsync -a --delete /data/staging/ /data/production/
*/5 * * * * syncuser rsync -az --delete -e ssh /data/staging/ srv2:/data/production/
=====

This should give you a basic idea. Of course you would want to wrap the rsync calls in some script and implement proper locking so that it doesn't run twice in case the sync takes more than 5 minutes, etc. Also, it goes without saying that a staging area is not mandatory. You might as well sync srv1:production to srv2:production directly; srv2 might then show data that is up to 5 minutes older than that of srv1, which may or may not be a problem, depending on how you balance between the two.
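
As a minimal sketch of such a wrapper (the script name and lock file path are just placeholders), flock(1) from util-linux can provide the locking:

srv1:/usr/local/bin/syncdata.sh
=====
#!/bin/sh
# Run both rsync jobs under an exclusive lock so overlapping cron
# invocations exit immediately instead of syncing twice in parallel.
# (script path and lock file are placeholders)
exec flock -n /var/lock/syncdata.lock /bin/sh -c '
    rsync -a  --delete /data/staging/ /data/production/ &&
    rsync -az --delete -e ssh /data/staging/ srv2:/data/production/
'
=====

The two cron entries above would then simply call this script instead of invoking rsync directly.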

Another way to asynchronously distribute files is to package them as rpm or, in your case, deb files. Put these in a central repository and have them installed/updated via something like cfengine, monkey or some DIY message-bus based solution. This has the nice side effect of versioning the deployed data, but it is only suitable for smaller amounts of data that you produce and deploy yourself (like versions of your own software). You wouldn't want to distribute TBs of data with this, and it's also not suited to mirroring content that changes with a high frequency, like every other minute or so.

If you need to replicate data in near real time but not necessarily synchronously, then instead of calling a cron job every so often you can use some inotify-based method like the already mentioned incron to call your sync scripts. Another possibility is to use Gamin (which also uses inotify if present in the kernel) and write your own little sync daemon. Last but not least, if all the files are uploaded to one server via e.g. SFTP, you might check whether your SFTP server allows you to define hooks that are called after certain events, like a file upload. That way you could tell your server to trigger your sync script whenever new data is uploaded.
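
As a rough sketch of the inotify approach (assuming the inotify-tools package is installed; paths and host name are the same placeholders as above):

srv1:/usr/local/bin/watchsync.sh
=====
#!/bin/sh
# Block until something changes under the staging area, then push
# the changes to srv2; loop forever. (paths and host are placeholders)
while inotifywait -r -e close_write,create,delete,move /data/staging/; do
    rsync -az --delete -e ssh /data/staging/ srv2:/data/production/
done
=====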

If you need real-time, synchronous mirroring of data, a cluster filesystem might be in order. DRBD has already been named; it is very nice for replication on the block level and often used for highly available MySQL setups. You might also want to take a look at GFS2, OCFS2, Lustre and GlusterFS, though Lustre and GlusterFS are not really suited for a two-server setup.
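
For orientation, a DRBD resource definition for two such servers might look roughly like this (the device, backing disk and addresses are assumptions, not a recommendation):

/etc/drbd.d/r0.res
=====
resource r0 {
  on srv1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;        # backing block device, assumed
    address   192.168.0.1:7788; # replication link, assumed
    meta-disk internal;
  }
  on srv2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}
=====

Whether you then run it Single-primary (with a normal filesystem) or Dual-primary (which requires a cluster filesystem such as OCFS2 or GFS2 on top) is discussed at length in the DRBD User's Guide mentioned below.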

Lukas Loesche
  • DRBD looks nice. Is it bad to use this with one server being live? Like how would it affect the live server? – Kyle Mar 08 '10 at 02:01
  • Depends - what is the live Server doing? Is it a webserver, database server, fileserver, etc.? DRBD does synchronous replication, with all the implications that come with it. Depending on if you plan to go Single-primary or Dual-primary certain I/O caching (and filesystem) restrictions will apply which in turn affect your applications. For details I suggest reading the DRBD User's Guide http://www.drbd.org/users-guide-emb/ which is very well written and explains all the implications in great detail. – Lukas Loesche Mar 08 '10 at 19:08
5

Basically you have 3 possibilities:

  1. Let your application push the files to both servers.
  2. Asynchronous replication, e.g. rsync every 15 minutes (or less) with a cron job.
  3. Synchronous replication on the file system level (e.g. GlusterFS) or on the block device level (e.g. DRBD). If you use replication on the block device level, you need a file system which supports distributed locking (e.g. OCFS2 or GFS2) if you want to have r/w access to the files from both servers at the same time. A rough GlusterFS sketch follows below.
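
For option 3, a replicated two-node GlusterFS volume could be set up roughly like this (volume name, brick paths and mount point are assumptions):

srv1$ gluster peer probe srv2
srv1$ gluster volume create gv0 replica 2 srv1:/data/brick1/gv0 srv2:/data/brick1/gv0
srv1$ gluster volume start gv0
srv1$ mount -t glusterfs srv1:/gv0 /data/production   # repeat the mount on srv2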
joschi
2

cron + rsync = mirrored directories/files
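
For example, a single entry in the crontab of a user whose SSH key is authorized on the other server is often all it takes (paths, host and interval are placeholders):

# push /var/www to the second server every 10 minutes
*/10 * * * * rsync -az --delete -e ssh /var/www/ srv2:/var/www/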

Chris S
1

Depending on your specific use case, you could use something like DRBD: http://www.drbd.org/

Keiran Holloway
1

If you are trying to build a backup solution here (which I have personally done in pretty much the same setup), be careful. There are many different things that you need to back up against, one of the biggest (arguably the biggest) being accidental deletion: any live replication system will just replicate the deletion and provide no safety. For this, daily replication works, but it is a pretty weak answer. Try rsnapshot.

Unison may well work for you, but I have no personal experience with it.

Running rsync in both directions with the appropriate flags can work, but it has the rather tricky issue of how to handle deleted files: without special handling, it simply restores them, which is fine if you never delete anything (like me), but a bit poor otherwise. It also does odd things if a file is moved.

Whatever you are doing, if a situation can arise where files can be edited simultaneously at both ends, you have a problem. Unison is the only solution I know of that can handle this even close to satisfactorily.
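
A rough sketch of a non-interactive two-way sync with Unison over SSH (paths and host are placeholders; Unison has to be installed on both machines):

# two-way sync of /data/www between this server and srv2, without prompting
unison -batch /data/www ssh://srv2//data/www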

Thingomy
  • Note that the loops mentioned below will not be a problem with Rsync, as it maintains the modify dates of the files it transfers if set correctly. – Thingomy Mar 07 '10 at 16:28
0

If it is one-way (I mean always from one server to the other server, but not vice versa) you could use incron. It is like cron, but based on filesystem events.

Every time a file gets created or changed, it will trigger an scp or rsync to the other server.
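
A rough incrontab entry might look like this (the watched path, events and target host are placeholders; note that incron watches a single directory, not its subdirectories):

=====
/data/www IN_CLOSE_WRITE,IN_MOVED_TO rsync -az -e ssh /data/www/ srv2:/data/www/
=====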

Bi-directional syncing has the problem of loops :).

chmeee
0

It depends on your needs. I have a very "cheap and easy" setup for clustered webservers.

I simply have one "fileserver" (NFS) from which all the webservers mount the following dirs:

/etc/apache2/sites-enabled
/etc/apache2/sites-available
/var/www

Dead simple, and it works.
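
On each webserver that could translate into /etc/fstab entries along these lines (the fileserver name and export paths are assumptions):

=====
# NFS mounts from the central fileserver (host and export paths assumed)
fileserver:/export/apache2/sites-available  /etc/apache2/sites-available  nfs  defaults  0  0
fileserver:/export/apache2/sites-enabled    /etc/apache2/sites-enabled    nfs  defaults  0  0
fileserver:/export/www                      /var/www                      nfs  defaults  0  0
=====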

bmaeser
0

Clonezilla, which uses rsync, can also be looked at.