
Platform: Ubuntu 10.04 x86.

We have an HTTP server (nginx, but that is not relevant) which serves some static content. Content is (rarely) uploaded by content-managers via SFTP, but may also be changed / added by other means (like a `cat`, done directly on the server).

Now we want to add a second, identical HTTP server — a slave mirror in another data-center on another continent. (And set up DNS round-robin.)

What is the best way to set up synchronization between the master server and the slave mirror, so that the delay between a modification and re-synchronization is minimal? (A few seconds should be bearable, though.)

The solution must cope with large changesets and race conditions. That is, if I change 1000 files, it should not spawn 1000 synchronization processes. And if I change something while synchronization is active, my new change must eventually make it to the mirror as well... And so on.

Rejected solutions:

  • CDN — not worth the money for our particular usage scenario.
  • NFS — not over the global Internet.
  • dumb cron + rsync — latency and/or system load would be too large.
  • manual rsync — not reliable, since content is changed by non-IT users.

I would say that we need something based on inotify. Is there a ready solution?
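To make the expected behavior concrete, here is a rough inotify + rsync sketch (paths and the mirror hostname are placeholders, and it glosses over many of the pitfalls above):

```
#!/bin/bash
# Watch the content tree and push batched changes with rsync.
SRC=/var/www/
DST=mirror.example.com:/var/www/

inotifywait -m -r -q -e modify -e create -e delete -e move \
    --format '%w%f' "$SRC" |
while read -r changed; do
    # Drain further events for up to 2 seconds, so a 1000-file
    # changeset results in one rsync run instead of 1000.
    while read -r -t 2 junk; do :; done
    # --delete also reverts stray changes made on the mirror.
    rsync -az --delete "$SRC" "$DST"
done
```

Events that arrive while the rsync is running stay queued in the pipe and trigger another run, which is roughly the race-condition behavior I need.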

Update: two extra (rather obvious) requirements that I forgot to mention:

  • If data is somehow changed on the slave mirror (say, a superuser accidentally deleted a file), the sync solution must restore the data back to the master state on the next sync.

  • When idle, the solution must not consume traffic or system resources (other than some memory etc. for the sleeping daemon process, of course).

Update 2: one more requirement:

  • The solution must work with UTF-8 file names.
Alexander Gladysh
  • It looks similar to http://serverfault.com/questions/157901/lsync-unison-or-some-other-inotify-auto-syncing-tool – Mircea Vutcovici Jun 07 '11 at 19:59
  • `lsyncd`, see: http://serverfault.com/questions/7969/is-there-a-working-linux-backup-solution-that-uses-inotify – Mircea Vutcovici Jun 07 '11 at 20:01
  • @Mircea: please add `lsyncd` as a regular answer, so it can be upvoted / discussed properly. ;-) – Alexander Gladysh Jun 07 '11 at 20:03
  • Wait, seriously, how did you get the cat to make server content? Do you work for WikiHow or some other content farm? – flumignan Jun 07 '11 at 23:36
  • Well, something like `cat >crossdomain.xml`, and then type a bit (or just paste into the terminal). It is a rare event, but it *can* happen, and the sync solution must be ready. ;-) The point is that I can't use an SFTP hook or something like that — there are multiple potential sources of changes. – Alexander Gladysh Jun 08 '11 at 06:22

4 Answers


Have you considered Unison as a means of keeping files in sync? Using it, you'd be able to do the one-way sync you're requesting. It seems like a reasonable fit for this application.
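For example, something along these lines (paths and hostname are placeholders; `-force` makes the master side authoritative, so this sketches a one-way push rather than Unison's usual two-way sync):

```
# Push /var/www to the mirror every 10 seconds, master wins.
unison /var/www ssh://mirror.example.com//var/www \
    -batch -force /var/www -repeat 10
```

Note that `-repeat` polls on an interval rather than reacting to inotify events, so weigh it against your idle-resource requirement.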

ewwhite

You could use `lsyncd`; see: Is there a working Linux backup solution that uses inotify?
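A minimal setup might look like this (source path and mirror host are placeholders; check the lsyncd documentation for the exact option names in your version):

```
# Write a minimal lsyncd 2.x config and start the daemon.
cat >/etc/lsyncd.conf <<'EOF'
sync {
    default.rsyncssh,           -- inotify events -> rsync over ssh
    source    = "/var/www",
    host      = "mirror.example.com",
    targetdir = "/var/www",
    delay     = 5,              -- collect events for 5s per batch
}
EOF
lsyncd /etc/lsyncd.conf
```

The `delay` batching means a 1000-file changeset becomes one rsync run, and the daemon sleeps when nothing changes.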

Mircea Vutcovici

What about pirsyncd? I think it's a good fit for you ;)


It seems like you might want to write a script that checks file timestamps: if a file's timestamp is later than the last run of the script, assume the file needs to be pushed, and trigger rsync or some other tool to synchronize it. Likewise, on the other side, check whether a file has changed, and if so, trigger a pull. Fabric might actually be a good tool for this. If you are familiar with Python, using Fabric in combination with timestamp checking may be the way to go.
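A rough shell rendering of the idea (paths and hostname are placeholders; the same logic would carry over to Python/Fabric):

```
#!/bin/bash
# Polling daemon: a stamp file records the last run; any file
# newer than it triggers a single rsync push.
SRC=/var/www/
DST=mirror.example.com:/var/www/
STAMP=/var/run/sync.stamp

touch "$STAMP"
while true; do
    if [ -n "$(find "$SRC" -newer "$STAMP" -print -quit)" ]; then
        # Stamp first, so changes made while rsync is running
        # are picked up on the next round.
        touch "$STAMP"
        rsync -az --delete "$SRC" "$DST"
    fi
    sleep 5
done
```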

slashdot
  • Sorry, but (1) I explicitly said that the solution should be invoked automatically and (2) this task is too full of potential pitfalls (again, see the question for an incomplete list) to write a script by hand without trying existing solutions. – Alexander Gladysh Jun 08 '11 at 19:24
  • I personally do not believe this would take a lot of work to write, and without cron this could be run as a simple daemon, which is completely hands-off. This is a lightweight solution in general, and I would argue that it has advantages over other possible solutions. In fact, I had requirements similar to these and implemented something similar: the process ran as a daemon, and after each run it would go to sleep for a preset amount of time. A checker script run by Puppet made sure the process, if it ever died, would get restarted. – slashdot Jun 09 '11 at 02:52