How can I automatically synchronize a directory tree on multiple machines?

I have two Mac laptops and a Debian server, each with a directory that I would like to keep in sync between the three. The solution should meet the following criteria (in rough order of importance):

  • It must not use any third-party service (e.g. Dropbox, SugarSync, Google whatever). This does not rule out installing additional software (as long as it's free).
  • It must not require me to use specific directories or change my way of storing things. (Dropbox does this IIRC)
  • It must work in all directions (changes made on /any/ machine should be pushed to the others)
  • All data sent must be encrypted (I have ssh keypairs set up already)
  • It must work even when not all machines are available (changes should be pushed to a machine when it comes back online)
  • It must work even when the /directories/ on some machines are not available (they may be stored on disk images which will not always be mounted)
    • This can be solved for Macs by using launchd to automatically launch and kill (or in some way change the behavior of) whatever daemon is used for syncing when the images are mounted and unmounted.
  • It must be immediate (using an event-based system, not a periodic one like cron)
  • It must be flexible (if more machines are added, I should be able to incorporate them easily)
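The launchd idea in that sub-bullet could be sketched as a LaunchAgent, saved at something like ~/Library/LaunchAgents/local.syncwatch.plist. Everything here (the label, the helper-script path, the whole arrangement) is hypothetical; launchd's real `StartOnMount` key runs the job whenever any filesystem mounts, and the invoked script can then start or stop the sync daemon depending on whether the image's mount point exists:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>local.syncwatch</string>
    <key>ProgramArguments</key>
    <array>
        <!-- Hypothetical helper: starts the sync daemon if the disk
             image's mount point exists, kills it otherwise. -->
        <string>/usr/local/bin/syncwatch.sh</string>
    </array>
    <!-- Fire on every mount event; unmounts can be handled by the
         helper noticing the directory is gone (or via WatchPaths). -->
    <key>StartOnMount</key>
    <true/>
</dict>
</plist>
```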

I also have some preferences that I would like to be fulfilled, but do not have to be:

  • It should notify me somehow if there are conflicts or other errors.
  • It should recognize symbolic and hard links and create corresponding ones.
  • It should allow me to create a list of exceptions (subdirectories which will not be synced at all).
  • It should not require me to set up port forwarding or otherwise reconfigure a network.
    • This can be solved by using an ssh tunnel with reverse port forwarding.
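The reverse-forwarding workaround in that sub-bullet might look like the sketch below. The hostname and ports are made up, and the script only prints the command (a dry run) instead of connecting:

```shell
#!/bin/sh
# Reverse ssh tunnel sketch: the laptop dials out to the server, and the
# server can then reach the laptop's sshd through the forwarded port,
# even if the laptop is behind NAT or on a dynamic IP.
SERVER="user@example-server"   # hypothetical: the always-on Debian box
LOCAL_SSH_PORT=22              # sshd on this laptop
REMOTE_PORT=2222               # port opened on the server side

# -N: forward only, run no remote command; -R: reverse port forward
TUNNEL_CMD="ssh -N -R ${REMOTE_PORT}:localhost:${LOCAL_SSH_PORT} ${SERVER}"
echo "$TUNNEL_CMD"             # dry run; drop the echo to actually connect
```

Once the tunnel is up, the server reaches the laptop with `ssh -p 2222 localhost`, so dynamic IPs stop mattering.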

If you have a solution that meets some, but not all of the criteria, please contribute it in the comments as it might be useful in some way, and it might be possible to meet some of the criteria separately.


What I tried, and why it didn't work:

  • rsync and lsyncd do not support bidirectional synchronization
  • csync2 is designed for server clusters and does not appear to work with machines with dynamic IPs
  • DRBD (suggested by amotzg) involves installing a kernel module and does not appear to work on systems running OS X

Blacklight Shining

Posted 2012-08-13T14:53:34.553

Take a look at http://www.cis.upenn.edu/~bcpierce/unison/index.html and http://en.wikipedia.org/wiki/DRBD

– amotzg – 2012-08-13T16:01:56.843

And maybe also read http://en.wikipedia.org/wiki/Replication_%28computer_science%29#Disk_storage_replication and its following section.

– amotzg – 2012-08-13T16:04:02.730

Unison seems to be designed for a single client and server. I need a solution that works for any number of machines, some of which may not be online or may appear to not have the directory to sync with at any given time, and some with dynamic IP addresses. – Blacklight Shining – 2012-08-14T10:56:03.493

I know you don't want to change the way you do things, but have you considered having the directory on the debian server and then just exporting it (nfs) to the macs? That way, everyone is using the same directory. – terdon – 2012-08-14T17:27:32.443

This wouldn't allow me to access the files offline if I needed to. I might consider it if there's really nothing else, but I think there must be a way to do this. – Blacklight Shining – 2012-08-14T20:05:58.313

Combined, "It must work even when not all machines are available" and "It must be immediate (using an event-based system, not a periodic one like cron)" are a bit over the top for me. So you cannot guarantee that all machines are available, but for some reason you definitely need that a file is pushed to wherever it may be pushed on-the-fly, prohibiting a cronjob solution? – user39559 – 2012-08-24T21:01:58.323

I thought I was being a bit demanding. I /would/ like everything to be kept in sync when possible; i.e. when some machines are not available and changes are made, push the changes to the machines that /are/ available, and push them to the others when they come online. – Blacklight Shining – 2012-08-24T21:09:12.347

But why the rush? You can't wait a few minutes until the change is pushed? The computers are changing IP, and you want a solution that detects that also? – user39559 – 2012-08-25T00:00:10.770

I would rather not have to wait a few minutes or manually trigger it. But lsyncd can be used to start a sync whenever a change is made, so I don't see a problem there. Yes, I want a solution that also works across IP changes, and that can be done with an ssh tunnel. – Blacklight Shining – 2012-08-25T05:29:39.373
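The lsyncd approach mentioned in that comment boils down to an event loop like this sketch. The directory path is hypothetical, and the actual sync step is stubbed out with an echo so the control flow runs anywhere:

```shell
#!/bin/sh
# Event-driven sync trigger: react to filesystem change events instead
# of polling on a schedule. On Linux, inotifywait supplies the events;
# on OS X, fswatch or lsyncd's own watcher plays the same role.
SYNC_DIR="/data/sync"   # hypothetical directory to keep in sync

on_change() {
    # Stub: a real setup would run something like
    #   unison -batch "$SYNC_DIR" "ssh://peer/$SYNC_DIR"
    echo "syncing $SYNC_DIR"
}

# The real loop (commented out so this sketch needs no inotify-tools):
#   while inotifywait -r -e modify,create,delete,move "$SYNC_DIR"; do
#       on_change
#   done
on_change   # demonstrate the handler once
```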

So, you say you want to keep plain filesystem semantics. But POSIX FS semantics allow the user to modify portions of files. If you want to allow disconnected operation, what will happen when two participants try to modify overlapping portions of the same file ? You need a mechanism to resolve these conflicts. This is a known difficult problem. Are you sure you really need complete FS semantics ? – b0fh – 2012-08-26T11:05:27.337

Filesystem semantics…does that include the symlink and hard link handling? And if two users modify the same file on two different machines, won't that just be considered a conflict that needs to be manually resolved? – Blacklight Shining – 2012-08-26T11:56:21.320

But how could one machine find the other if all IPs are changing unrestrictedly? – user39559 – 2012-08-30T00:08:12.713

Using ssh tunnels. I can set up autossh on each of the non-server machines to make them create ssh tunnels when they're online. (I just need a way to make autossh always restart the connection, even if it failed immediately.) I think I may have the solution to this; unison might work after all…

– Blacklight Shining – 2012-08-30T00:21:20.670
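On the "always restart the connection, even if it failed immediately" point: autossh's gatetime check is what makes it give up on fast first failures, and it can be disabled through an environment variable. The host and ports below are hypothetical, and the command is only echoed as a dry run:

```shell
#!/bin/sh
# autossh treats a connection that dies within AUTOSSH_GATETIME seconds
# (default 30) as a permanent failure and exits. Setting it to 0 makes
# autossh retry indefinitely, even when the first attempt fails at once.
# -M 0 disables autossh's monitor port, deferring to ssh's keepalives.
export AUTOSSH_GATETIME=0
CMD="autossh -M 0 -N -R 2222:localhost:22 user@example-server"
echo "$CMD"   # dry run; remove the echo wrapper to start the tunnel
```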

Answers

That's a long list of requirements, and I don't think anything exists that meets all of them completely. However, I do have two suggestions that might get very close.

SparkleShare

I find SparkleShare a very neat solution. It is cross-platform, provides GUI integration and notifications, and so on. Much like Dropbox, yet hosted yourself. The only thing you need is a central Git repository on some server.

The wizard for starting a new project requires you to store your files in a specific directory, which does not meet your second requirement. However, you can try changing that in its config.xml.

I think that leaves just one requirement open: hard-link support. SparkleShare's use of Git prevents this (see Git and hard links). If you really need hard links, this could be a blocker.

Git-annex assistant

You might also get involved in another, similar project that is still in early development: git-annex assistant. Because it builds on git-annex, it might overcome the hard-link limitation.

gertvdijk

Posted 2012-08-13T14:53:34.553

I did think I was being a bit demanding…I will certainly try Sparkleshare; it's probably easier than my lsyncd-shellscript-unison idea (which likely won't preserve hard links, either). – Blacklight Shining – 2012-09-15T09:09:17.060