Centralized distribution/syncing of sets of large files over a local network

1

Even though I am fully aware that versions of this question have been asked a googol times, I'll try not to repeat them.

I have many sets of many files (some files are small, but some are large, ~10-20 GB each). I have multiple servers, and each one can host one or more of those sets of files. For example, one server might host 50% of the total number of sets, while another hosts a different portion of them.

You can think of a set as a collection of large media files, really big image libraries, complete applications, whatever; it doesn't really matter, as long as the set contains large files.

A server can update its copy of a set at any point in time, either by replacing files in the set with completely new ones, or by applying patches to some of the files, which results in almost the same files with only slight differences.

On the other side, I have many clients that should be able to obtain any given set (or multiple sets) from the servers, and keep their copies of those sets up-to-date (synchronized) with the sets on the server, whenever they want to use a set.

The tools that I have considered so far are the following:

  • rsync -- It's great for syncing many small-to-medium-sized files, but less ideal for large files: its delta-transfer algorithm reads the entire file on both sides to work out which parts of it need to be copied. That is fine when a file is copied for the first time, or when it has changed completely, but not so fine when, say, only 1% of a 10 GB file has changed (see the rsync sketch after this list).
  • SVN -- It's great when it comes to finding differences and transferring only those deltas around, but I'm not so sure how optimal it is in terms of disk usage (would the entire set take up twice the space on both client and server, since a copy of the set is also stored in the repository?).
  • Torrent -- This one could be feasible distribution-wise. For instance, create a torrent for each set on the server, start seeding it there, and have clients that receive a set continue seeding it to other clients, thus spreading the load across every machine that holds a copy of the set. However, I'm not sure whether it could somehow distribute differences once the set changes on the server... Would it require creating a new torrent for each change? I also don't know how torrent would behave on a local network, speed-wise: would it be able to transfer files between one server and one client at the maximum, network-limited speed, or does it add serious protocol overhead? What about network congestion? (See the torrent sketch after this list.)
  • Custom solution. Not much to add here, except that it would most likely mean re-inventing the wheel, and that some existing solution would probably fit my needs, if only I were aware of it.
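
For concreteness, here is roughly how I would pull one set with rsync; the host name and paths below are placeholders, not my real layout:

    # Pull one set from the server. --partial keeps interrupted transfers,
    # and --inplace updates changed blocks inside the existing destination
    # file instead of rewriting a whole temporary copy next to it.
    rsync -av --partial --inplace server1:/srv/sets/set1/ /data/sets/set1/

(Over a network connection rsync uses its delta-transfer algorithm by default, so the full-read cost described above still applies on both ends.)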
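And a sketch of the torrent idea, using mktorrent (the tracker URL and paths are made up): as far as I can tell, every change to a set would mean rebuilding and re-announcing the .torrent, which is exactly the concern above:

    # One torrent per set; rebuild after every change to the set.
    mktorrent -a http://tracker.local:6969/announce -o set1.torrent /srv/sets/set1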

So, the question is: which method of distribution/synchronization (utility, approach) would be best suited for my situation?

mr.b

Posted 2010-10-21T21:34:39.667

Reputation: 341

This may work better in Server Fault. – Zian Choy – 2010-10-22T06:19:00.837

Answers

1

Out of the solutions you listed, SVN looks the most promising. You will need to store at least one copy of the set in the repository, so you will be using up to 2x the space (or 3x, if you keep two working copies).
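
As a rough sketch of that workflow (the repository paths and URL are made up):

    # Server: create a repository and import the set once.
    svnadmin create /srv/repos/set1
    svn import /data/set1 file:///srv/repos/set1 -m "Initial import of set1"

    # Client: the first checkout transfers everything; after that,
    # each update only transfers deltas against the repository.
    svn checkout http://server/repos/set1 /data/set1
    svn update /data/set1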

In this day and age, hard drive space is (generally) cheap, so I don't think the space requirements would be too much of a burden, especially compared to the effort of building your own custom solution.

You may also want to look into the MS Sync Framework, which is used by SyncToy.

Zian Choy

Posted 2010-10-21T21:34:39.667

Reputation: 1,394

This may work better as a comment than as an answer. – Ignacio Vazquez-Abrams – 2010-10-21T23:58:34.157

A working answer textbox may have worked better than an unusable comment box. ;) – Zian Choy – 2010-10-22T06:17:44.443

Hopefully things are a little better now. :) – Zian Choy – 2010-10-28T05:57:14.140

Thanks for the MS Sync FW reference. In the end, however, I chose torrent. – mr.b – 2011-05-20T12:14:17.490