
I work with a team to manage 500-600 rented Windows 7 computers for an annual conference. We have a large amount of data that needs to be synced to these computers, up to 1 TiB. The computers are divided into rooms and connected through unmanaged gigabit switches. We prepare these computers ahead of time with the Windows installation and configuration, plus any files that are available to us before we send the base image in for replication by the rental company. Every year, presenters approach us on site with gigabytes of data that need to be pushed to the room they will be presenting in. Sometimes it is only a few small files, such as a slide PDF, but it can sometimes be much larger (>5 GiB).

Our current strategy for pushing these files is batch scripts and RoboCopy. For the large pushes, we use a BitTorrent client to generate a torrent file, and then use the batch/RoboCopy method to push the torrent into a folder on the remote machines that is monitored by an installed BT client. Often, this data needs to be pushed immediately, within a small time window. We have several machines in a control room, identical to the machines on the floor, that we use for these pushes.
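
A minimal sketch of what such a push might look like in PowerShell, assuming the room machines expose the administrative C$ share; the room naming scheme, watch-folder path, and staging path are hypothetical:

```powershell
$room     = "ROOM101"
$torrent  = "C:\staging\presenter-files.torrent"
# Hypothetical naming scheme: ROOM101-PC01 .. ROOM101-PC40
$machines = 1..40 | ForEach-Object { "{0}-PC{1:D2}" -f $room, $_ }

foreach ($pc in $machines) {
    # /R:2 /W:5 keeps RoboCopy from retrying forever on a machine that is off;
    # /NP suppresses per-file progress so the log stays readable.
    robocopy (Split-Path $torrent) "\\$pc\c$\BTWatch" (Split-Path $torrent -Leaf) /R:2 /W:5 /NP
}
```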

We occasionally need to execute a program on the remote machines, and we currently use batch scripts and PsExec to handle this task.
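
A hedged sketch of driving PsExec against a list of machines, in the same PowerShell style; the machine-list file, program path, and argument are hypothetical:

```powershell
# Hypothetical sketch: run a program on every machine listed in a text file
# (one hostname per line). PsExec accepts such a list with its @file syntax;
# -d returns without waiting for the remote process, -s runs it as SYSTEM.
# The quotes keep PowerShell from treating the leading @ as splatting.
psexec "@C:\staging\room101-machines.txt" -d -s "C:\ConferenceApps\setup.exe" /quiet
```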

We would love to be able to respond to these last-minute pushes with "sorry, your own fault", but that won't happen. The BT method has given us a much faster response time, but the whole batch process can get messy when multiple jobs are being pushed. We use Enterprise Ghost for other processes, but it doesn't work well at this large a scale, and it is quite expensive for a once-a-year task like this.

EDIT: There is a hard requirement that the remote machines on the floor are running Windows. The control machines do not have a hard OS requirement. I would really like to stay away from Multicast because of complications with upstream routers. Is Multicast or BitTorrent the better way to go on this? Is there another protocol that might work better?

WMIF
  • As a well written question it pains me to say that Shopping Questions are Off-Topic on any of the [se] sites. See [Q&A is hard, lets go Shopping](http://blog.stackoverflow.com/2010/11/qa-is-hard-lets-go-shopping) and the [FAQ] for more details. – Chris S Oct 11 '12 at 21:02
  • `I would really like to stay away from Multicast because of complications with upstream routers.` can you elaborate on why? – Zoredache Oct 11 '12 at 21:03
  • Honestly aside from digging up a multicast solution your home-grown BitTorrent and PSExec process sounds like about the best you can do. You may want to wrap some PowerShell around it for pretty/automation sake, but that's about the best suggestion I can give you... – voretaq7 Oct 11 '12 at 21:08
  • @Zoredache- Honestly, it is partly because I don't have enough experience with multicast. I have dealt with Ghost and multicast, and have not really come out with very good results. The upstream equipment belongs to the facility we contract with, and we have dealt with some pretty difficult IT groups at some locations. I would not expect any of their staff to even know what multicast is. – WMIF Oct 11 '12 at 21:08
  • @voretaq7- I have considered putting an interface over our structure, but I would have a hard time doing it in PowerShell. It has been painful for me to learn PS, mostly because it behaves very differently from the more structured languages. If I put an interface over it, it would most likely be in C#. – WMIF Oct 11 '12 at 21:13
  • @ChrisS- I have changed the question to hopefully fit within the rules of the site. – WMIF Oct 11 '12 at 21:27
  • Try using a scriptable BitTorrent client, or at least one with a web interface that you can control remotely, such as Transmission. – Michael Hampton Oct 11 '12 at 23:51
  • @MichaelHampton - We use a BT client that can monitor a designated folder and watch for torrent files to be placed in it. We place the torrent files in that folder using the above-mentioned RoboCopy and batch file methods. The BT part has been very reliable for us. It is packaging and pushing the file initially that gives us some hassle. Also, BT has a limitation that we cannot add to existing folders. – WMIF Oct 12 '12 at 03:05
  • `The BT part has been very reliable for us. It is the packaging and pushing the file initially that gives some hassle`: Care to elaborate on this? `BT has a limitation that we cannot add to existing folders`: I think this can be solved by a bit of scripting on the client: delete the current torrent download (but not its data) and begin a new torrent download to the previously used folder. – pkoch Oct 16 '12 at 19:34
  • In order to use the BT push, we have to first create the package. This involves a manual process at this point because we haven't automated it. Once we have the .torrent file, we use the batch/RoboCopy method to push it to the target machines in a folder where the installed BT client is monitoring. One thing I can't do at this point is remove the torrent from the client (not automated). If I push files to a computer, but then have to push an updated set of files (it happens), then I will end up having 2 torrent packages pointing at the same files. (A sketch of how this might be scripted appears after these comments.) – WMIF Oct 18 '12 at 01:16
  • You can read [how Facebook deploys code](http://arstechnica.com/business/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/). I think that peer-to-peer is the only solution. You also need fast disks and a server with a 10G NIC for the initial peak. – Guntis Oct 18 '12 at 17:56
  • Thanks for the interesting read @Guntis, but it doesn't really go into technical details about how they use BitTorrent. Do you know what they use to manage this task? There is an interesting video at this link that talks about a package used by Twitter called Murder: http://engineering.twitter.com/2010/07/murder-fast-datacenter-code-deploys.html – WMIF Oct 18 '12 at 21:14
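
Following up on the Transmission and client-scripting suggestions above, here is a minimal sketch of how the packaging and the "replace an existing torrent" step might be scripted, assuming Transmission runs on the room machines with its RPC interface enabled and that the transmission-create/transmission-remote command-line tools are available on a control machine; the tracker URL, paths, and credentials are all hypothetical:

```powershell
# Hypothetical sketch; 9091 is Transmission's default RPC port.
# All paths, the tracker URL, and the credentials below are assumptions.

# 1. Package the files into a .torrent on the control machine.
transmission-create -o C:\staging\room101-update.torrent `
                    -t http://control01.example.local:6969/announce `
                    C:\staging\room101-files

# 2. On each room machine: drop any existing torrent (keeping its data on disk),
#    then add the new one pointing at the same download folder.
foreach ($pc in Get-Content C:\staging\room101-machines.txt) {
    # -t all -r removes every torrent from the client without deleting files.
    transmission-remote "${pc}:9091" -n admin:password -t all -r
    transmission-remote "${pc}:9091" -n admin:password `
        -a C:\staging\room101-update.torrent -w C:\ConferenceData
}
```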

4 Answers


You really do want a multicast file transfer program: UFTP, which has decent documentation and proxy-style extensions for NAT/router traversal too.
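
A rough sketch of what a UFTP transfer could look like; the flag names are recalled from the UFTP documentation and should be verified against the current man pages, and the paths and rate cap are assumptions:

```powershell
# Hypothetical sketch. uftpd is the receiving daemon on every room machine;
# uftp on a control machine then multicasts a file to all listening clients.
# Flag names here are assumptions to verify against the UFTP documentation.

# One-time setup on each room machine: receive into the conference data folder.
uftpd -D C:\ConferenceData

# On the control machine: send a file, capping the rate (in Kbps) so the
# transfer runs at roughly 600 Mbps and leaves headroom on the gigabit links.
uftp -R 600000 C:\staging\presenter-files.zip
```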

Chris S
  • I am still looking over the documentation for the link you gave, but I am not yet seeing any mention of the lower level multicast stuff. By that I mean, does this still rely on the multicast rendezvous point to be established in some router up the chain? I am relying on the location to provide my gateway, so I do not have my own central router that can provide this function. Does UFTP have some way to emulate that within the software? – WMIF Oct 11 '12 at 21:17
  • These are the type of results I often get when dealing with multicast, and why I stated that I would like to stay away from it. http://serverfault.com/questions/56487/what-affects-multicast-rates?rq=1 – WMIF Oct 11 '12 at 21:42
  • I'm not familiar with the software or hardware of that question, but I've not experienced issues like that in my setups. Multicast will run at the rate of the slowest client, so if you have different "strata" of client speeds you may wish to run multiple multicast sessions for each stratum (doesn't affect the final completion time though, just allows faster computers to finish sooner). The intermediate switches and routers have to support multicast, proxies can mitigate that to a degree, but if you want performance you'll need a reliable/capable network infrastructure. – Chris S Oct 12 '12 at 02:39
  • The switches involved for the infrastructure of this question are all unmanaged. The routers will be different for each location, and I can't guarantee they support multicast. Is there a way that I could get my own router hanging off to the side, but still be involved with the multicast part of it? I ask because I don't think it is possible. – WMIF Oct 12 '12 at 03:00
  • Management doesn't directly affect a switch's support for multicast. All gigabit Ethernet switches are required to support multicast (though old and particularly cheap ones implement it as broadcast, which gets the job done); older switches generally did this already. The lack of router support is why you'd need proxies set up (one for each multicast-isolated network segment). Obviously using routers that do support multicast will be the easiest, but configuring proxies isn't terrible (unless there's a large quantity of them). – Chris S Oct 12 '12 at 03:07
  • I read this part from the documentation on proxies: _This allows applications to function when there is little to no access to routers_ and I am not sure if I am reading this as referring to no access to administration of the routers, or no presence of routers in the network. If I were to set up a proxy in the network structure described above, would the proxy prevent me from having to configure the routers for multicast? – WMIF Oct 18 '12 at 01:11

You might want to look into murder

Murder is a method of using Bittorrent to distribute files to a large amount of servers within a production environment. This allows for scaleable and fast deploys in environments of hundreds to tens of thousands of servers where centralized distribution systems wouldn't otherwise function.

Murder was/is developed by the folks at Twitter, and they're using it daily to distribute files.

Jens Timmerman

A new solution may have appeared: BitTorrent Sync

Later edit: Actually, these days I would probably recommend git-annex assistant or syncthing, but there are many alternatives.

ptman
  • Not just another one but a very good one in this use-case i think! – Argeman Apr 25 '13 at 11:26
  • I agree. I found this on /. and I am currently setting up some test scenarios to see if it will work as well as it sounds. – WMIF Apr 25 '13 at 17:38
  • I just did a large-scale test with BT Sync and discovered that it uses 239.192.0.0 for multicast broadcast-ish traffic. I had no control over the network infrastructure, so I was not able to fully analyze it, but it was causing a noticeable surge in outbound internet traffic. Combined with some general file-syncing mismatch issues, it caused enough problems that I had to shut it down. Unfortunate, because it was working beautifully at a smaller scale of less than 50 computers. – WMIF May 24 '13 at 03:38

I might have an idea for you that would help. Forgive me, as I don't fully understand why this needs to be so complex, but if your need is to keep it simple for the end user and yet make the data quick and easy to reach inside a LAN, perhaps you could go with a NAS device. I have a Synology DS1812+; it can rsync with another Synology device or various rsync-capable devices, and it has a BitTorrent-capable application called "Download Station". I know you can download torrent files from the drive bay, and I believe you can also create or post a torrent file with that application to allow others to download a file they need. It has apps for mobile devices, both Apple and Android, and it can also do FTP transfers. This drive bay could give you the ability to send a file to it quickly and then disperse it amongst the LAN quickly and easily. I suggest placing the data inside the LAN just to make access faster for the LAN users, but the beauty of these NAS devices is that you can put them anywhere online as long as they have a fast internet connection. Perhaps one of the higher-end Synology NAS devices would be a good fit for your needs.

Synology has a virtual interface; you can look it over to get a better idea of whether this would be useful for you. I will paste the link to the virtual interface below:

http://www.synology.com/products/dsm_livedemo.php?lang=us

This device also gives people the ability to access their data via a web interface or mobile device application.

I hope this helps out and like I said, forgive me if I don't understand the question correctly.

Frank R
  • I'm not sure how that's better than what he's doing now. WMIF can't wait for the users to go get the data themselves; his/her team is expected to pre-stage the data files onto the machines as they become available to them. – mfinni Oct 15 '12 at 18:36
  • @mfinni is correct. The job is to make sure that the files that we are given are all loaded onto the conference floor computers before the attendees set foot into the room. – WMIF Oct 18 '12 at 01:12