At our institute, we will soon need to routinely share large volumes of data (multi-terabyte range).

  1. Would it make sense to use BitTorrent for this task?

  2. How much CPU/memory overhead should we expect, compared to common FTP servers?

  3. Is it possible to achieve speeds matching a direct FTP transfer when copying from exactly one BitTorrent peer (the original storage server) to another?

Thank you very much.

dpq
  • Are you distributing the file(s) to a large number of users? If by "sharing" you mean to one or a handful of sites, I don't think BT would help that much, as it only gets faster once other nodes have downloaded it and are able to share it. – Dave Drager Aug 13 '09 at 14:39
  • It would suffice if BT helped us manage load balancing and other such issues among our own storage servers; ability to use bandwidth of other universities is a plus but not the central point of using BT. – dpq Aug 13 '09 at 15:27
  • Jeff Atwood wrote a good article on the CodingHorror blog regarding whether and how to share files with BitTorrent. Might be worth a read. http://www.codinghorror.com/blog/archives/001272.html – Ryan Fisher Aug 13 '09 at 15:53

3 Answers

  1. I would think so. Be careful about the piece (block) size you choose, as it will need to be larger than standard for such a large amount of data.
  2. Not significant during the transfer: your bandwidth will be the bottleneck, not your CPU. Generating the torrent metafile in the first place (which involves hashing every piece of the data set) will take quite some time, as will the final hash check on the client after the transfer has completed.
  3. Yes, unless your connectivity provider, the client's provider, or something in between is selectively shaping P2P traffic.
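To illustrate why piece size matters at this scale, here is a minimal Python sketch of what torrent creation does in the original (v1) BitTorrent format: the data is cut into fixed-size pieces, each piece is SHA-1 hashed, and the 20-byte digests are stored together in the metafile. The helper names and the piece sizes tried are assumptions for illustration, and the sketch handles only a single file (multi-file torrents hash the files as one concatenated stream).

```python
import hashlib
import math

SHA1_DIGEST_BYTES = 20  # each v1 piece hash is a 20-byte SHA-1 digest


def piece_hashes(path: str, piece_length: int) -> list[bytes]:
    """Return the SHA-1 digest of each fixed-size piece of one file (v1-style)."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_length)  # the last piece may be shorter
            if not piece:
                break
            hashes.append(hashlib.sha1(piece).digest())
    return hashes


def pieces_field_bytes(total_bytes: int, piece_length: int) -> int:
    """Size of the concatenated piece hashes that end up inside the metafile."""
    return math.ceil(total_bytes / piece_length) * SHA1_DIGEST_BYTES


if __name__ == "__main__":
    total = 40 * 10**12  # roughly the 40 TB figure discussed in this thread
    for piece_length in (256 * 1024, 4 * 1024 * 1024, 16 * 1024 * 1024):
        mb = pieces_field_bytes(total, piece_length) / 10**6
        print(f"{piece_length // 1024:>6} KiB pieces -> {mb:,.0f} MB of hashes")
```

For about 40 TB in a single torrent, 256 KiB pieces would mean roughly 3 GB of hashes in the metafile alone, versus about 190 MB with 4 MiB pieces, which is one reason to pick a larger piece size or split the data.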

To mitigate the issues in points 1 and 2: if you can split the data into smaller chunks and have a separate torrent for each chunk, you might find the size of the data easier to handle.
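One way to do the splitting, sketched in Python under the assumption that you already have a list of (file name, size) pairs: walk the files in order and start a new batch whenever the next file would push the current batch past a size cap, then create one torrent per batch with whatever tool you use. The function name, the file names and the 500 GB cap are hypothetical; this is a simple greedy grouping, not an optimal packing.

```python
from collections.abc import Iterable


def batch_files(files: Iterable[tuple[str, int]], max_batch_bytes: int) -> list[list[str]]:
    """Group (name, size) pairs, in order, into batches of at most max_batch_bytes.

    A single file larger than the cap ends up in a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would exceed the cap.
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical data set: 40,000 files of 1 GB each (about 40 TB in total).
    files = [(f"run{i:05d}.dat", 10**9) for i in range(40_000)]
    print(len(batch_files(files, 500 * 10**9)))  # 80 batches, one torrent each
```

Keeping the original file order means related runs tend to land in the same torrent, which may make it easier for recipients to fetch only the subsets they need.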

Also note that you will need to regenerate the torrent metafiles if any data in the file(s) they cover is updated. If small parts of the data change without the rest changing, you will probably find rsync to be a much more efficient solution.
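The metafile goes stale because it embeds the hash of every piece: edit even one byte and the hash of the piece containing it no longer matches. A small in-memory Python sketch, assuming v1-style fixed-size pieces hashed with SHA-1 (the helper names and sample data are illustrative only):

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece of an in-memory buffer."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def changed_pieces(old: bytes, new: bytes, piece_length: int) -> list[int]:
    """Indices of pieces whose hash differs between two versions of the data."""
    old_h = piece_hashes(old, piece_length)
    new_h = piece_hashes(new, piece_length)
    count = max(len(old_h), len(new_h))
    # Pieces that exist in only one version count as changed too.
    return [
        i for i in range(count)
        if i >= len(old_h) or i >= len(new_h) or old_h[i] != new_h[i]
    ]


if __name__ == "__main__":
    original = bytes(range(256)) * 4   # 1 KiB of sample data
    edited = bytearray(original)
    edited[300] ^= 0xFF                # flip a single byte in the second piece
    print(changed_pieces(original, bytes(edited), 256))  # [1]
```

Only one piece differs, but since the piece hashes live inside the metafile (and feed into its info-hash), the whole torrent still has to be regenerated and redistributed.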

How large are the files in the dataset, and what is the spread like (several multi-gig files? many smaller ones? ...)?

David Spillett
  • 1) Once put there, files are not likely to be changed (raw experimental data that should be kept intact). 2) I expect the files to be about 1 GB each, but the total volume of data to share is likely to exceed 40 TB. – dpq Aug 13 '09 at 15:21

You did not mention how many machines will be in your BitTorrent "mesh"; if it is only going to be a few, then BitTorrent may not be worth the trouble of creating the torrent files, getting them to people, and running the tracker.

I think about this from time to time too, and always come back to BT's real use: sharing files on the Internet, where everyone only has to contribute a portion of the bandwidth. On home or work 100 Mbit/s networks, I use web servers and pass around links instead.
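Much of the setup work is producing the .torrent metafile itself, which is just a bencoded dictionary as described in the BitTorrent protocol specification (BEP 3). A minimal Python sketch of a single-file v1 metafile, assuming the data is already in memory and using a hypothetical tracker URL; in practice an existing client or torrent-creation tool would do this for you.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts using BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k), v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name: str, data: bytes, piece_length: int, announce: str) -> tuple[bytes, str]:
    """Build a single-file v1 metafile; return (file contents, hex info-hash)."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        "name": name,
        "piece length": piece_length,
        "length": len(data),
        "pieces": pieces,
    }
    # The info-hash that identifies the torrent is the SHA-1 of the bencoded info dict.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return bencode({"announce": announce, "info": info}), info_hash


if __name__ == "__main__":
    # Hypothetical tracker URL and toy payload, for illustration only.
    torrent, ih = make_metainfo(
        "sample.dat", b"x" * 1_000_000, 256 * 1024,
        "http://tracker.example.org:6969/announce",
    )
    print(len(torrent), ih)
```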

JamesR
  • It can be very easy to set up torrents now: uTorrent lets you create them directly from the GUI, and serve them too, e.g. http://thenexus.tk/transfer-files-without-a-tracker-in-utorrent/ – gbjbaanb Aug 13 '09 at 14:57
  • We plan to have four storage servers initially, sharing the same uplink. However, it is possible that we decide to relocate some of these to sites with better connectivity to other academic networks that our site has no dedicated link to. – dpq Aug 13 '09 at 15:24
  1. Yes, very possibly, it could save you a hell of a lot of bandwidth costs at the probable expense of average download speed per user.
  2. Pretty low overall. It depends on the server, obviously, but in general terms one server acting as a BT peer to a decent-sized swarm will use less CPU than the same server FTPing the same file out to lots of clients.
  3. Anything is possible: it could be much quicker or much slower. It depends on the size of the swarm at any given time, plus so many other factors that you'll never know for sure.

The most important thing to focus on is your customer experience. If you can't afford to piss off your customers, go with FTP, as it's controllable; if they're tech-savvy and understand the benefits to you and them, you'll be fine with BT. Good luck.

Chopper3