96

I am migrating my server from one data center in the USA to another in the UK. My host said I should be able to achieve 11 megabytes per second.

The operating system is Windows Server 2008 at both ends.

My average file size is around 100 MB and the data is split across five 2 TB drives.

What would be the recommended way to transfer these files?

  • FTP
  • SMB
  • Rsync / Robocopy
  • Other?

I'm not too bothered about security as these are public files anyway, but I just want a solution that can push the full 11 MB/s transfer rate to minimize the total transfer time.

Peter Mortensen
  • 2,319
  • 5
  • 23
  • 24
Paul Hinett
  • 1,205
  • 3
  • 11
  • 19
  • 1
    Hope you're not in a hurry; this could take weeks to get transferred. – tony roth Oct 03 '11 at 20:17
  • 19
    11 MB/s or 11 Mb/s? – wim Oct 04 '11 at 02:11
  • 1
    @wim I think he originally said 11 Mbps and then meant 11 MB/s (~8 times faster) in the 6th comment down on Shane's accepted answer. – dr jimbob Oct 04 '11 at 04:21
  • 15
    transfer the data to binary punch card and use a carrier pigeon :) – enterzero Oct 04 '11 at 02:13
  • 9
    You should provide detail. How many carrier pigeons do you think it would take? Show your work. – Evik James Oct 04 '11 at 02:31
  • 18
    @Evik European or African? – wim Oct 04 '11 at 04:03
  • 1
    https://www.xkcd.com/949/ – ypercubeᵀᴹ Oct 04 '11 at 10:55
  • @Wim: Going to the UK, European, obviously. – Bart Silverstrim Oct 04 '11 at 12:04
  • 1
    It would actually be helpful to know the type of files as well. Whether or not compression will work effectively could change the outcome. – Morgan Tocker Oct 04 '11 at 17:59
  • @Wim: I'm pretty sure there aren't African Carrier Pigeons. Swallows, maybe. And they are known to have a substantial payload. At least a coconut or two's worth. – music2myear Oct 04 '11 at 22:03
  • Will this data compress at all? – sjbotha Oct 05 '11 at 02:03
  • Ok, so the punch card and the carrier pigeon were not the best idea; fair call to all those who pointed this out :) If I was going to do this I would do what others have stated and send it over on an HDD. Once you have it restored, then look at doing a transfer over the internet of the changes that have happened between the time you shipped and when it got restored. It will be way better than an 88-day transfer window. That is my 2 cents, back to my punch cards :) – enterzero Oct 05 '11 at 02:39
  • 8
    As an aside, Wolfram Alpha is the most convenient way to do the calculation, "10 TB at 11MB/s". http://www.wolframalpha.com/input/?i=10+TB+at+11MB%2Fs – pufferfish Oct 05 '11 at 18:33

11 Answers

173

Ship hard drives across the ocean instead.

At 11 Mbps with full utilization, you're looking at just shy of 90 days to transfer 10 TB.


11 Mbps = 1.375 MBps = 116.015 GB/day.

10240 GB / 116.015 GB/day = ~88.3 days.
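
If the link is actually 11 MB/s, as clarified in the comments, the same arithmetic gives roughly 11 days:

11 MB/s = 928.125 GB/day.

10240 GB / 928.125 GB/day = ~11.0 days.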

Shane Madden
  • 112,982
  • 12
  • 174
  • 248
  • 43
    +1 for [Sneakernet](http://en.wikipedia.org/wiki/Sneakernet). Also, you forgot TCP/IP overhead. It's more like ~100 days under ideal circumstances. – Chris S Oct 03 '11 at 20:25
  • 44
    A wise man once said "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway". This equation is very true and not substantially altered by changing the station wagon for a boat. (http://www.bpfh.net/sysadmin/never-underestimate-bandwidth.html) – Rob Moir Oct 03 '11 at 20:36
  • Yup, or fly them over in a few days. The downside is you need to clone the hard drives at the source site before they're shipped, as you definitely don't want to risk shipping your master copies. This is probably something you can arrange for the hosting company to do for you for a one-time fee. – Chris Thorpe Oct 03 '11 at 20:43
  • 5
    It's better to ship tapes or Blu-ray discs rather than drives. If you go with drives, make sure the originals are kept safe and available just in case. I'd go for the drives myself (unless I had Ultrium 4 drives) because 10 TB = 410 single-layer Blu-ray discs! – Allen Oct 03 '11 at 20:44
  • Wonder how compressible the data is? Even if I was going to fill up the wagon (think that was Jim Gray's statement), I'd still want the fewest number of disks to move around. – tony roth Oct 03 '11 at 20:51
  • The data center won't allow me to ship out my drives; it's against their policy apparently; I had offered to pay for them too. Didn't realize 11 Mbps would take 90-100 days... that is horrible to think about and I think I need to reconsider my options now. – Paul Hinett Oct 03 '11 at 20:51
  • 2
    I'd fire the datacenter operators and get better ones. – tony roth Oct 03 '11 at 20:55
  • 10
    Just realised that I typed 11 Mbps; however, what I actually meant was 11 MB/s. I suppose this makes quite a big difference; my calculations have it at around 11-14 days roughly... is this correct? – Paul Hinett Oct 03 '11 at 21:14
  • @PaulHinett Yup, that's about right. – Shane Madden Oct 03 '11 at 21:26
  • Ok maybe 14 days I could just about handle, not ideal by any means...ok so now it goes back to my original question...what would be the best method of transferring? – Paul Hinett Oct 03 '11 at 21:29
  • And I'd seriously still be asking if the data could be burned to disk and shipped in less than those 14 days (expect it to be 2 or 3 days more due to overhead mind you). – Rob Moir Oct 03 '11 at 21:37
  • 3
    @PaulHinett If you do want to go for the over-the-wire transfer, I'll refer you to Korjavin's answer - go with rsync. Oh, and if the data compresses decently, add the `-z` option to add gzip compression to the transit. – Shane Madden Oct 03 '11 at 21:39
  • My data is MP3 files; not sure if this will compress well or not? My data is split up into around 5000 sub-directories on each drive with an average of about 10 files in each folder... Can I just let rsync process the whole tree for each drive or should I break this down into smaller chunks? – Paul Hinett Oct 03 '11 at 21:46
  • 3
    No, MP3 files are already compressed and do not compress well at all - setting `-z` would just be a waste of CPU time. rsync should have no trouble with just hitting the whole tree. – Shane Madden Oct 03 '11 at 21:55
  • 18
    Still believe that sending a man overseas with the 10 TB backup while the official disks are still working is the way to go; then once the setup is done, you can launch an rsync to update the new server with any changes. You'd have your machine up and running in about a day. – Loïc Faure-Lacroix Oct 03 '11 at 22:54
  • 1
    I agree, for MP3 files, just a straight `rsync` is about the best you can do. – David Schwartz Oct 04 '11 at 00:06
  • 1
    +1 - my answer to the question was: 'Fedex'. – Vector Oct 04 '11 at 03:17
  • If you do end up shipping drives, make sure to send redundant ones, and also I'd recommend splitting it up into smaller ones so that if a drive dies you lose less of its data. Then use rsync to backfill any missing data from whatever drives DO die. I speak from experience on this exact use case... – fluffy Oct 04 '11 at 04:16
  • If you don't care about the time, but do want the data to be hopefully error free, couldn't you create a DFS volume to sync the data? Should retry if there's timeouts and issues (which over that amount of time there's bound to be something wrong at some point). – Bart Silverstrim Oct 04 '11 at 12:05
  • 1
    If your data is public you could try breaking it into chunks and using Bittorrent to get them copied over. If you have more than one system on different links from which to pull the data. No errors in transit... :-) – Bart Silverstrim Oct 04 '11 at 12:07
  • Even if your data is private, you can use Bittorrent in a private network. – Jeff Ferland Oct 04 '11 at 17:45
  • Talk to your datacenter. See if you can ship them drives and then have them ship them back out once they are done copying. If they don't want to do something like this.. find another datacenter once you are done copying. Also, might be worth seeing if you can upgrade to a gigabit connection for awhile. – devicenull Oct 05 '11 at 02:39
  • 3
    @Mikey You don't want to FedEx the drives. That may take 10 days or more to clear customs. If you send a person with a carry-on full of drives, they (although they will get special screening by security and possibly negative looks) will get there within ~14 hours. – monksy Oct 06 '11 at 05:12
  • I think this is an intellectually lazy answer and does not reflect storage networking best practices. – Brennan Jun 22 '12 at 22:35
  • 2
    @Brennan What would you propose instead? It's clearly not an ideal generic solution to trans-atlantic file transfers, but keep in mind the context of my answer; a) it's a one-time transfer, b) it's pretty clearly not feasible to get a fatter internet circuit set up just for this one-time transfer, and c) it was provided when the question read as 11Mbps, not 11MBps. – Shane Madden Jun 23 '12 at 18:29
  • @ShaneMadden I'd consider a few different things: Deduplication, block repetition, bandwidth optimization/acceleration, bandwidth increase/bursting, compression, a trip and use of DD, adding another server for a month, private networking, paying for one of the datacenter's staff to do some of above, etc. There are a huge list of options with respective pros and cons that could be considered, each with costs, time, schedule, and ease of doing impacts. – Brennan Jun 23 '12 at 18:40
  • 2
    @Brennan yup, and all of them more expensive by means of time and financial charges than bits by mail. For a one-time thing, intellectually lazy is smart. – Jeff Ferland Jun 23 '12 at 18:43
  • @JeffFerland Are you saying that all the alternatives I just mentioned come with costs (some of them are configuration changes), and you need to factor in existing costs for what appears to be a dedicated server with a significant amount of storage space and processing capability..... Looking at above, I can tell you at least a few of my alternatives are FAR cheaper. Furthermore, some of the alternatives are merely alterations of the same idea. ***I'm sure you read where the OP said that shipping the drives directly was not possible, right?*** – Brennan Jun 23 '12 at 18:52
  • 1
    Couldn't help but notice that the OP accepted this answer... – Basil Jun 23 '12 at 23:30
26

I'd say rsync. At 11 MB/s you're looking at 10-14 days, and even if you get interrupted, rsync will easily pick up where it stopped last time.

At 11 Mbps I'd ship the hard disks as suggested above :)
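
For example, a resumable pull could look something like the sketch below, assuming an rsync build is available on both Windows machines (e.g. via Cygwin) and using placeholder host and path names:

    # Pull one drive's tree from the old server. -a preserves attributes and
    # timestamps; -P (= --partial --progress) keeps partially-transferred files
    # so an interrupted run resumes instead of starting over.
    rsync -aP user@old-server.example.com:/data/drive1/ /data/drive1/

Re-running the same command after a dropped connection only transfers what is still missing.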

Lucas Kauffman
  • 16,818
  • 9
  • 57
  • 92
  • 1
    Your estimate differs very significantly from what others have posted (and I don't know who is correct). Can you supply your methodology for arriving at those figures? – John Gardeniers Oct 03 '11 at 23:54
  • 9
    The difference arises from the OP misstating 11 Mbps when in fact he meant 11 MBps -- which is 8 times faster. BTW, restarting a 10 TB rsync in the case of an interruption will probably take a while, won't it? Hours, or longer? – Frank Farmer Oct 04 '11 at 00:15
  • @FrankFarmer: I wouldn't worry about rsync restarting; I keep an offsite copy of ~20 TB over a 30 Mbps wireless line, and restarting is in the seconds range. The initial copy took a couple of weeks, but the nightly update is usually a couple of hours. – Javier Oct 04 '11 at 04:23
  • @FrankFarmer - rsync seems to scale very well. I have ~2 TB over a rural ADSL1 line that was initialised with sneakernet, but it takes ~5 min to rsync every night if nothing has changed. – Flexo Oct 04 '11 at 07:28
  • 6
    rsync restart time scales with number of files (mainly from `stat` time, in my experience), not with total data. I would expect no significant wait (several minutes at most). Though my experience with rsync tops at a little under 5TB. – derobert Oct 04 '11 at 14:52
  • There are other advantages to rsync (or any electronic transfer): you start getting data immediately (so if, of the 10 TB, 8 are archives and 2 are current data, you can send the current data first), and if the hard drive you sent is damaged in transit, unless it was stored with redundancy, you have just wasted 10 days. – Rodolfo Jun 05 '12 at 17:06
  • The real disadvantage of rsync is that you then have to pay for 10TB+ of bandwidth, on both ends. – Michael Hampton Sep 11 '12 at 05:44
  • I don't know where you host your servers, but for that traffic it would cost me 20 euro extra on my current plan. – Lucas Kauffman Sep 11 '12 at 07:17
15

Rsync of course.

At the very least you can resume at any time after a break, without any pain.

Peter Mortensen
  • 2,319
  • 5
  • 23
  • 24
Korjavin Ivan
  • 2,230
  • 2
  • 25
  • 39
  • 7
    3+ months to copy at 100% utilization. Sorry, but that's a terrible way to transfer that much data. – Chris S Oct 03 '11 at 20:26
  • I have to agree with @ChrisS, using `rsync` just to copy large files is not efficient. For my stuff I ended up using `tar` over `netcat` or `ssh` for the initial transfer. It is much faster and starts to transfer immediately, while `rsync` will scan all files first, which takes time. If this gets interrupted you can still use `rsync` afterwards. In fact, I do this sometimes after `tar` anyway to ensure all permissions, socket files, etc. are correct. – Martin Scharrer Oct 04 '11 at 07:20
  • 1
    After the OP corrected that he's got a ~100 Mb connection, not 11 Mb, rsync makes much more sense. +1 for the first to mention it. – Chris S Oct 04 '11 at 12:29
12

Never underestimate the bandwidth of a station wagon full of tapes

-- Trad.

In your case, disks or tapes sent by courier, but the principle still applies. If you're not concerned about latency, this will be vastly cheaper than the network bandwidth to transfer 10TB of data in any reasonable length of time.

  • Jeff Atwood ran the numbers in one of his old Coding Horror posts.. http://www.codinghorror.com/blog/2007/02/the-economics-of-bandwidth.html – tardate Oct 04 '11 at 19:34
10

You should use rsync. It will compress the data and de-duplicate it before sending. It can also resume partial transfers, which is very important for any large transfers.

It's likely that it won't actually transfer 10 TB; if the data is logs, text and such, the amount sent over the wire could well be under 1 TB, perhaps way below that.

There are tools that do a better job of compression than rsync and likely find more matches. You could use lrzip, etc.

There are specific types of data that don't compress well and don't contain literal dupes - videos and other media, for example. In those cases, FTP and rsync end up doing much the same work.

Peter Mortensen
  • 2,319
  • 5
  • 23
  • 24
Will
  • 229
  • 1
  • 2
  • 8
  • 4
    RSync deduplicates data? I think it only does this at the file level, meaning deduplication is mostly useless in this case. – devicenull Oct 05 '11 at 02:41
6

I know this is already accepted, but have you considered taking your disks to a data center/provider/host where you can get more bandwidth? It'll probably cost you some money, but copying 10,240 GB to backup disks and sending them off will also cost both time and money (2 × money).

Also you'll be sure your disks don't break in transport.

Asken
  • 215
  • 1
  • 2
  • 8
5

11 Mbps? That is quite a limitation you have there. In your situation I would simply:

  • Clone the data
  • Compress it
  • Rent servers on both ends with at least 10 times more bandwidth (in the same data centers or on your end in a data center near you).
  • Transfer the files
  • Apply the data to the new server.

If you really have no way to increase the bandwidth... then shipping physical drives will be way faster.

From my painful experience, hard drives tend to break in the mail... USB flash drives are a much better solution for frequent data transfers; in your case it would require quite a few of them :) So send two copies of your data on multiple hard drives.

Considering the amount of data you have, you could also send drives from a RAID 5 or RAID 6 array if you have the same hardware/software on the other side to plug your drives into. In that case, remember to mark the order of your drives and their serial numbers so they don't get mixed up when reconfiguring.
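
If you do ship drives, one way to catch damage in transit is to ship a checksum manifest alongside them and verify on arrival; a rough sketch with standard tools (paths are placeholders):

    # Before shipping: record a checksum for every file on the drive.
    find /mnt/drive1 -type f -exec sha256sum {} + > drive1.sha256

    # After arrival (with the drive mounted at the same path): verify the copy;
    # any damaged file is reported as FAILED.
    sha256sum -c drive1.sha256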

Coyote
  • 151
  • 4
4

While I have to agree with the "ship it using hard drives" answer in this case, here is a copy solution I use when I have to copy large amounts of files for the first time:

While rsync is good for keeping two data stores in sync, it introduces quite a bit of unnecessary overhead for the initial transfer. I found that the fastest way is to tar the data and pipe it over netcat. On the receiving side you also use netcat, in listen mode, piping the incoming data into an extracting tar. The benefit is that tar starts sending immediately and netcat sends it as a plain TCP stream with no extra higher-level protocol overhead. This should be as fast as it gets. However, it is not simply possible to restart an interrupted transfer from the last position.
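
A minimal sketch of that pipeline (the host name and port are placeholders, and the listen-flag syntax differs slightly between netcat variants):

    # On the receiving server: listen and unpack the stream as it arrives.
    nc -l -p 2222 | tar -xf -

    # On the sending server: stream the tree straight into the TCP connection.
    tar -cf - /data | nc new-server.example.com 2222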

It is also easy to compress the data for the transfer by using the right tar options or by adding a compression tool to the pipeline. Note that netcat sends the data unencrypted. In cases where this is not acceptable, an encrypted SSH connection can be used instead (tar <options> | ssh <target> 'tar -x <options>').

Once all the data is transferred, rsync can be used to ensure that all files which got updated in the meantime are synchronized. Also, IIRC, tar doesn't recreate socket files, so those would get lost, but they aren't really used for data center data anyway.

Martin Scharrer
  • 181
  • 1
  • 7
3

Again, the first suggestion is to ship the drives.

The second suggestion is to use rsync against rsyncd (the rsync daemon), not over SSH. I've tried many things and it is usually the fastest. Remember to turn on compression. Also, look at increasing or decreasing the rsync buffer size to get the optimal transfer rate. It may also help to increase your MTU size, though this only helps if routers en route don't fragment your packets. There are ways to determine whether they do.
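
A rough sketch of that setup, assuming an rsync build is available on both machines; the module, path and host names are placeholders:

    # On the destination: /etc/rsyncd.conf containing a single writable module,
    # e.g.
    #
    #   [migration]
    #       path = /data
    #       read only = false
    #
    # then start the daemon:
    rsync --daemon

    # On the source: push straight to the module (no SSH). --partial lets an
    # interrupted run resume; drop --compress for already-compressed data
    # such as MP3s.
    rsync -av --partial --compress /data/ rsync://new-server.example.com/migration/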

Unfortunately there is no setting that's always the best. You'll have to experiment to find out what works best in your situation.

sjbotha
  • 305
  • 4
  • 8
2

You mentioned the servers are running Windows 2008. Would Microsoft DFS be suitable? There is some magic at the lower end that tries to get as much bandwidth out of the connection as possible, and it also has compression and de-duplication (IIRC).

Mind you, hard drives, DVDs or Blu-rays would be faster... My calculation is 11 days at the full 11 MB/s...

Peter Mortensen
  • 2,319
  • 5
  • 23
  • 24
TiernanO
  • 754
  • 6
  • 17
1

You can use a torrent for this.

Create a private torrent at one end and use the client on the other.

Although there is encryption available, you should check whether it meets your requirements.
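
As a sketch, assuming a command-line tool such as transmission-create is available (the tracker URL and path are placeholders):

    # Build a private torrent of the data directory; -p sets the private flag
    # so clients only use the listed tracker (no DHT/PEX).
    transmission-create -p -t http://tracker.example.com/announce \
        -o migration.torrent /data

Seed it on the old server and load the resulting .torrent file in a client on the new server to start pulling.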

Dragos
  • 349
  • 1
  • 2
  • 11
  • 1
    A 1 to 1 torrent relationship is no better than a 1 to 1 file transfer. If there is limited pipe between the two sites you need multiple seeders on different pipes, ideally geographically distributed. – Jeremy Oct 05 '11 at 17:43
  • @Jeremy - it's no better or worse in terms of throughput. It may be better in terms of reliability (easy pause/resume), which for this size xfer could be important – Joel Coel Oct 18 '11 at 17:52