While I have to agree with the "ship it on hard drives" answer in this case, here is the copy approach I use when I have to transfer a large number of files for the first time:

While `rsync` is good for keeping two data stores in sync, it introduces quite a bit of unnecessary overhead for the initial transfer. I have found that the fastest way is to `tar` the data and pipe it over `netcat`. On the receiving side, you run `netcat` in listen mode and pipe the incoming data into an extracting `tar`. The benefit is that `tar` starts sending immediately and `netcat` transmits a plain TCP stream with no extra higher-level protocol overhead. This should be as fast as it gets.
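A minimal sketch of what that looks like, assuming a hypothetical port 7000, host name `receiver.example.com`, and directories `/src` and `/dst` (some `netcat` variants want `nc -l 7000` without the `-p`):

```sh
# On the receiving host (start this first):
# listen on TCP port 7000 and unpack the incoming stream into /dst
nc -l -p 7000 | tar -x -C /dst

# On the sending host:
# archive /src (with relative paths) and stream it as plain TCP
tar -c -C /src . | nc receiver.example.com 7000
```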
However, it is not easily possible to resume an interrupted transfer from the last position.
It is also easy to compress the data in transit, either with the right `tar` options or by adding a compression tool to the pipeline.
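For example, with the same hypothetical hosts and ports as above (compression only pays off if the CPUs can keep up with the link):

```sh
# gzip via tar's -z flag:
nc -l -p 7000 | tar -xz -C /dst                         # receiver
tar -cz -C /src . | nc receiver.example.com 7000        # sender

# or a separate compressor such as zstd in the pipe:
nc -l -p 7000 | zstd -d | tar -x -C /dst                # receiver
tar -c -C /src . | zstd | nc receiver.example.com 7000  # sender
```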
Note that `netcat` sends the data unencrypted. In cases where this is not acceptable, an encrypted `ssh` connection can be used instead (`tar <options> | ssh <target> 'tar -x <options>'`).
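Spelled out with the same hypothetical names as above (expect somewhat lower throughput than raw `netcat`, since the stream is now encrypted):

```sh
# ssh runs the extracting tar on the remote side;
# the pipe stays the same, only the transport changes
tar -c -C /src . | ssh user@receiver.example.com 'tar -x -C /dst'
```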
Once all the data has been transferred, `rsync` can be used to ensure that any files which got updated in the meantime are synchronized. Also, IIRC `tar` doesn't handle sockets, so those would get lost, but they aren't really used for datacenter data anyway.
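A sketch of that final pass, again with hypothetical paths (the trailing slashes matter to `rsync`):

```sh
# -a preserves permissions, ownership and timestamps;
# only files that changed since the bulk copy are re-sent
rsync -a /src/ user@receiver.example.com:/dst/
```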