How long would this file transfer take?

1

I have 12 hours to back up 2 TB of data.

I would like to back up to a network share on a computer that uses consumer WD Black 2TB 7200rpm hard drives, over Gigabit Ethernet.

What other variables would I need to consider to see if this is feasible? How would I set up this calculation?

CT.

Posted 2010-06-17T04:08:27.760

Reputation: 2 549

Answers

4

The two big factors here are how fast the source can pitch data and how fast the receiving end can commit it. GigE is a really good start, which in theory means the transfer could take as little as about 4.7 hours. Factors that can increase this:

  • If the receiving end's network buffers run out (source pitches too fast).
  • If the sending end is heavily fragmented, it won't be able to pitch data at line speed.
  • If the receiving end is anything but lightly fragmented, it may not be able to write fast enough.
  • Something on your network path is hogging bandwidth (some hidden uplink port getting saturated with other traffic).

My back-of-the-envelope calculation says you need to stream at about 49 MB/s to make it work. If that hard drive is naked and the network stack is at all decent, it'll probably be the source's fragmentation level that determines the ultimate speed.
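
A minimal sketch of that back-of-the-envelope math, assuming 2 TiB of data and raw GigE line rate (counting decimal terabytes or allowing for protocol overhead shifts the figures a little, but not the conclusion):

    # Rough feasibility check: 2 TiB in a 12-hour window over Gigabit Ethernet.
    # Assumptions are mine: binary terabytes, raw line rate, no protocol overhead.
    data_bytes = 2 * 1024**4            # 2 TiB
    window_s = 12 * 60 * 60             # 12-hour backup window

    required = data_bytes / window_s
    print(f"Required sustained rate: {required / 2**20:.1f} MiB/s")    # ~48.5 MiB/s

    gige_bytes_s = 1_000_000_000 / 8    # 125 MB/s theoretical line rate
    print(f"Best case over GigE: {data_bytes / gige_bytes_s / 3600:.1f} hours")

In other words, both ends have to sustain a bit over 40% of what GigE can carry, for twelve hours straight.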

Edit: I see from comments that you're planning on a backup-to-disk system.

Some more things to consider. Using multiple target drives in some kind of stripe configuration is a really good way to parallelize the seek process and reduce your fragmentation penalty. RAID10 is the best solution for this, though RAID5/6 can work if your RAID card is fast enough to handle it; if it isn't, then RAID10 is your only redundant hope. 7.2K RPM drives really can be used in these situations; I'm doing it right now, though with 500GB drives rather than 2TB ones. You really, really want to ensure that those drives are writing sequentially as much as possible, and to keep random writes to a minimum.
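
As a rough illustration of why striping helps, here is how that ~49 MB/s target spreads across the spindles under a few layouts. This is idealized (full-stripe sequential writes, no parity or controller overhead counted) and the drive counts are just examples:

    # Idealized per-drive sequential write load for a ~49 MiB/s aggregate target.
    target = 49  # MiB/s

    def per_drive(layout, drives):
        """Approximate MiB/s each spindle must sustain for sequential writes."""
        if layout == "raid10":
            width = drives // 2          # half the drives are mirror copies
        elif layout == "raid5":
            width = drives - 1           # one drive's worth of parity per stripe
        else:                            # plain stripe (RAID0)
            width = drives
        return target / width

    for layout, drives in [("raid10", 4), ("raid10", 6), ("raid5", 4)]:
        print(f"{layout} x{drives}: ~{per_drive(layout, drives):.0f} MiB/s per drive")

Even a modest 7.2K RPM drive can hit those per-drive numbers, as long as the writes stay sequential.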

Random writes are caused in several ways. If your backup system just copies files to a new location, you're creating a bajillion files, and the backups will be unavoidably random in that case; you want to avoid backup systems that do this. If your backup system instead creates large archive files (10GB files, for instance), random I/O happens when those files fragment.

Avoiding big-file fragmentation requires a few steps:

  • Ensure only one file is being written to at any given time.
    • There are some exceptions to this if you're running the right kind of file system on Linux, but I don't know if you are. If using NTFS, keep to one writer.
  • There must be sufficient free space for one big file to be written in one chunk.
    • After you've been running for a while, keep an eye on your fragmentation chart.
  • If possible, configure your backup system to allocate each file in full before use. You may get some 10GB files that are mostly empty, but at least they're contiguous, which will help reduce frag-creep as the system ages.
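
A minimal sketch of that preallocation idea, purely as an illustration (Linux-only; real backup software would do this internally, and the path and size here are made up):

    import os

    # Reserve a 10 GiB archive's space up front so the filesystem can allocate
    # it in as few extents as possible, instead of growing it piecemeal.
    ARCHIVE = "/backups/job-0001.archive"   # hypothetical target file
    SIZE = 10 * 1024**3                     # 10 GiB, matching the example above

    fd = os.open(ARCHIVE, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        os.posix_fallocate(fd, 0, SIZE)     # ask for the full extent now
    finally:
        os.close(fd)

Preallocation doesn't guarantee a single extent, but it gives the allocator its best chance, which is exactly what the last bullet is after.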

SysAdmin1138

Posted 2010-06-17T04:08:27.760

Reputation: 5 239

0

If your connection can do 1000 Megabit/s, transferring all that data would take around 4.5 hours (1 Megabit is 0.125 MB), so this might work, but depending on your network layout it might use a lot of your network bandwidth for that time.

A better alternative for backup, especially if you only want to back up changes and you don't actually produce 2TB of new data every 12 hours, is to transfer only the actual changes. I suggest you look into rsnapshot, which is a nice wrapper around rsync. That way you do the full, long transfer only once at the start, and updating the snapshots afterwards will be much faster. There are some rsnapshot tutorials on superuser already.
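
To see why the incremental approach is so much cheaper, here is a toy sketch of the hard-link trick that rsnapshot (via rsync) relies on. This illustrates the concept only, not how rsnapshot itself is implemented, and the paths are made up:

    import filecmp
    import os
    import shutil

    # Toy snapshot-style backup: unchanged files are hard-linked from the
    # previous snapshot (no data moved); only changed or new files are copied.
    SOURCE = "/data"                 # hypothetical source tree
    PREVIOUS = "/backups/daily.1"    # previous snapshot
    CURRENT = "/backups/daily.0"     # snapshot being built now

    for root, _dirs, files in os.walk(SOURCE):
        rel = os.path.relpath(root, SOURCE)
        os.makedirs(os.path.join(CURRENT, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            prev = os.path.join(PREVIOUS, rel, name)
            dest = os.path.join(CURRENT, rel, name)
            if os.path.exists(prev) and filecmp.cmp(src, prev, shallow=True):
                os.link(prev, dest)      # unchanged: hard link, no data moved
            else:
                shutil.copy2(src, dest)  # changed or new: copy the data

After the first full pass, each new snapshot only costs the size of whatever actually changed, which is why the later runs are so much faster.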

Benjamin Bannier

Posted 2010-06-17T04:08:27.760

Reputation: 13 999

OP said they had GigE, not 100Mbit. – SysAdmin1138 – 2010-06-17T04:24:18.207

@sys: right, fixed that. This changed the tone of the answer though. – Benjamin Bannier – 2010-06-17T04:28:19.210

I would like to run full backups instead of incremental. – CT. – 2010-06-17T04:32:10.763