
I currently have ~12TB of data for a full disk-to-tape (LTO3) backup. Needless to say, it now requires over 16 tapes, so I'm looking at other solutions. Here is what I've come up with; I'd like to hear the community's thoughts.

  • A server for disk-to-disk backup
  • Backup Exec 2010 using its de-duplication technology
  • 20+ TB of SATA drives
  • LTO5 robotic library connected via SAS
  • 1Gbps NIC connected to the network

What I envision is doing a full backup of my entire network, which will initially take a long time over the 1Gbps NIC, but once the de-duplication kicks in, backups should be quick. I will then use the LTO5 library to make disk-to-tape copies and archive those accordingly.
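Back-of-the-envelope: 12TB at the theoretical 1Gbps line rate (~125MB/s) works out to roughly 12,000,000MB / 125MB/s ≈ 96,000 seconds, or a bit over 26 hours for the first full pass, before protocol overhead, source-disk contention, and the de-duplication processing are factored in.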

What does everyone think? Any faster way of doing the initial full backup over the 1Gbps NIC? What will be my pain points? Is there a better way of doing what I'm trying to achieve?

Michael
  • Can you describe your data silos a bit? Are these databases? Exchange? Binary "userland" files? Distributed across how many servers? Any virtualization? – gravyface Jun 21 '11 at 22:43

3 Answers


I'm currently doing nightly backups of my data systems, mostly using rsync, plus rsnapshot for some of the more 'user-visible' volumes.

The biggest volume has a capacity of 16TB, with 9.5TB currently used. The first step is a simple rsync to a separate disk array, which typically takes 30-45 minutes.

Then it does a second copy to an offsite server over a 100Mbit wireless link (although we typically get 50-60Mbit effective after some packet loss). This takes roughly 3 hours each night.

So yes, I think disk-to-disk backup of big volumes isn't a hard thing to do. You don't even need fancy buzzword-compliant software; simple tools are quite capable.
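Roughly, the two passes look something like this (the paths, remote hostname, and bandwidth cap below are placeholders, not the actual setup):

    # Pass 1: local copy to the separate disk array
    rsync -aH --delete /data/ /mnt/backup-array/data/

    # Pass 2: offsite copy over the slower wireless link, rate-limited
    # (--bwlimit is in KiB/s; 6000 keeps it around 50Mbit/s)
    rsync -aH --delete --partial --bwlimit=6000 /data/ offsite-host:/backup/data/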

Javier
  • The problem I've noticed with the fancy buzzword-compliant software is that it won't be able to copy files that are in use: Exchange, MSSQL, etc. – Michael Jun 05 '11 at 03:04
  • Most of my data-intensive apps are quite backup-friendly; they don't modify files in place. For other volumes I use LVM snapshots: make a snapshot, rsync from it to the backup, destroy the snapshot (sketched below). On Windows I believe the equivalent functionality is Volume Shadow Copy; there's even an API to ask applications to flush data while the shadow is created. Once it's taken, the application can continue while you copy from the shadow. – Javier Jun 05 '11 at 05:14
  • Good suggestion. Last issue: I need something that comes with some sort of support contract, just in case I get hit by a bus :) – Michael Jun 05 '11 at 05:23
  • 3
    Good documentation is a better defence than a support contract, if you live in a town served by bad bus drivers. – Bryan Jun 21 '11 at 21:29
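To make the snapshot comment above concrete, the cycle looks roughly like this with LVM (the volume group, LV name, snapshot size, and mount points are placeholders):

    # Snapshot-then-copy cycle; vg0/data and the paths are made-up examples
    lvcreate --snapshot --size 10G --name data-snap /dev/vg0/data
    mount -o ro /dev/vg0/data-snap /mnt/data-snap
    rsync -aH --delete /mnt/data-snap/ /mnt/backup-array/data/
    umount /mnt/data-snap
    lvremove -f /dev/vg0/data-snap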

Of primary interest here is whether you're looking to do backups, or just to maintain an active copy. A single active copy of 16TB updated nightly is certainly doable disk-to-disk, and it'll almost certainly be cheaper than a tape library. That said, consider that your last-resort restore option is now stored on physically co-located spinning disk that's vulnerable to all the usual issues of drive failure, corruption on power loss, etc., so design your disk system with an appropriate level of redundancy.

The way we've been doing it, on about 350TB of data, is a simple sync to relatively high-performance front-end disk, which is then migrated to tape via a robotic library for offsite storage. This gives us fast backup and fast restore for recent (active) data, while ensuring reliable offsite tape storage in case of disaster.
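For illustration only, the same staging pattern can be sketched with stock Linux tools; the device names, changer slot, and paths below are placeholders, and the actual environment described here uses dedicated backup software rather than these commands:

    # Stage the nightly sync onto fast front-end disk
    rsync -aH --delete /data/ /srv/stage/data/

    # Load a tape from library slot 1 into drive 0, stream the staged
    # data onto it, then unload it so the cartridge can go offsite
    mtx -f /dev/sg3 load 1 0
    tar -cf /dev/nst0 -C /srv/stage data
    mtx -f /dev/sg3 unload 1 0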

Don't be taken in by aggressive sales claims about dedupe in backup: you'll just end up paying in CPU cycles to process the dedupe rather than paying in disk; your restore times will probably suffer, since you're now dependent on the dedupe system to tell you where your blocks are before you can restore them; and (my personal nightmare) if the dedupe system encounters a data-loss error condition, your last-resort backups are hosed.

These are of course only my own opinions; I hope they're useful to you in designing a backup solution. Best of luck!

Jeff Albert
  • I never understood the whole de-dup craze. If your data set has that much duplication, then you need to fix your applications to stop storing so much duplicate data in the first place, not your backup system. – psusi Jun 22 '11 at 13:34
  • What are you using to sync the 350TB to the high-performance front-end disk? Can you give me your backup plan from the 350TB to disk and then to offsite tape? – Michael Jun 23 '11 at 06:24
  • We back up our many systems via TSM into a random-access disk pool on high-performance SAS disk, then migrate that data onto sequential disk storage volumes (essentially big fixed-size files) on somewhat slower SATA disk, so we can stream-read that data for transfer onto tape for offsiting. Since we're only syncing deltas from the clients (TSM is incremental-forever), we can keep the random pool relatively small and use it as a buffer to chunk data into the larger sequential volumes which constitute our main active backup store. – Jeff Albert Jun 23 '11 at 20:54

If you are using a filesystem that has a dump program, such as ext[234], then you could get an eSATA dock and a bunch of cheap 1TB SATA disks. For the initial level-zero dump you will need a dozen drives, which you can then toss into a fireproof safe or safe-deposit box; after that, rotate through another 5 or 6 drives doing daily Tower of Hanoi pattern backups. With this method you will usually have 2 or 3 copies of frequently changing files on the daily drives, in case you need to retrieve a file that has been deleted or overwritten. If you have to do a complete restore, you fetch the dozen level-0 drives, then restore between 1 and 5 of the daily drives, depending on what day the system crashed.
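A hedged sketch of what that might look like with dump on an ext filesystem; the device names, mount points, and the particular level sequence are placeholders, just one common Tower of Hanoi pattern:

    # Initial full (level 0) dump to one of the archive disks in the dock;
    # /dev/sda1 is the filesystem being protected, /mnt/archive the docked disk.
    # -u records the dump in /etc/dumpdates so later levels know what changed.
    dump -0u -f /mnt/archive/root.level0.dump /dev/sda1

    # Daily incrementals then cycle through levels such as 3 2 5 4 7 6 ...
    dump -3u -f /mnt/daily/root.mon.dump /dev/sda1   # Monday
    dump -2u -f /mnt/daily/root.tue.dump /dev/sda1   # Tuesday
    dump -5u -f /mnt/daily/root.wed.dump /dev/sda1   # Wednesday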

For more information on the Tower of Hanoi backup rotation scheme, see http://en.wikipedia.org/wiki/Backup_rotation_scheme.

psusi