What is a good schedule / methodology for test-restoring my backups?

Question

It's common knowledge on SF that your backups are only as useful as your ability to restore them. So you've revised and documented your backup schedule. You check the logs and/or receive notifications of the results for each backup job.

Now you want to make sure you're never caught with your LTO-pants down, and you're going to do spot-restores from time to time. I realize this will vary DRASTICALLY based on the size & types of data, but I'd like to find out how people are tackling this. Or is it primarily just a training issue - making sure that you (or your people) have experience in each necessary restore?

We have lots of questions on how to restore a particular technology type. I'm more interested in how you satisfy yourself that a quick recovery is certain.

score 3 · Answer 1 · answered Jul 17 '10 at 13:37

Depending on the environment, this could be tricky.

In a dedicated-server environment, it would be most helpful to have a backup machine whose hardware is identical to the primary. Take the backup machine offline, then run the restores on it while it's isolated. Once you're satisfied that it restored properly, ensure that the whole process was properly documented. If you're really serious, have someone else try to follow your instructions with no outside help.

The acid test is swapping the restored backup machine in for the primary. This won't work, of course, if you work with live data that is constantly changing. It's also risky and not entirely necessary, but it proves you can recover with your (newly) established procedures.

Since I don't have a backup machine, and I can afford to take my server offline for hours, I do the following:

1. Run a fresh backup.
2. Replace all of the relevant hard drives with spares. Label and keep the originals in case things go badly.
3. Run the restore procedure, documenting along the way.
4. In this case, just return the system to service. The original HDDs go on the shelf as spares, and a really last-ditch recovery option.
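One way to check that the restore in the steps above actually reproduced the data is to compare checksums of the restored files against the originals, for example with the original drives mounted read-only on another machine. The following is only a minimal sketch under that assumption; the function names and any directory paths you pass in are hypothetical, not part of any backup product.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(original_root, restored_root):
    """Return (missing, extra, changed) relative paths, each sorted."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    missing = sorted(set(original) - set(restored))
    extra = sorted(set(restored) - set(original))
    changed = sorted(
        name for name in set(original) & set(restored)
        if original[name] != restored[name]
    )
    return missing, extra, changed
```

Three empty lists mean every file came back with identical content. This compares file contents only, not permissions or ownership, and any file modified after the backup was taken will legitimately show up as changed.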

I won't attempt to comment on VM environments. I know only enough about them to be dangerous.

Be sure to review and test your backup and restore system whenever you make significant changes to your server. It wouldn't do to have a problem, dust off the 5-year-old binder with the recovery instructions, only to realize:

- it's for hardware you no longer have,
- it's for software which has undergone drastic changes three times,
- and it doesn't even mention the four new roles you've added to this server since the last time the book was updated.

In closing, the key is to carefully document the entire process. Write it out in such a way that the new hire who started last week, fresh out of school, can successfully get things going again with no outside help. Then test it.

Good luck!

Thanks for the detail! I'd realized we always say "test your backups" but couldn't find any mention of how often people actually test. — Kara Marfia, Jul 17 '10 at 21:35

score 0 · Answer 2 · answered Jul 27 '10 at 04:41

Perhaps I'm superstitious, but I really don't like managed backup software such as Backup Exec. It requires its own database, which also must be backed up, and its own software, which also must be installed somewhere in order to restore. So if you lose the backup server, you either restore it from an image or rebuild it, then restore the database, before you can restore anything else.

I feel much safer with the data + image scenario. A complete OS image is easy to do in Linux and OS X. On Windows, I have found that Acronis Universal Restore works well for taking a full offline image, which I do quarterly. I then restore that image to another server, also offline. After that, it's a matter of restoring the data files, which doesn't need Backup Exec if you can run something like rdiff-backup to an external storage server.

Knowing I have good images and easy-to-access files, not hidden in a container and managed by a database, and having tested the restore process on a spare server I keep around just for that purpose, helps me feel pretty confident about the process.
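If you settle on a fixed cadence such as the quarterly image restore mentioned in this answer, a small script can flag systems whose last test restore is older than that. This is just a sketch: the 90-day default is an approximation of "quarterly", and the system names and dates in the example are made up for illustration.

```python
from datetime import date, timedelta


def overdue_restore_tests(last_tested, today, max_age_days=90):
    """Return the systems whose last test restore is older than max_age_days.

    last_tested maps a system name to the date of its last successful
    test restore, or to None if it has never been tested.
    """
    cutoff = today - timedelta(days=max_age_days)
    overdue = [
        name for name, tested_on in last_tested.items()
        if tested_on is None or tested_on < cutoff
    ]
    return sorted(overdue)


if __name__ == "__main__":
    # Hypothetical inventory; names and dates are illustrative only.
    history = {
        "fileserver": date(2010, 7, 1),
        "mailserver": date(2010, 2, 15),
        "webserver": None,
    }
    for name in overdue_restore_tests(history, today=date(2010, 7, 27)):
        print(f"{name}: test restore overdue")
```

Treating a never-tested system as overdue is a deliberate choice here, so that newly added servers show up on the list until someone has restored them at least once.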
