I have recently been reading a lot about server-side backup software and strategies.

I am curious to know what strategies and software seasoned sysadmins (here on ServerFault) use.

  • The dos and don'ts of data backups and server backups.
  • What to do when servers actually crash, boom, bang.
  • Any other sort of information related to backup and restore techniques that you would like to share.

Kindly also post the environment in which you use this strategy (Windows, Linux, etc.).

Hoping to learn a lot from this post and to contribute in any way possible the moment I finalize a backup strategy of my own. ;)

6 Answers


"Backup and Recovery" O'Reilly book. Highly recommended.


  • That guy also keeps an awesome blog. Also, there are several chapters in The Practice of System and Network Administration that relate. The whole book is great for anyone doing sysadmin work. – mfinni Aug 27 '10 at 18:50
  • I second this recommendation. I'm still learning though. – Bubnoff Feb 16 '11 at 23:01

I have several rules for myself and my team. I hope some of them will be useful to you.

  • All data (except logs and caches) should be backed up. Don't expect the system never to crash. It WILL. Sometimes we back up log and cache partitions too, to speed up restoring a system without recreating directories, fiddling with permissions, etc.
  • Keep documentation of what's backed up and where. Whenever you work with any data, make a habit of knowing where it's backed up, how often, and how to restore it.
  • When you choose a platform, always check the backup solutions for it, especially how quickly you can restore a system after a crash. Don't choose a platform until you know how to back it up and how to restore it quickly. TRY a backup and restore before installing; ads always lie.
  • Do frequent backups only for frequently changed data. Backing up the whole system hourly is just stupid.
  • Any critical server should have at least one duplicate that can replace the failed server automatically.
  • Run a backup audit at LEAST once per week. Automated backup systems like to fail, especially a couple of days before day X.
  • Keep as much data as possible on shared storage. This makes backing up much easier. But don't trust your shared storage: make sure you can switch everything to the backup storage quickly, preferably automatically.
  • Use ZFS snapshots or similar technology. Take one full backup plus incrementals, with the incrementals merged into the full. If the system requires making a full backup more than once, it's a BAD system (except tape, of course); we live in the 21st century.
  • When you choose a tape solution, always calculate the price per TB. If it's equal to or only a little cheaper than regular HDDs, forget the tape. The exception is data you don't need to restore quickly: for non-urgent archives I would prefer tape even if it's more expensive.
  • Train yourself. Without training, restoring production will take much, much longer.

And the final, most important one:

  • Human error is the most common cause of data loss. Keep two copies of all data, separated enough that one or two commands can't kill both. This is the primary reason why RAID is NOT a backup. Significant hardware failure comes only second or even third.
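The weekly audit rule above can be partly automated. Here is a minimal Python sketch, not what we actually run: it assumes each backup set lives in its own subdirectory under one root and treats a set as stale when its newest file is older than a threshold. The function name `find_stale_backups` and the directory layout are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path


def find_stale_backups(backup_root, max_age, now=None):
    """Return names of backup sets whose newest file is older than max_age.

    Assumption: every subdirectory of backup_root is one backup set.
    """
    now = now or datetime.now()
    stale = []
    for backup_set in sorted(Path(backup_root).iterdir()):
        if not backup_set.is_dir():
            continue
        # Modification times of all regular files in this backup set.
        mtimes = [f.stat().st_mtime for f in backup_set.rglob("*") if f.is_file()]
        if not mtimes:
            # An empty backup set is treated as a failed backup.
            stale.append(backup_set.name)
            continue
        newest = datetime.fromtimestamp(max(mtimes))
        if now - newest > max_age:
            stale.append(backup_set.name)
    return stale
```

Calling `find_stale_backups("/srv/backups", timedelta(days=2))` (the path is only an example) lists the sets that need a closer look. It only tells you that files are recent, not that they restore, so it complements a real test restore rather than replacing it.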

What we use:

For the servers - we have everything on VMware vSphere and are almost happy with its DataRecovery. For Oracle and other databases we use their internal tools. For the workstations - we finally migrated everything to iSCSI or thin clients, so no more slow Acronis and other shit.

We have a mixed environment (70% Linux and 30% Windows). For (mostly) legacy reasons we use EMC Networker (with a tape changer) on the Windows side and Bacula on the Linux side. All Linux servers are covered through Bacula, and the resulting backup directory on that server is then included in the EMC backups (our nightly backups are roughly 3TB in size).

The basic strategy is that for all machines we only cover that part which is not recoverable through standard sources. In other words: data files, databases, configuration files and so on. In some cases, the backup process doesn't have a local client and uses an NFS mount to get access to the stuff that needs backing up (because apart from the NFS mount these target servers change all the time and it's easier to just provide the NFS mount point).

If a server goes completely AWOL (never had that case), we would buy replacement hardware, install the OS and all packages, restore the config files and data and off you go. As said, we never had the case where a server went completely doolally tap. Our backups are mostly used for users accidentally deleting files or files getting corrupted. We have had cases where some build servers had to be restored from scratch because some engineers got them into such a state that normal recovery was impossible, and the principle worked just fine (apart from the fact that restoring 30GB of data just takes some time). I probably should add that all of our mission-critical servers run on RAID arrays and redundant power supplies, and that we usually keep a fair few pieces of spare hardware, too.

Our backup solution is probably not best practice but it worked pretty well. The environment is mixed Windows (80%) and Linux (20%). We used to use tape backups for our Database servers and Source Control Repositories but abandoned that idea fairly recently (a decision made above my head!)

We use StorageCraft ShadowProtect Server edition on our Windows servers with varying policies depending on the criticality of the service (Exchange, for example, was backed up every half an hour). It creates base images of the system with minimal impact on performance (though on heavily loaded database servers we did see a number of problems, mostly the machine grinding to a halt due to disk I/O maxing out). It worked very well and gave us the option of Hardware Independent Restore, meaning we didn't need to worry too much about which vendor we used to replace the hardware (we have servers from IBM, HP, Dell and custom builds using Tyan barebones).

Linux servers were a different matter: we primarily used custom scripts written by the Senior Systems Engineer. The basic principle was to back up important data and not worry about the OS too much.

We have a 40 TB HP EVA StorageWorks SAN presented to our file servers and mail servers which provided an extra level of protection. Our backup servers were custom built with 24TB of storage using RAID 5. We use SyncBack Pro to make nightly backups of project file shares and any other file level backups that were necessary. Once on the primary backup server the data was SCP'd to the offsite server.

We also ensure we have support contracts for most of our hardware: a 24-hour fix for desktops and 8 hours for the servers we have from Dell and HP, which makes life a lot simpler.

Principles I try to apply to backup:

  1. The files which comprise the backup should be identical to those being backed up, for ease of restoration. Compression and encryption, if required, should be handled at the filesystem level.
  2. Backups must be automated and nightly, and you should get an e-mail from the process when it completes, stating whether it succeeded or failed and how full the backup media was.
  3. Backups should be kept at a location geographically remote from the data they back up.
  4. Databases and other systems which can't sensibly be backed up by taking a copy of their files should be dumped regularly and the dumps backed up instead.
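Principle 2 could be sketched roughly like this in Python. This is an illustration under assumptions, not my actual script: the function name `build_backup_report` and the addresses are made up, and "how full the media was" is approximated by the filesystem usage of the backup mount point.

```python
import shutil
from email.message import EmailMessage


def build_backup_report(succeeded, media_path, sender, recipient):
    """Build the nightly status e-mail: result plus how full the media is."""
    usage = shutil.disk_usage(media_path)
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    percent_full = 100 * usage.used / usage.total if usage.total else 0.0
    status = "SUCCEEDED" if succeeded else "FAILED"

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[backup] {status}"
    msg.set_content(
        f"Nightly backup {status}.\n"
        f"Backup media {media_path} is {percent_full:.1f}% full.\n"
    )
    return msg


# Sending is left to the caller, for example through a local mail relay
# (assumption: one is listening on localhost):
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(msg)
```

Putting the status in the subject line means a failure is visible without opening the mail. It is also worth alerting when no mail arrives at all, since a job that never ran sends nothing.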

As regards software, I've found rdiff-backup is a good solution to allow me to get at the last 30 days' worth of backups. I run a simple wrapper script round it on a nightly basis which backs up all my Linux servers to the backup server, where the backups live on an encrypted LVM partition. BackupNinja runs on all the servers and takes care of dumping databases etc. just before the nightly backups run.

Back up for a second. There are three 'reasons' to back up your data.

1) Disaster Recovery
This protects you from 'a meteor struck your building' scenarios. You need some way of getting your whole servers rebuilt quickly. The classic answer to this question is full system backups. The problem is that after some number of days, a large portion of your data is nearly worthless for DR (the OS data, lots of application data that's largely static, etc.).

2) User Error.
This type of backup covers the 'uh, I blew this file away 2 months ago, and it's really important', or 'uh, our DBA dropped this table, but forgot about this monthly report that we need to run one last time' etc. How long you keep these backups is a business decision. I've heard everything from 1 month to 2 years.

3) Archival.
These are the REALLY long-term backups, often required by government agencies... 'The IRS requires this class of financial records for 7 or 14 years'. The good news is this is usually a small subset of your data. Tapes are good for this, or often optical media.

Armed with these data classes (and a good audit of your environment), you can start classifying which data needs which type of backup.

Here's our backup strategy (note: it's a little complicated). General strategy: back up to disk, duplicate some data to tape. We run full backups once per month, level 2 backups once per week, and level 3 backups once per day. We keep the full backups for 3 months on disk, and 1 year on tape. We keep the L2 backups for 4 weeks, and the L3 backups for 2 weeks. This gives us high backup 'resolution' for the past 2 weeks, and diminishing resolution the further back in time you need to go. On our user shares (NetApp), we don't do L3 backups; instead we rely on snapshots. This makes restores a LOT easier to manage.
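To make the 'resolution' idea concrete, here is a small Python sketch of that schedule. It is only an illustration: the run days (fulls on the 1st of the month, L2 on Sundays) and treating 3 months as 90 days are my assumptions for the example, and tape retention is left out.

```python
from datetime import date, timedelta

# Disk retention per level, taken from the schedule described above.
# Assumption: "3 months" is approximated as 90 days.
DISK_RETENTION = {
    "full": timedelta(days=90),
    "L2": timedelta(weeks=4),
    "L3": timedelta(weeks=2),
}


def backup_level(day):
    """Which backup runs on a given day (run days are assumed, see above)."""
    if day.day == 1:
        return "full"
    if day.weekday() == 6:  # Sunday
        return "L2"
    return "L3"


def restore_points(today, lookback_days=120):
    """List (date, level) for every backup still held on disk on `today`."""
    points = []
    for offset in range(lookback_days + 1):
        day = today - timedelta(days=offset)
        level = backup_level(day)
        if today - day <= DISK_RETENTION[level]:
            points.append((day, level))
    return points
```

Printing `restore_points(date.today())` shows a restore point for every day of the last two weeks, then only the weekly L2s, then only the monthly fulls, which is the diminishing resolution described above.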

The big win we have, though, is that we have 3 'sites'. One of them is the primary site, and our backup environment (disks, media servers, tape robots, etc.) lives at one of the secondary sites. This is our big protection from the 'datacenter gone' type problems.

