What's needed for a complete backup system?

Question

For my new server, I want to set up a proper backup solution. I've found a great setup that will do twice-daily incremental backups via Dropbox. I plan on backing up my various databases, the webroot directory, the /etc directory/repository, and /var/log.

What else do I need to know to do a proper backup, and what is the standard setup here to ensure you can quickly restore from a backup in the case of a system failure?

I'm thinking of using Puppet, as it describes how the system should be. My restore procedure would look like this:

Install Puppet
Run my puppet config
Restore my backups from Dropbox (Should I create a script to do this? Probably)

This should also let me create a clone of my production server for use in dev environments, correct? Am I missing anything of importance?

I doubt very much that a "proper backup solution" would involve Dropbox in any way. — Michael Hampton, Feb 06 '13 at 00:40
@MichaelHampton Any reason why? Dropbox seems like a reliable/affordable backup system for a small business. Having Dropbox sync to a second machine ensures you have a backup even if the server & dropbox fail at the same time. — Brandon Wamboldt, Feb 06 '13 at 00:49
And when you accidentally delete a file and that deletion syncs across dropbox and the second server? — MDMarra, Feb 06 '13 at 00:54
[MIRRORING IS NOT BACKUP](http://www.acronis.eu/resource/tips-tricks/2004/mirroring-is-not-backup.html) — MikeyB, Feb 06 '13 at 00:56
Dropbox has limited, situational sysadmin value; backing up your infrastructure isn't really part of it. S3 buckets are far more bang for the buck, for a tiny bit higher barrier of entry. — Stephan, Feb 06 '13 at 01:02
@MDMarra Dropbox supports versioning as well. What if the second machine that dropbox synced to backed up to a physical medium? — Brandon Wamboldt, Feb 06 '13 at 01:02
Stop what you're doing, go directly to Amazon, buy [_The Practice of System and Network Administration, Second Edition_](http://www.amazon.com/Practice-System-Network-Administration-Edition/dp/0321492668/) and read Chapter 26. (And the rest of the book...) — Michael Hampton, Feb 06 '13 at 01:04
[Dropbox Will Eat Your Files, mmkay](http://konklone.com/post/dropbox-bug-can-permanently-lose-your-files) — Tom O'Connor, Feb 06 '13 at 09:13

score 22 · Accepted Answer · answered Feb 06 '13 at 01:32

We build backup systems for one purpose: To enable restores. Nobody cares about backups; they care about restores.

There are three reasons one might need to restore file(s): Accidental file deletion, hardware failure, or archival/legal reasons. A "complete" backup system would enable you to restore files in all of these scenarios.

For accidental file deletion, things like Dropbox and RAID fail because they simply reflect all changes made to the filesystem, and a deleted file is gone in these scenarios. Your backup system should be able to restore a file to a recent point in time fairly quickly; preferably the restore would complete within seconds to minutes.

For hardware failure, you should use solutions such as RAID and other high-availability approaches when possible to ensure that your service remains up and running, as a full restore of a system can take hours or possibly days due to the necessity of reading and writing to (relatively) slow media.

Finally, archives, or full backups (or equivalent) of the systems at a specific point in time, can serve restores in both legal and disaster recovery scenarios. These would typically be stored off-site, in case a stray meteor turns your data center into a smoking crater...

Your complete backup system should be able to support restores for any of these three types, with varying levels of service (SLA). For instance, you may decide that a deleted file may be restored with one business day granularity for the last six months and one month granularity for the last three years; and that a disk failure should be capable of being restored within four hours with no more than two business days of data loss. The backup system must be able to implement the SLA in a backup schedule.
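
As a rough illustration of turning such an SLA into a schedule, here is a minimal sketch that decides which existing backups the example policy above still requires. The function names, the choice of "first of the month" as the monthly restore point, and the window lengths in days are all assumptions made for the example, not part of any particular backup product.

```python
from datetime import date, timedelta

# Hypothetical tiers modelled on the example SLA: business-day granularity
# for roughly six months, monthly granularity for roughly three years.
DAILY_WINDOW = timedelta(days=183)
MONTHLY_WINDOW = timedelta(days=3 * 365)


def should_keep(backup_day: date, today: date) -> bool:
    """Return True if the backup taken on backup_day is still required on today."""
    age = today - backup_day
    if age > MONTHLY_WINDOW:
        return False  # older than every tier
    if backup_day.day == 1:
        return True  # monthly restore point, kept for the full three years
    # Everything else is only needed inside the daily window, Monday to Friday.
    return age <= DAILY_WINDOW and backup_day.weekday() < 5


def backups_to_prune(backup_days: list[date], today: date) -> list[date]:
    """List the backups that no tier of the schedule needs any more."""
    return [d for d in sorted(backup_days) if not should_keep(d, today)]
```

The monthly check comes before the daily one so that a first-of-the-month backup falling on a weekend is not discarded before it can age into the monthly tier.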

Your backup system must be fully automated. This cannot be stressed enough. If the backups aren't fully automated, they simply won't happen. Your backup system must be capable of fully automated backups, out of the box, with little or no special configuration or scripting required.

You must periodically test restores. Any backup system is utterly useless if restoring from backup fails to work. I think most of us have horror stories along these lines. Your backup system must be able to restore single files or whole systems within the SLA you're implementing.
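
One way a restore test could be automated is to compare checksums of a restored tree against a reference copy. The sketch below assumes both trees are available as local directories; the function names are hypothetical, and a real test would compare against a manifest recorded at backup time rather than against live data that may have changed since.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; hash large files in chunks instead.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def restore_problems(reference: Path, restored: Path) -> list[str]:
    """Describe every file that is missing, unexpected, or altered after a restore."""
    want, got = tree_digests(reference), tree_digests(restored)
    problems = [f"missing: {name}" for name in want.keys() - got.keys()]
    problems += [f"unexpected: {name}" for name in got.keys() - want.keys()]
    problems += [f"changed: {name}" for name in want.keys() & got.keys()
                 if want[name] != got[name]]
    return sorted(problems)
```

An empty list means the restored files match byte for byte. It does not prove that the restored system actually boots or serves traffic, which still needs a full restore drill.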

You must purchase backup media on an ongoing basis. Whether you're just doing on-site tape backup or going whole hog with off-site cloud backup, make sure you have it in the budget to pay for the gigabytes (or terabytes!) of space you will need.

This has been a very brief summary of a portion of Chapter 26 of The Practice of System and Network Administration, Second Edition, which anyone who is or aspires to be a system administrator should own, read, and memorize.

I've glossed over a lot of things that don't necessarily apply to your particular situation or that don't make sense in a small environment such as the one you've described. Nevertheless it should be a reasonable description of the features that your "complete" backup system should have, as well as why they're necessary.

So RAID-1 is an example of a backup system for hardware failure, correct? It's an entirely separate drive that is an exact copy of the primary drive. What is an example of a backup system that would allow users to restore a file to the state it was in for any day in the last 6 months? Would a daily backup via Git be a viable solution? These backups would have to be stored locally, so how should large files be handled? Thanks — Brandon Wamboldt, Feb 06 '13 at 01:37
@RogueCoder No, RAID 1 is an example of a fault tolerant system. RAID 1, or mirroring, in any form, is never a backup. Backup implies history retention as well as protection away from the system itself. — Wesley, Feb 06 '13 at 01:40
@RogueCoder https://en.wikipedia.org/wiki/List_of_backup_software should make for a good starting point. — Michael Hampton, Feb 06 '13 at 01:55

Chris S · Answer 2 · 2014-07-05T17:16:51.937

1. Dropbox would be a risky way of doing backups. No SLA/QoS, and it's probably against their normal TOS to dump that much data to their servers in an automated fashion. They specifically disclaim any liability in accessing your data - they may cut off access, destroy data, or go bankrupt at their own discretion and without warning.
2. No backup procedure is "valid" until you've actually restored from it; it's the only way to be sure. Most backup software provides a "validate" feature, which is worse than useless for most people because it only validates that something was written to a backup medium, not that the something is actually useful in restoring an operational system.
3. Relentlessly complete documentation ensures you'll be able to follow the restore procedures when disaster does strike - testing the documentation should be a part of testing the restoration of your system. It also means that someone else will be able to complete the procedures should you get hit by a bus (Murphy's Law and all that).
4. Restoration is only useful if it can be accomplished in a meaningful time period. For example, if it took a year to restore your data, that would be useless. You should determine what time frames are necessary for your situation for three levels of functionality: minimal functionality, daily operations, complete. Test your proposed solution and see if it fits the time requirements.
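
Before running a real timing test, a back-of-the-envelope estimate can show whether a proposed solution has any chance of meeting the time frames in the last point. This is a minimal sketch; the function names are hypothetical, the figures you feed it are your own assumptions, and a measured restore will usually be slower than the raw transfer time.

```python
def restore_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the hours needed to copy data_gb gigabytes at a sustained rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1000 / throughput_mb_per_s  # using 1 GB = 1000 MB
    return seconds / 3600


def meets_target(data_gb: float, throughput_mb_per_s: float, target_hours: float) -> bool:
    """Return True if the estimated transfer time fits inside the target window."""
    return restore_hours(data_gb, throughput_mb_per_s) <= target_hours
```

For instance, 500 GB at a sustained 50 MB/s is 10,000 seconds, a little under three hours of pure transfer, before any time spent reinstalling or reconfiguring the system.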

I think #3 often gets less attention than it deserves. *You* might be able to wing it setting up a prod environment from scratch on bare metal and restoring all of the data to the correct places, but can *someone else* do the same thing? Taking the time to create *and maintain* documentation in some form (could be a Word document, or a bunch of machine templates and scripts, or whatever), which is detailed enough to be followed through by anyone at least remotely competent, will pay huge dividends the day that disaster strikes. And of course, it has to be useful without anything else in place. — user, Jul 20 '17 at 14:47

score 3 · Answer 3 · answered Feb 06 '13 at 00:41

Naw. You're good for now. At least with the concepts...

Think about the state of your system at the time of your backups. Perhaps you don't want to back up a live database...
Or think about your hardware. Are you doing everything you can to make the machine as resilient as possible? For instance, I want restoring from a backup to be the LAST thing I have to do in an emergency situation.
Both major outages and small service interruptions can be reduced by using quality hardware, so make sure you're using RAID and server-class equipment, and look at a more local approach to data protection.
Think about the types of failures and situations you're protecting against.
I wouldn't necessarily use DropBox, but the idea of offsite protection is correct.
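
On the point about not copying a live database: the usual approach is to ask the database engine for a consistent copy instead of copying its files while they are being written. The question does not say which engine is in use, so the sketch below assumes SQLite purely for illustration and uses the online backup API in Python's standard `sqlite3` module; other engines ship their own dump tools. The function name and paths are hypothetical.

```python
import sqlite3


def consistent_copy(live_path: str, backup_path: str) -> None:
    """Copy a SQLite database that may be in use into backup_path."""
    source = sqlite3.connect(live_path)
    target = sqlite3.connect(backup_path)
    try:
        # Connection.backup() (Python 3.7+) copies the database page by page
        # through SQLite's online backup API, so the result is consistent.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

The file written to `backup_path` is then what the backup job should pick up, rather than the live database file itself.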

ewwhite

Any reason why? Dropbox seems like a reliable/affordable backup system for a small business. Having Dropbox sync to a second machine ensures you have a backup even if the server & dropbox fail at the same time. – Brandon Wamboldt Feb 06 '13 at 00:51
Sure, dual synchronisation to two boxes via the Dropbox service would work, but doesn't cover... other... failure modes of the Dropbox service. Most notably, the synchronisation of the boxes doesn't avoid a kill command to delete all the files being sent to both boxes. Fundamentally the same issue as you are dealing with configuring a set of spindles in a RAID array; Dropbox is basically just a network-extended RAID service possibly with some rudimentary snapshotting. (See [RobM's post](http://serverfault.com/a/476004/131019)) – Cosmic Ossifrage Jul 14 '14 at 13:14

score 3 · Answer 4 · answered Feb 06 '13 at 00:42

My preferred, tried and true backup system is:

Hourly snapshots of all databases (and one snapshot archived per day for two weeks, one snapshot archived per week for a year.)
Disposable servers. That is, all server standups are stored in git and deployed automatically (very similar to what you're saying with puppet, our preferred tool is chef though.) Essentially, a new server can be stood up from scratch using only the code you have in git, meaning any development hosts are built in similar fashion as your production servers.

The puppetmaster or chef server in these cases can be a potential point of failure; again, automate rebuilding them as much as possible, and have scripts on hand to allow existing nodes to bootstrap to a new server management host as quickly as possible, in the event that the old box is knocked over. I've found it can sometimes take significantly longer to rebuild this sort of host from a backup, than to stand a new one up from scratch (and restoring from backups can unintentionally reintroduce the same flaws or issues that caused it to go down in the first place.)

In a different vein, if you have more than a couple of servers, hosts, etc., it's well worth the investment to use a central log server. If the logs are housed in (and backed up from) one place, it saves you the headache of having logs on the rest of your hosts piling up and taking space. Log data is gold, but if I have 20 API servers all serving up traffic, and I get hit with something like a DDoS, not having aggregation of my logs means I'm looking for a needle in a haystack. If you're going to store your infrastructure logs (and you should!) then store them once, on one robust backup platform.

G'luck~!

score 3 · Answer 5 · answered Feb 06 '13 at 11:44

RAID and services like Dropbox "back up" all your changes, including the mistakes you'd want to recover from by using a backup.

This is why all us sysadmin types get very, very antsy and insist that things like RAID or toytown cloud file storage services, which rely on mirroring changes to your files as they happen, are not backups. That's not to say these services are not useful. They are, but they're not backups because they don't really give you data integrity.

A backup should be a snapshot of how things were at the time the backup was taken, not a continually overwritten live log of all the good and bad things that happen to your data as they happen. There are cloud providers out there that will give you actual backup if you look, and they work differently to Dropbox/SkyDrive-type services.
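
The difference can be shown with a toy model. In this minimal sketch, plain dictionaries stand in for a filesystem (an assumption made to keep the example self-contained, with hypothetical names throughout): the mirror always matches the live data, while each snapshot is a frozen copy taken at a point in time.

```python
def sync_mirror(live: dict[str, str]) -> dict[str, str]:
    """Return a mirror that matches the live data exactly, deletions included."""
    return dict(live)


def take_snapshot(history: dict[str, dict[str, str]], label: str, live: dict[str, str]) -> None:
    """Store an independent copy of the live data under a point-in-time label."""
    history[label] = dict(live)


def demo() -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    live = {"report.txt": "v1", "notes.txt": "draft"}
    history: dict[str, dict[str, str]] = {}

    take_snapshot(history, "monday", live)
    mirror = sync_mirror(live)

    del live["report.txt"]           # the accidental deletion
    mirror = sync_mirror(live)       # the mirror faithfully drops the file too
    take_snapshot(history, "tuesday", live)
    return mirror, history
```

After `demo()` runs, `report.txt` is gone from the mirror and from Tuesday's snapshot, but Monday's snapshot still holds it, which is the property that makes a restore possible.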

When it comes down to it, it's your choice what kinds of risk you're willing to expose yourself to vs. your budget for mitigating those risks. If you think that something like Dropbox is good enough then that's up to you. But you need to be clear about what it will and will not do for you - please don't kid yourself that it's a real backup.
