Why is RAID not a backup?

Question

When someone mentions RAID in a conversation about backups, invariably someone declares that "RAID is not a backup."

Sure, for striping, that's true. But what's the difference between redundancy and a backup?

score 148 · Accepted Answer · edited Aug 07 '15 at 22:19

RAID guards against one kind of hardware failure. There are many failure modes that it doesn't guard against:

File corruption
Human error (deleting files by mistake)
Catastrophic damage (someone dumps water onto the server)
Viruses and other malware
Software bugs that wipe out data
Hardware problems that wipe out data or cause hardware damage (controller malfunctions, firmware bugs, voltage spikes, ...)

and more.

answered May 02 '09 at 00:09 by Kevin Dente · edited by wzzrd

Will a backup refuse to copy a corrupt file? – jldugger May 02 '09 at 00:53
Depends on what "corrupt" means but normally backup applications have a setting for this... however, the second point of backup is to keep different versions of the file through time - not just a single version - thus circumventing the problem with a newly corrupted file overwriting a fresh version... – Oskar Duveborn May 02 '09 at 01:00
"Will a backup refuse to copy a corrupt file?" Yes, if you cannot read the blocks of a corrupt file, you won't be able to make a copy of it (a backup). – Dave Cheney May 03 '09 at 03:22
But what about silent data corruption; if a data block goes bad, most filesystems won't notice, will they? – jldugger May 03 '09 at 18:07
There are some types of file corruption that the backup might be able to identify and then not back up, but the result is just that you don't have a backup of that file; you then need to go back to a previous backup. You also need an archive so you can step back in time. – Richard Gadsden May 07 '09 at 10:00
Reasonable backup strategies include keeping a history, so that you can go back to before the corruption. The most common handling of the possibility of corruption is to pretend it can't happen. But if you want to protect against it, you can attempt to detect it as soon as possible, and in varying chunk sizes (device block level, database page level, file level). If you detect data corruption fast enough, it isn't "silent" data corruption anymore and you have a chance of recovery. – carlito Jun 01 '09 at 20:04
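
To make the file-level detection idea in the comment above concrete, here is a minimal Python sketch. It assumes you record SHA-256 checksums in a manifest when data is known to be good and re-check them later; the names `build_manifest` and `find_changed` are made up for illustration. Note that a checksum mismatch cannot tell corruption from a legitimate edit, so this only fits data that is not supposed to change.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changed(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose content no longer matches."""
    changed = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel)
    return changed
```

Running `find_changed` on a schedule shortens the window in which corruption stays "silent", which is what makes restoring from an older backup practical.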

score 126 · Answer 2 · answered May 03 '09 at 21:22

Q: Why is RAID not a backup?

A: Because the whole purpose of a RAID is to make sure that nothing in the world can interrupt that accidental rm -rf / (or DELTREE /X C:\), not even yanking the power cord in panic.

Q: But what's the difference between redundancy and a backup?

A: If you accidentally overwrite your PhD thesis with garbage, redundancy ensures that you have multiple copies of garbage, in case one goes bad. A backup ensures that you can restore your PhD thesis.

(And an archive ensures that you can retrieve multiple older versions of your thesis, and a version control system also tells you why you made a new version in the first place.)
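
The thesis example can be played out in a toy Python model. This is only a sketch under simplifying assumptions: `Mirror` stands in for RAID 1 and `BackupStore` for a backup tool that keeps point-in-time copies; both classes are invented for illustration and say nothing about how real RAID or backup software is implemented.

```python
class Mirror:
    """Toy RAID 1: every write lands on both disks immediately."""

    def __init__(self) -> None:
        self.disks = [{}, {}]

    def write(self, name: str, data: str) -> None:
        for disk in self.disks:
            disk[name] = data

    def read(self, name: str) -> str:
        return self.disks[0][name]


class BackupStore:
    """Toy backup: each run stores a separate point-in-time copy."""

    def __init__(self) -> None:
        self.versions = []

    def run(self, live: Mirror) -> None:
        self.versions.append(dict(live.disks[0]))

    def restore(self, name: str, version: int) -> str:
        return self.versions[version][name]


mirror = Mirror()
backups = BackupStore()

mirror.write("thesis.tex", "chapter 1 ...")
backups.run(mirror)                      # backup taken while the file is good
mirror.write("thesis.tex", "garbage")    # the accidental overwrite

# Both mirrored disks now faithfully hold the garbage.
mirrored_copies = [disk["thesis.tex"] for disk in mirror.disks]
# The earlier backup still holds the thesis.
restored = backups.restore("thesis.tex", version=0)
```

The mirror does exactly what it promises (two identical copies), which is the problem: the overwrite is replicated as reliably as the original was.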

Man, I had to read the first answer three times to actually get its meaning. lol — xiaofeng.li, Feb 07 '21 at 22:55

score 30 · Answer 3 · answered May 02 '09 at 00:06

Redundancy protects you against your hardware failing. It does not protect against user error, nor against malicious activity (e.g., crackers getting into your system).

See: Why Mirroring is Not a Backup Solution for a hard-earned lesson.

answered by C. K. Young

Nor software bugs, which are more common than malicious activity. – jhs May 03 '09 at 14:39
It's an interesting bit of irony that the article linked from that Slashdot page has now disappeared off the web. Not even the Internet Archive provides a meaningful copy; even though they did crawl the page shortly after the Slashdot article date, their copy simply says the page was not found. – user Dec 24 '13 at 15:11
Nor memory errors, which is why you need ECC. – inf3rno Jul 08 '17 at 08:47

score 22 · Answer 4 · edited Oct 02 '17 at 20:58

The number one reason you want a backup is not because the physical media died (this is rare), but because of some error that caused the data to be lost or corrupted.

RAID doesn't protect you against a file being deleted.

RAID doesn't protect you against a file being overwritten.

RAID doesn't protect you from your system being compromised and all of your data being overwritten, deleted, or corrupted.

RAID doesn't protect you from your ops team accidentally paving a machine with important data on it.

RAID doesn't protect you from a foolish DBA running a drop command on the production server (mistaking it for a test environment).

RAID doesn't protect you if the building burns down.

P.S. http://ma.gnolia.com/. This is what can happen if you don't have good backups. Your site is snuffed out of existence (note: this tends to be bad for business).

answered May 02 '09 at 01:36 by Wedge · edited by Community

So you need to build another building just for the backups. Trolololo. :D – inf3rno Jul 08 '17 at 08:49
@inf3rno it turns out that others have already built many other buildings. – Wedge Aug 25 '17 at 22:24
I don't think `http://ma.gnolia.com/` is quite what you meant to link to... – user Oct 01 '17 at 13:51

score 13 · Answer 5 · edited May 07 '09 at 19:20

Redundancy is great if one of your disks fails. It's not so great if your computer gets a virus, or you mistakenly delete a file, or you need to restore the disk to a previous version for some other reason. That's when you need a backup.

RAID helps you recover from failures, but backups let you go back in time.

answered May 02 '09 at 00:06 by Chris Upchurch

score 10 · Answer 6 · answered May 02 '09 at 00:41

It should also be mentioned that a hardware fault in the RAID controller can easily corrupt the data on all attached disks. So while you reduce the danger from disk failures, you add the danger of RAID controller failures.

answered by sth

score 7 · Answer 7 · answered May 07 '09 at 19:54

Multiple rotating copies
Geographic redundancy

Asked in a comment to the accepted answer:

Will a backup refuse to copy a corrupt file?

Even if a backup copies corrupt or bad data, the point of a backup is that you can and should have multiple copies. For instance, last hour, yesterday, last week, etc. You can get a similar effect from using rotating snapshots on your storage device.
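
As a rough illustration of "last hour, yesterday, last week", here is a small Python sketch of a rotation policy. The function name `snapshots_to_keep` and the default retention counts (24 hourly, 7 daily, 4 weekly) are assumptions picked for the example, not a recommendation from any particular backup tool.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now, hourly=24, daily=7, weekly=4):
    """Return the set of snapshot timestamps a simple rotation would keep.

    Keeps the newest snapshot in each clock hour, calendar day and ISO week
    that falls within the last `hourly` hours, `daily` days and `weekly`
    weeks respectively; everything else may be pruned.
    """
    keep = {}
    for taken in sorted(snapshots):  # oldest first, so newer ones overwrite
        age = now - taken
        if age < timedelta(0):
            continue  # ignore timestamps from the future
        if age <= timedelta(hours=hourly):
            keep[("hour", taken.strftime("%Y-%m-%d %H"))] = taken
        if age <= timedelta(days=daily):
            keep[("day", taken.date())] = taken
        if age <= timedelta(weeks=weekly):
            keep[("week", taken.isocalendar()[:2])] = taken
    return set(keep.values())
```

Anything not in the returned set is a candidate for deletion. The point is that a corrupt file copied into the newest snapshot still leaves older, good versions available.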

But the other reason for backups is geographic redundancy. You should certainly keep copies of critical data in two different geographic locations. How separate those locations are depends on how critical the data is; keeping copies in two different buildings in the same city protects against fire or theft. Keeping copies in two different countries protects against bigger problems.

Great answer, but I would really like to dig deeper into the "bigger" problems :) What exactly is classified as such a problem? – Teo Carter, Jul 05 '18 at 12:29

score 6 · Answer 8 · answered May 02 '09 at 00:09

RAID can be a great way to mitigate risks due to hardware failures, but RAID won't help you when your users delete (accidentally or otherwise) their data. To recover data you need some archival facilities, either through local snapshots or online/offline backups.

score 3 · Answer 9 · answered May 02 '09 at 01:36

In a RAID 5 array consisting of disks over 400 GB, if you lose a disk there's something like a 75% chance of having an unrecoverable read error while the array is being rebuilt. Think about that for a second and it becomes pretty obvious why someone will always remind you that "RAID is not a backup".

RAID gives you higher reliability and performance, but it's not infallible.
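
For anyone who wants to check the rebuild risk themselves, here is a back-of-the-envelope Python sketch. It assumes the commonly quoted consumer-drive spec of one unrecoverable read error per 10^14 bits read, that errors are independent, and that every surviving disk is read in full during the rebuild. The function name and the two example arrays are hypothetical, and real drives may behave better or worse than their spec sheet.

```python
import math


def rebuild_ure_probability(surviving_disks: int, disk_bytes: float,
                            ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error during a rebuild.

    Assumes every bit on every surviving disk must be read once and that
    errors are independent with probability `ure_per_bit` per bit.
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical 4-disk RAID 5 of 400 GB drives: 3 survivors are read in full.
p_small = rebuild_ure_probability(3, 400e9)
# Hypothetical 8-disk RAID 5 of 4 TB drives: 7 survivors are read in full.
p_large = rebuild_ure_probability(7, 4e12)
```

Under these assumptions the small array comes out at roughly 9% and the large one at roughly 89%, so the figure depends heavily on how much data has to be read and on the error rate you plug in.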

answered by saschabeaumont

Real problem, bad math. – Paweł Brodacki Sep 11 '11 at 16:12

score 2 · Answer 10 · answered May 02 '09 at 04:09

Fire, theft, RAID controller fault, human error; the list goes on.

answered by DDM

score 2 · Answer 11 · answered May 12 '09 at 13:21

What's the difference between redundancy and backup? OK, configure a RAID 5 disk set. Store some business-critical stuff on it. Pull a disk out. Everything still works! That's redundancy. Now delete all the data (don't cheat with the recycle bin). Now restore it from the most recent backup. You don't have one? Oops. Well, at least you can tell your boss your disks are using RAID 5 redundancy (as you get marched out of the building...)

score 1 · Answer 12 · answered Dec 12 '21 at 12:47

RAID helps you eliminate downtime in a limited but highly probable set of scenarios: HDD failures, usually one drive at a time.

RAID does not protect you from invalid data being stored on the drives. An application or system software bug can wipe some or all of the data, a human can delete the wrong data by mistake, and malicious users or viruses can destroy it. In such scenarios, RAID ensures that the data loss also happens on the redundant drives.

RAID does not protect you from losing the whole array at once. Fires, floods, or other catastrophes destroy it all at the same time. Similarly, thieves can steal the whole NAS at once, or a very drunk roommate in a very bad mood can play "throw it as far away as possible" with the NAS.

Backups help you go back in time and restore what was once stored as current/live data.

Backups help you restore previous versions of lost data after a catastrophic failure.

Mirroring data to another location protects you from the catastrophic loss of a single physical site, but it doesn't necessarily stop the effects of hackers, viruses, or other causes of data loss or corruption from propagating to the mirrors.

answered by kravemir

Actually I am very disappointed the correct explanation comes here 12 years after the question was asked. The elimination of downtime is the only real purpose of the RAID. – Nikita Kipriyanov Dec 12 '21 at 13:55
It's not truly the only real purpose of RAID. It also prevents short-term data loss (changes since the last backup) in case of a single drive failure (or more, depending on the RAID level). Something similar can be achieved with very frequent incremental backups, an audit log, mirroring, or other means, but RAID does that job better. However, the point stands: one can't rely on RAID to ensure that data will be kept intact and available in the future, as a system with RAID is itself a single point of failure. – kravemir Dec 13 '21 at 16:37
If you need guaranteed data recovery at shorter intervals than your backups, the last technology you'll be relying upon is RAID, right? You'll use application-level replication, filesystem-level snapshots, block-level snapshots, etc. RAID doesn't enable data recovery per se; it only lets all those technologies survive a HW failure. All it does is make the system less affected by HW failure. Any of those technologies will do its work in the absence of RAID (except in the HW failure case), but RAID alone won't give you any such recovery possibility. – Nikita Kipriyanov Dec 14 '21 at 06:52
You're right. A software architecture designed for strong data retention guarantees would also include data replication across multiple different devices, and would therefore also cover the failure scenarios that RAID handles. So RAID is not strictly necessary then, but it is a very useful convenience. – kravemir Dec 14 '21 at 13:46

score 1 · Answer 13 · answered May 07 '09 at 19:43

Also consider that with RAID you have multiple hard drives, probably built at the same time and then exposed to the same conditions for years. What are the chances that they will all fail at about the same time? Pretty high.

answered by trent

MTBF != expected lifespan of gear – Tetsujin no Oni May 07 '09 at 20:38
This isn't really an issue with *RAID,* though. Well, the "same use patterns" might be exacerbated by RAID, but multiple drives exposed to the same conditions isn't a function of RAID. – user Dec 24 '13 at 15:17

score 0 · Answer 14 · edited Jul 08 '20 at 23:15

[Not an answer, I already know. But an instructive tale nevertheless. Feel free to not upvote it. I'm posting it as an answer simply because it's too long for a comment.]

"We have redundant web servers on a load balancer, redundant database servers in a cluster and redundant hard drives in every server. So how did this happen? According to our server company there was a manufacturers bug in the firmware of the specific model that 6 of our 8 hard drives were on. That bug caused the disks to die after a certain number of hours running."

https://tvtropes.org/pmwiki/posts.php?discussion=15941624520A37147500
