
I have been running Windows software RAID 1 (mirror) on Windows 2008 R2 for a while, and recently I experienced something very odd. I hope someone can help explain it and share their view.

Yesterday (Mar 14) my Windows 2008 R2 server crashed and rebooted. Before it entered Windows, it reported that a "bad sector" had been detected and that it had to do some mapping/marking on the drive. After 5-10 minutes Windows started, and the mirror drive was offline. Okay, but when I looked at the Event Log and the file system, I was shocked to see that all the data from Jan 29 to Mar 13 was gone. There wasn't ANYTHING in the Event Log for that period, which was very weird. In the file folders I could see only data from before Jan 29 and nothing after. Luckily I have Backup Exec to restore the data.

It seems like the last "good recovery point" was Jan 29. How does this work? Why would I lose all the data from that date onward? I assumed that if the mirror drive failed, the first drive would still hold all the data up to yesterday. I am confused.

Please help.

R.B

userb00

1 Answer


If one drive had errors, those errors can get mirrored to the other drive. It sounds like the boot-time check repaired the filesystem and, in the process, deleted a huge amount of data.

RAID will faithfully mirror errors. That's why it isn't a backup, as you apparently already knew. The filesystem tries to stay consistent and usable; it does NOT guarantee data protection.

In summary, you had an error, the filesystem was made consistent, it deleted a lot of data in the process, and the RAID mirroring faithfully mirrored the now-consistent data.
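To make that sequence concrete, here is a toy Python sketch. It is not how Windows dynamic disks or chkdsk are implemented; `MirroredVolume` and `repair_filesystem` are hypothetical names made up for illustration. It only shows that a mirror applies every change, including a repair that throws data away, to both members.

```python
class MirroredVolume:
    """Toy RAID 1: every change is applied to both member disks."""

    def __init__(self):
        # Each "disk" is just a dict of file name -> contents.
        self.disks = [{}, {}]

    def write(self, name, data):
        for disk in self.disks:
            disk[name] = data

    def delete(self, name):
        for disk in self.disks:
            disk.pop(name, None)


def repair_filesystem(volume, damaged):
    """Hypothetical stand-in for a filesystem repair pass.

    It makes the volume consistent by dropping entries it cannot
    trust. The mirror has no idea this is data loss.
    """
    for name in damaged:
        volume.delete(name)


volume = MirroredVolume()
volume.write("jan.log", b"old data")
volume.write("feb.log", b"newer data")
volume.write("mar.log", b"newest data")

# The repair decides the newer entries are unrecoverable.
repair_filesystem(volume, ["feb.log", "mar.log"])

surviving = [sorted(disk) for disk in volume.disks]
print(surviving)  # both members hold the same, smaller set of files
```

Both members end up identical and consistent, and neither one has the newer files. Only a separate backup does.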

Bart Silverstrim
  • Just curious... If the same thing happens on RAID 5, say there are bad sectors on one of the drives, what exactly will happen? Would RAID 5 be my ultimate solution? Thanks a lot for your comments. – userb00 Mar 15 '11 at 14:35
  • Depends on the size of the array; RAID 5 on large-capacity drives is becoming kind of cringe-worthy due to silent drive errors. I was personally bitten by that one: a drive failed, I replaced it, and during the rebuild a second drive turned out to have an unrecoverable error that hadn't been detected, so the array refused to finish the rebuild despite trying to repair it. – Bart Silverstrim Mar 15 '11 at 16:47
  • Larger drives have a certain capacity for failures, and the tolerances are changing as densities and data capacity increase. RAID 5 can put you in a position where the rebuild of a volume simply fails, and you won't know it until a drive fails completely. – Bart Silverstrim Mar 15 '11 at 16:48
  • And yes, if there's data corruption at the *filesystem* level, it will spread to the other drives. RAID protects against HARDWARE failure, NOT filesystem errors. It's supposed to keep your server running if a hard disk flakes out. The filesystem itself, and your data, reside at a level above that; the RAID is data- and filesystem-agnostic, as long as the operating system sees the controller. – Bart Silverstrim Mar 15 '11 at 16:49
  • 1
    Think of it this way...if you run RAID 5 or RAID 10 or RAID 1, and you delete a file that you suddenly remember you wanted, is it on the drive volume even though it's RAID'ed? No. No more than any other disk with standard data recovery techniques. You have to fall back on your backups to get the file back. RAID actually can complicate data recovery at the disk level since you can't necessarily easily examine just one drive to get the data, 'specially when it's spread in stripes across drives and/or uses special controller-specific techniques for formatting data. – Bart Silverstrim Mar 15 '11 at 16:52
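The RAID 5 rebuild failure described in the comments above can be sketched with plain XOR parity. This is a simplified Python illustration under stated assumptions: a single stripe, one parity block, and `None` standing in for a block the drive cannot read. The names `xor_blocks` and `rebuild` are hypothetical, and real controllers are considerably more involved.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(stripe, failed):
    """Rebuild the block at index `failed` from the rest of the stripe.

    A value of None marks a block that cannot be read.
    """
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    if any(block is None for block in survivors):
        raise IOError("unreadable block on a surviving drive; rebuild cannot finish")
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]  # three data blocks plus one parity block

# One drive fails: its block can be recomputed from the others.
degraded = list(stripe)
degraded[1] = None
recovered = rebuild(degraded, failed=1)

# A second drive has a latent unreadable sector in the same stripe.
degraded[2] = None
try:
    rebuild(degraded, failed=1)
    rebuild_failed = False
except IOError:
    rebuild_failed = True
```

With one block missing the stripe can be recomputed; with a second unreadable block there is not enough information left, which is why an undetected error on a surviving drive only shows up during the rebuild.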