This might sound stupid, but I'm told I can ask any question here, as long as it is related.

So here goes. I have RAID 1 on a server that is officially soon to be dead. After a chkdsk, files are getting corrupted more and more often, and things are looking pretty apocalyptic from my terrified pessimistic oh-my-god-we're-all-gonna-die standpoint.

I'm unable to read files on the server; notably, everything a coworker did this morning was lost.

Okay, context apart, I want to see if the alternate drive is working any better than the master one, so here are my questions:

How do I know which one is the master and which one is the slave? And how do I determine which one is faulty?

When that is determined, what do I do? Can I just pull the other drive out? I tried doing so (taking appropriate precautions so as not to kill any other hardware), but I got a "broken raid detected, enter setup?" message at boot.

Do I need to insert another drive before I can get the data?

Is trying to boot using a live Ubuntu disk a good way to try to save some data?

I'm trying my best not to panic, here, but when my boss finally understood what I meant by "We're going to be in deep shit real soon", it was already close to the expected depth, and "soon" was less than a week away. Oy vey...

UPDATED: I tried SeaTools, as suggested below, and both drives failed the long generic test. On a scale from one to infinity, exactly how deep in the poop am I now?

Say I really, REALLY need to have the data back, how much money will my boss have to give away to do so? Is it even possible? I mean, I stopped believing in Santa a while back...

Olivier Tremblay

3 Answers


You need to remove each hard drive and test them separately with a tool like SeaTools.


One of the disks is likely going bad, but with RAID 1, there is no way for the computer to know for certain which disk has good data on it and which disk has bad data on it. If you're very unlucky, both disks will be going bad, but more than likely, it's just one of them.

In a RAID 1 configuration, there is no master/slave: the two disks act as one, and everything is controlled by the RAID controller. Because of this, you'll likely want to test the drives in a different machine. If the tests show that one disk is bad, all you need to do is remove that drive and get a replacement soon. A RAID 1 array will keep running on a single drive in the meantime.
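If you do end up testing the drives in another machine, one way to see how far the two halves of the mirror have drifted apart is to compare raw images of them. What follows is only a minimal Python sketch under assumptions: it assumes you have already copied each drive to an image file with an imaging tool you trust, and the function name `differing_chunks` and the file names in the usage note are made up for illustration. It tells you where the two copies disagree, not which copy is right, and some differences (controller metadata areas, for example) may be perfectly normal.

```python
CHUNK_SIZE = 1024 * 1024  # compare 1 MiB at a time (arbitrary choice)


def differing_chunks(image_a, image_b, chunk_size=CHUNK_SIZE):
    """Return the starting byte offsets of chunks where two image files differ.

    Both arguments are paths to ordinary files (hypothetical drive images).
    If one image is longer than the other, the chunk where they stop
    matching is reported as a difference too.
    """
    offsets = []
    with open(image_a, "rb") as file_a, open(image_b, "rb") as file_b:
        offset = 0
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break  # both images fully read
            if chunk_a != chunk_b:
                offsets.append(offset)
            offset += chunk_size
    return offsets
```

Calling `differing_chunks("disk_a.img", "disk_b.img")` (hypothetical file names) returns a list of offsets. An empty list means the two images are byte-for-byte identical; a long list suggests the mirror has been out of sync for a while.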

  • Meaning, roughly, that I remove one drive, not care about the broken raid message, and test the drive. Then, I do the same for the other. The less broken of the two wins the contest? – Olivier Tremblay Aug 10 '09 at 15:38
  • Yes, exactly. You can replace the disk later, but you need to figure out which one is going bad. – IceMage Aug 10 '09 at 15:39
  • One more thing: I wouldn't suggest running chkdsk on RAID computers, as it tends to do more harm than good. Use SeaTools to check the disks for problems; it's not as destructive as chkdsk can be. – IceMage Aug 10 '09 at 15:42
  • In reply to "try not to use the chkdsk tool": Oh crap. -_- – Olivier Tremblay Aug 10 '09 at 15:45

In theory you should be able to just pop out a drive on a mirrored array while the system is running, and it should keep running fine. I actually did this once (by accident I hasten to add) and there were no ill effects (aside from missing one of the mirrored pair, of course).

Very very risky on a live system though, so my recommendation here is to get a replacement in place NOW, and get as much data as possible transferred over to it, with the old server then being taken offline. The worst thing you can do is wait until it fails before taking action.

Having done that you will have established a position where most of your stuff is up and running and available to users. So then - and only then - you can start experimenting with the old box and seeing how much of the rest you can get back.

My suggestion of popping the drive is one way: pop a drive while running and check the data. If you get corruption, you know that the drive still in the machine is the bad one and the one you pulled is probably good; if you don't, you know that the drive still in the machine is good.
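For the "check the data" step, here is a rough Python sketch of one way to do it, assuming Python is available on the machine. It walks a folder and tries to read every file from start to finish, collecting the ones that raise an I/O error. The function name `find_unreadable_files` is made up for illustration. Keep in mind that it only catches files the disk refuses to read, not files that read fine but contain garbage, and that reading everything puts extra load on a drive that may already be dying.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Try to read every file under root; return (path, error message) pairs.

    Only read failures reported by the operating system are detected.
    Silently corrupted contents will not show up here.
    """
    unreadable = []

    def on_walk_error(exc):
        # Called by os.walk when a directory itself cannot be listed.
        unreadable.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so large files do not fill up memory.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                unreadable.append((path, str(exc)))
    return unreadable
```

Run it once with each drive pulled and compare the two lists: the drive that produces the shorter list is probably the healthier one.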

I would definitely not do anything that involved reboots of that system as the whole thing could fail to come back up at any point. If you've ever been in that position you know how unpleasant it is.

Maximus Minimus
  • Want more? There is no spare or backup server. There is no replication. I'm taking action because the boss decided that a server failure was critical enough for me to "get fixin'". The server box was in a godforsaken, air-constipated broom closet, less than two feet away from the building's main gas pipe. The server faulted due to a read error while people were working - and saving - and losing - data on it. Bright side: There are fewer than 10 employees here. And there might very well be one less by the end of the month. – Olivier Tremblay Aug 10 '09 at 16:34

Is it a hardware RAID controller? Most have a utility that you can use to check the health of your disks and which one may be degraded. The manufacturer will also provide a recommended procedure to replace the degraded drive and to rebuild the mirror.

