Testing software RAID5 by pulling a HDD live. Bad idea?

3

1

I'm running Ubuntu Server 10.10 with software RAID5 and LVM on top of that. I have four 2TB HDDs. I also have a separate boot drive with incremental full-system snapshots that run every four hours, and every day, week, month, and year. So even if the RAID array fails completely, my data won't be lost, but it would be a pain to set up my entire system again.

I tested my software RAID5 the other day by pulling a HDD while the system was off, booting degraded, putting back the HDD, rebooting, and then rebuilding after telling mdadm to re-add the re-inserted drive. After 10 hours of rebuilding, it succeeded, and now the software RAID5 device is clean again!
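For anyone repeating this test, here is a minimal sketch of how the degraded and rebuilding states could be checked from a script by parsing `/proc/mdstat`. The sample text and its numbers are made up for illustration (they are not from my server), the function and field names are hypothetical, and the parser only handles the common layout of that file.

```python
import re
from pathlib import Path

# Illustrative /proc/mdstat content for a 4-disk RAID5 with one member missing.
# The device names and numbers are invented for this example.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[4] sdd1[2] sdc1[1]
      5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [=>...................]  recovery =  8.5% (166200000/1953511936) finish=560.1min speed=53000K/sec

unused devices: <none>
"""

HEADER = re.compile(r"^(md\S+)\s*:\s*(\S+)")              # "md0 : active ..."
MEMBERS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")    # "[4/3] [_UUU]"
PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%")


def parse_mdstat(text):
    """Return a dict mapping each md array name to a small status dict."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            tokens = line.split()
            current = {
                "state": header.group(2),
                # First token such as "raid5"; None if no RAID level is listed.
                "level": next((t for t in tokens if t.startswith("raid")), None),
                "total": None,
                "working": None,
                "degraded": False,
                "progress": None,
            }
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue  # lines before the first array, e.g. "Personalities"
        members = MEMBERS.search(line)
        if members:
            current["total"] = int(members.group(1))
            current["working"] = int(members.group(2))
            # An underscore marks a slot whose member is missing or failed.
            current["degraded"] = "_" in members.group(3)
        progress = PROGRESS.search(line)
        if progress:
            current["progress"] = float(progress.group(2))
    return arrays


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    # Fall back to the sample on machines without md support.
    text = mdstat.read_text() if mdstat.exists() else SAMPLE_MDSTAT
    for name, info in parse_mdstat(text).items():
        print(name, info)
```

Polling this during a rebuild shows `degraded` staying true and `progress` climbing until the array is clean again.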

Now, I want to do another test. Would you say it's dangerous to pull one of the HDDs while the machine is running and possibly reading/writing? I'd pull the SATA cable, not the power cable, so that I don't risk damaging the hard drive, but would I have data loss? I have very redundant backups on other HDDs so there's no risk of permanent data loss, but I don't want to have to reinstall and reconfigure my entire system.

Is it unsafe to test? What's the worst that can happen?


Update:

I'm the original poster. Because I made this question without an account, and because it was moved to a different StackExchange site, I lost ownership of the question.

So I did the test. I pulled the SATA cable of one of the four hard drives, and the system performed beautifully.

Here's a screenshot of the Webmin RAID interface:

Webmin

And here's a screenshot of an email I received from my server:

Received email

So to answer my own question, it was safe to do so in my particular case.

Drew Gottlieb

Posted 2012-08-07T17:23:24.173

Good luck! Those aren't hot-swap drives, are they? – ewwhite – 2012-08-07T17:28:27.657

The motherboard isn't hot-swap. The drives are just normal Hitachi 7,600rpm drives. – None – 2012-08-07T17:29:59.130

Actually, you should make sure that the system is writing to the RAID. And the other test is to drop power for the whole system while it's writing. It would be a good idea to have a backup system ready, so others can continue to work while you are doing the recovery. – ott-- – 2012-08-07T19:01:36.773

@drew head over to Server Fault and register your account via the same OpenID that you used on your Super User account. Once you've done that, you should be able to gain control over your question – Sathyajith Bhat – 2012-08-08T05:06:29.473

Answers

2

Actually testing a RAID array is generally a bad idea. There are of course exceptions (if you don't know for certain that your case is an exception, then it isn't), but physically disconnecting drives that aren't "hot swap" is always a bad idea, and doing so purposefully is even worse.

Chris S

Posted 2012-08-07T17:23:24.173

Reputation: 5 907

I believe all SATA drives are hot-swappable by design (I just confirmed this on Wikipedia) – Alex – 2012-08-07T17:38:09.717

Bad idea ESPECIALLY if they aren't hot swappable. – Chad Harrison – 2012-08-07T17:38:44.337

@alex You also have to consider the RAID setup. Try hot swapping a RAID 0 (really, don't). – Chad Harrison – 2012-08-07T17:40:19.203

@Alex, yes, assuming they followed the spec... which doesn't happen nearly as often as it should. – Chris S – 2012-08-07T17:40:33.493

1

You've already tested that the RAID software does what it's supposed to. If you pull the SATA cable on a live disk, while it's not supposed to do any harm, it is possible you could still damage the disk's electronics. I'm assuming the drives are not hot-swap since you're talking about removing the cable. If it were hot-swap, you'd just unclip the drive and slide it out.

So the "worst that could happen" is that you either kill a drive or you kill the on-board disk controller, neither of which is a good result.

I would recommend not doing the live test.
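If the goal is only to watch the array degrade and rebuild while the system is running, one alternative is to mark a member as failed in software instead of touching the cable. Below is a rough sketch, not a tested procedure: it builds the `mdadm` commands for such a drill and only prints them unless `execute=True` is passed. The device names `/dev/md0` and `/dev/sdb1` and the helper names are hypothetical placeholders, and this does not exercise the controller's real hot-unplug handling.

```python
import subprocess


def failure_drill_commands(array, member):
    """Build the mdadm commands for a software-only failure drill."""
    return [
        ["mdadm", "--manage", array, "--fail", member],    # mark the member faulty
        ["mdadm", "--manage", array, "--remove", member],  # detach it from the array
        ["mdadm", "--detail", array],                      # inspect the degraded array
        ["mdadm", "--manage", array, "--add", member],     # add it back; a rebuild follows
    ]


def run_drill(array, member, execute=False):
    """Print each command; run it (as root) only when execute is True."""
    for cmd in failure_drill_commands(array, member):
        print(" ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only. The device names are placeholders, not the asker's setup.
    run_drill("/dev/md0", "/dev/sdb1")
```

Keeping the default as a dry run means the commands can be reviewed against the real device names before anything is changed on the array.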

StarNamer

Posted 2012-08-07T17:23:24.173

Reputation: 915