
We have a Dell R510 server running SQL Server 2008 R2 with 8 x 300GB drives in a RAID 5 array.

We (just noticed) we had three bad drives with blinking lights so we powered down the server and replaced them with new ones.

When the server came back up the lights were green (but not flashing).

The server only shows XXXX GB of space, so it is not reading the drives. Did we miss a step to bring the new drives online?

Does the RAID array need time to build, or should we have swapped them one at a time?

We have a copy of the data so that is not a major issue to restore it.

ewwhite
Pico
  • `We (just noticed) we had three bad drives with blinking lights` - Which drive LED was blinking and what was the blink color and pattern? – joeqwerty Nov 30 '14 at 18:29

2 Answers


Why would you ask the internet about this?

There's so much WTF here that I don't know where to start!!

This question shows a fundamental lack of understanding of hardware, RAID arrays, storage, monitoring, and general IT best-practices.

I read this question and can't help but think:

  • Who is actually responsible for this server hardware? Where is the sysadmin/consultant/IT professional?

  • Why would you turn off a server to replace hot-swappable disks in a hardware RAID array? It's not necessary to do so and it substantially increases your risk if you already suspect bad disks.

  • Did you understand what the "blinking lights" meant? What color were the lights? Perhaps they were indicating disk pre-failure instead of a complete failure.

  • You replaced the drives without knowing the impact of doing so. If anything, these actions made the situation worse and you may have destroyed your data.

  • Why would you expect the size of the disk array to change following a drive replacement? What the hell does "XXXX GB" mean, and why is it pertinent to your question? How about relaying details like the capacity and type of disks, as well as the size of the array presented to the OS?

  • You just noticed a disk failure? You have spare disks available but no form of monitoring to actually identify failures? Your server monitoring should have TOLD you this. Even a basic visual check of the servers would help recognize problems. I doubt that the disks failed at the same time.

  • Did anyone check the system logs? What does the hardware RAID controller say when you boot the system? What do the Dell DRAC logs say? What does the operating system say?

  • Finally, if you have questions about the operations of your manufacturer-supported, brand-name hardware, and don't understand what's happening, wouldn't it have made more sense to assess your situation (check logs, data and backups) and contact Dell?

I understand the consumerization of technology means that people are often tasked with responsibilities and placed in situations that they're not qualified for, but the lack of basic troubleshooting skills exhibited here is appalling. It's unfortunate that people are paid to provide this level of service.

ewwhite

With RAID 5 you can only lose 1 disk and have your data remain available. You have lost 3 so you will need to rebuild the RAID and then restore the data from backup.
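To see why one lost disk is survivable and three are not, here is a minimal Python sketch of the parity idea behind RAID 5. It is deliberately simplified: real RAID 5 rotates the parity block across all disks and uses much larger stripes, and the names `make_stripe` and `rebuild` are hypothetical helpers, not any controller's API. Each stripe stores the XOR of its data blocks as parity, so any single missing block equals the XOR of the surviving blocks; with two or more blocks missing there are more unknowns than one parity block can solve for.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block (simplified RAID 5 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, missing):
    """Try to recover the blocks at the indices listed in `missing`.

    Returns the repaired stripe, or None if more than one block is lost,
    because a single parity block only carries enough information to
    recover one unknown block.
    """
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        return None
    survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
    repaired = list(stripe)
    # XOR of all blocks (data + parity) is zero, so the lost block
    # is the XOR of everything that survived.
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


data = [bytes([i] * 4) for i in range(1, 8)]   # 7 data blocks
stripe = make_stripe(data)                       # 8 blocks, one per drive
damaged = list(stripe)
damaged[2] = None                                # one drive lost
print(rebuild(damaged, [2]) == stripe)           # True: recovered from parity
print(rebuild(damaged, [0, 3, 5]))               # None: three drives lost
```

With 8 drives, one block per drive, a single lost block comes back from parity, but losing three drives (as in the question) leaves nothing to rebuild from, which is why the data has to come from backup.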

We have a canonical Q&A about RAID levels that may help your understanding.

user9517
  • In that case, if we are going to rebuild the array, should we go with RAID 10 this time to support losing 2 at once, or is there a better option? – Pico Nov 30 '14 at 18:01
  • If you can only lose 1 drive at a time, how is it that we lost 3 and everything still kept working? – Pico Nov 30 '14 at 18:16
  • @Pico the document I linked to provides the relevant information on the various RAID levels and their strengths/weaknesses. Which you choose to use is a business decision. I don't know what a blinking light on a Dell R510 (drive) means, but I do know that now you have replaced 3 drives, your RAID 5 is dead and will need to be recreated and the data recovered from a backup. – user9517 Nov 30 '14 at 18:36
  • @Pico What happened *before* no longer matters; what *you* have done has screwed your RAID array. – fukawi2 Nov 30 '14 at 22:42
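On the RAID 10 follow-up: whether a layout survives two failures depends on which two disks fail. Below is a rough Python comparison for the 8 x 300GB disks in the question. It assumes RAID 10 is built as four mirrored pairs (disks 0-1, 2-3, 4-5, 6-7); the actual pairing on a Dell controller may differ, and the capacities are nominal figures before formatting overhead. It only counts how many failure combinations leave the data intact, and ignores rebuild time, read errors during a rebuild, and everything else that goes into the business decision mentioned above. The function names are hypothetical.

```python
from itertools import combinations

DISKS = 8      # drives in the array (from the question)
DISK_GB = 300  # nominal size of each drive


def raid5_survives(failed):
    # One parity block per stripe: any single failure is recoverable.
    return len(failed) <= 1


def raid6_survives(failed):
    # Two independent parity blocks per stripe: any two failures are recoverable.
    return len(failed) <= 2


def raid10_survives(failed, disks=DISKS):
    # Assumed layout: (0,1), (2,3), ... are mirror pairs striped together.
    # Data is lost only if both halves of some mirror pair have failed.
    pairs = [(d, d + 1) for d in range(0, disks, 2)]
    return not any(a in failed and b in failed for a, b in pairs)


# Layout name -> (survival check, nominal usable capacity in GB)
LAYOUTS = {
    "RAID 5": (raid5_survives, (DISKS - 1) * DISK_GB),
    "RAID 6": (raid6_survives, (DISKS - 2) * DISK_GB),
    "RAID 10": (raid10_survives, (DISKS // 2) * DISK_GB),
}


def summarize(n_failed):
    """For each layout, count how many n-disk failure combinations keep the data intact."""
    result = {}
    combos = [set(c) for c in combinations(range(DISKS), n_failed)]
    for name, (survives, usable_gb) in LAYOUTS.items():
        ok = sum(1 for c in combos if survives(c))
        result[name] = (usable_gb, ok, len(combos))
    return result


for name, (gb, ok, total) in summarize(2).items():
    print(f"{name}: ~{gb} GB usable, survives {ok}/{total} two-disk failures")
```

Under these assumptions RAID 5 survives none of the 28 possible two-disk failures, RAID 6 survives all 28, and RAID 10 survives 24 (it fails whenever both halves of one mirror die). With three failed disks, as in the question, `summarize(3)` shows RAID 5 surviving 0 of 56 combinations.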