Is a 3-way RAID1 using mdadm a good solution for sustaining any two drives failing without the array failing? I know this costs extra in the sense that only 1/3 of the raw disk space (the capacity of 1 of the 3 drives) is usable, but what about aside from that?

sa289
  • If you want to tolerate two drive failures, then any write to the logical media has to go to three physical drives. There is no way to avoid that performance penalty, but with a proper choice of hardware, writing three replicas in parallel isn't going to slow down operation much. There is an important scenario which you did not explicitly include in your question: when one drive dies completely, you may then find bad sectors on the other disks during recovery. As long as those bad sectors are in different locations it is recoverable in principle, but I don't know how mdadm deals with that. – kasperd Apr 18 '15 at 21:34
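
A periodic scrub helps surface such latent bad sectors while full redundancy is still available, rather than during a rebuild. A minimal sketch using the md sysfs interface, with the array name /dev/md0 assumed:

    echo check > /sys/block/md0/md/sync_action   # start a background scrub
    cat /proc/mdstat                             # watch its progress
    cat /sys/block/md0/md/mismatch_cnt           # sectors that disagreed between members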

5 Answers


To have a single array capable of surviving a 2-disk failure, you have two choices:

  • three-way RAID1, as you suggested
  • RAID6, as another possibility.

What is the best choice? It depends on what you are trying to achieve.

  • if you want a setup that gives you the possibility of taking a disk out, installing it in another computer, and still being able to read your data, use RAID1.
  • if you want to be able to expand your array and gain additional space each time, use RAID6 (both options are sketched below).
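
For concreteness, a minimal sketch of creating each layout with mdadm; the array and device names are assumptions:

    # 3-way RAID1: three full copies, any two members may fail
    mdadm --create /dev/md0 --level=1 --raid-devices=3 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1

    # RAID6 across four devices: dual parity, any two members may fail
    mdadm --create /dev/md1 --level=6 --raid-devices=4 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1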

A note about RAID1 performance degradation: it does not depend on bus congestion so much as on how the mean disk access time is affected by issuing the same write to multiple disks. Disk access time is composed of two different parts: seek latency (the time the head needs to reach the correct track) and rotational delay (the time the platter needs to rotate to the correct position).

When the same write is issued to multiple disks, the rotational delay as measured by the host will be the worst of all the involved disks. Seek time, on the other hand, should be relatively similar between the mirrored disks. In the end, this means that a RAID1 array will have slightly lower write IOPS than a single identical disk.
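
A back-of-the-envelope illustration, assuming 7200 RPM disks whose rotational positions are independent and uniformly distributed: one revolution takes 60/7200 ≈ 8.33 ms, so a single disk waits on average 8.33/2 ≈ 4.17 ms, while the worst of three disks waits on average 3/4 × 8.33 ≈ 6.25 ms (the expected maximum of n uniform delays is n/(n+1) of a revolution). Mirroring across three spindles therefore adds roughly 2 ms of rotational latency per write.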

Linux's mdadm has an interesting provision to minimize the impact of differing per-disk latencies. For example, read the man page about "write-behind" and "write-mostly":

-W, --write-mostly
    subsequent devices listed in a --build, --create, or --add command will be flagged as 'write-mostly'. This is valid for RAID1 only and means that the 'md' driver will avoid reading from these devices if at all possible. This can be useful if mirroring over a slow link.

--write-behind=
    Specify that write-behind mode should be enabled (valid for RAID1 only). If an argument is specified, it will set the maximum number of outstanding writes allowed. The default value is 256. A write-intent bitmap is required in order to use write-behind mode, and write-behind is only attempted on drives marked as write-mostly.

Note that this will lower your random read IOPS (as some disks will effectively be used for writes only), so choose your poison carefully.
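
A minimal sketch of such a mirror, with one slow member flagged write-mostly; the device names are assumptions, and the internal write-intent bitmap is required for write-behind:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=256 \
        /dev/sdb1 --write-mostly /dev/sdc1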

shodanshok
  • If we're planning to use SSDs, does that affect your answer about the bus congestion? – sa289 Apr 20 '15 at 19:10
  • SSDs are different beasts. Even a single, modern SSD can saturate a SATA 3.0 link, so _many_ SSDs writing in parallel surely are a bandwidth hog. By using a hardware RAID card you can partially work around the issue of CPU-to-controller transfers (the OS sees a single disk and sends a single copy of the data, so the PCI-E bus is engaged for a single data movement; it is the controller's job to duplicate the data and send it to the disks). On the other hand, with SSDs the limit can very well be the connection between the controller and the backplane, so there is no silver bullet... – shodanshok Apr 20 '15 at 20:39

Yes, you can add as many mirrors to a RAID1 as you like, and you can tolerate failures of all but 1 device. If you add 10 devices, you can tolerate a failure of 9 devices.

Don't forget there will be a write penalty with this setup, though. All data has to be written to every device. Generally it should be fairly insignificant, but if all devices are on the same controller/bus you may start to notice delays as your data is written to every device. For example, with 3 devices, writing 1 MB of data to the array requires the controller/bus to actually write 3 MB to disk.
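
A minimal sketch of growing an existing two-way mirror to three-way; the array and device names are assumptions:

    mdadm /dev/md0 --add /dev/sdd1          # add the new disk as a spare
    mdadm --grow /dev/md0 --raid-devices=3  # promote it to an active mirror
    cat /proc/mdstat                        # watch the new member resync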

fukawi2
  • In what sort of case do you think we might notice delays? Is there a certain throughput we'd have to hit on most modern controllers/buses before we'd notice it? – sa289 Apr 16 '15 at 00:14
  • That's too broad to be able to answer. There's no generic answer. It depends entirely on your hardware and your workload. – fukawi2 Apr 16 '15 at 02:18
  • If you want to reduce this write penalty from 3x to 2x and still be able to survive losing up to two drives, you might consider a standard 2-drive RAID1 with the third drive as a hot spare. The downside is that if you lose the two main drives at the same time (as with a power surge), your third drive is not going to help. – GuitarPicker Apr 16 '15 at 13:16
  • @fukawi2 - if it's too broad to give a specific answer, is there some lower bound that it's at least going to be better than? For example, if it's SATA III, could we assume it can handle at least a single SATA channel's worth of throughput (i.e. 6 Gbps), but quite possibly higher? – sa289 Apr 16 '15 at 19:39
  • That's possibly a safe assumption, unless all drives are connected to the same controller (*which has to co-ordinate all those writes*). If that controller isn't very good and can't keep up then you could see performance lower than a single SATA III drive. – fukawi2 Apr 16 '15 at 21:19

Another solution is RAID 6 with 3 disks. See this post:

Minimum number of disks to implement RAID6

RAID 6 will also allow you to double capacity by adding a fourth drive. I have had 2 drives fail on an array without losing data.
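
A minimal sketch of that expansion, assuming an existing 3-disk RAID6 at /dev/md0 and a new disk /dev/sde1 (both names are assumptions):

    mdadm /dev/md0 --add /dev/sde1          # add the new disk as a spare
    mdadm --grow /dev/md0 --raid-devices=4  # reshape to a 4-device RAID6
    cat /proc/mdstat                        # the reshape runs in the background

Older mdadm versions may also require a --backup-file= for the reshape.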

Dan Sawyer
  • That's a good point about future expansion. I think the write performance penalty of RAID 6 is a drawback to that setup. I suppose future plans could help decide which setup is more desirable. – sa289 Apr 07 '15 at 15:36

First, I think it's important to note the usage scenario and the quality of the components used. It's not the same if you're using desktop HDDs and cheap RAID controllers as if you're going full enterprise hardware.

If the only thing you're doing is replication across HDDs (RAID1), then you can afford to lose n-1 hard drives and still have all the data intact.
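
For reference, a quick way to check how much redundancy a running array has left; /dev/md0 is an assumption:

    mdadm --detail /dev/md0   # reports State plus Active/Working/Failed device counts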

But I'd really like to know: what are your usage scenario and hardware selection, that you're so concerned about losing 2 drives simultaneously?

Recently, I set up a webserver for an ISP. The server had a 6-port RAID controller, so I set up RAID 60 as a good tradeoff between speed and safety.

I advise you to read through this link.

In regard to your clarification, I strongly suggest going for either RAID 5 or RAID 60... Alternatively, if cost is the issue, a simple RAID 0 with a two-tier offsite backup would be enough.

My references are my own experiences setting up numerous servers in vastly different usage scenarios.

Sibin Grasic
  • The drives will house important data for some websites, and I've seen SSDs brick well past infant mortality (I got hit with the Intel 8MB bug, though that was on a consumer-grade drive), so I want the extra fault tolerance as a precaution since the data is critical. – sa289 Apr 20 '15 at 19:08
  • I have revised my answer. – Sibin Grasic Apr 20 '15 at 19:25
  • Keep in mind the bad block issue if you use HDDs; you can easily get hit by it on a 1 or 2 TB drive if you need to resync the RAID because of a hardware failure. – Dennis Nolte Apr 22 '15 at 12:13

I have always been a big fan of hardware-based RAID 5. I typically use Ubuntu Linux for the server if the planned use allows; with hardware-based RAID, Ubuntu (like any other operating system) has no trouble booting from a RAID 5 array on most modern servers.

I also use multiple backups. The first level is an hourly backup at the server to an external drive using Back In Time, providing an on-site backup every hour during business hours. The second level is a nightly backup of the network share drives to another computer running Ubuntu and Back In Time. The nightly backups are also made to portable USB drives, at least one of which is kept off-site; drives are rotated daily during the business week. The third level is a retired Windows Vista computer running Ubuntu Linux, configured similarly to the server, where each night the server files are synchronized to the backup system using the Linux utility rsync (sketched below).

RAID 5 (with a hot spare) served well over the last few years when there were drive failures: the failed (hot-swappable) drive was replaced in each instance without interrupting network activities. RAID 5 didn't help when the server experienced a hard crash, probably from a motherboard or memory failure. What did help was the spare backup server holding the files synchronized after close of business the previous night. I have a small script that migrates the server configuration to the backup server, including all the user and machine accounts, making the spare computer a temporary PDC. It took a couple of hours to put together another retired Windows computer as a new backup system and bring it online.

I opted to replace the more expensive ProLiant ML350 server with a more modest ProLiant ML10, which I will be configuring with RAID 1 as a 3-drive mirror plus a hot spare. The ML10 I have ordered uses a software RAID controller, which has to be configured as AHCI instead of RAID for Ubuntu to boot. The total cost for the server and four 1 TB drives is about the cost of one 300 GB drive for the ML350. This is only the second time in 25 years of managing servers that RAID 5 didn't help (the first was probably a failure of the RAID controller itself). Neither instance was a problem with RAID as such, just a problem that comes with relying on technology.
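
A minimal sketch of that nightly rsync synchronization, run from cron on the backup machine; the hostname and paths are assumptions:

    # preserve permissions, ownership and hard links; delete files
    # that have been removed from the server
    rsync -aH --delete server:/srv/shares/ /srv/shares/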

The main point I want to get across is: be prepared for when your server failure occurs, and have a good backup plan. For it to be a good plan, you have to actually test the backup and recovery procedures. In the case of the most recent failure, the total time from the call that got me out of bed, through getting dressed, grabbing a quick bite to eat on the way out the door, driving to the site (a 10-minute drive), diagnosing the problem (including an attempted restart of the server), and getting the backup server online, was 52 minutes.

You can debate which of the various RAID levels is better. Just keep in mind that more things can fail than hard drives. Use the type of RAID you think best for your use, but plan for recovery from hardware failure or a malware/virus attack.

Jake Page