28

Can I run reliably with a single Fusion-io card installed in a server, or do I need to deploy two cards in a software RAID setup?

Fusion-io isn't very clear (almost misleading) on the topic in their marketing materials. Given the cost of the cards, I'm curious how other engineers deploy them in real-world scenarios.

I plan to use the HP-branded Fusion-io ioDrive2 1.2TB card for a proprietary standalone database solution running on Linux. This is a single server setup with no real high-availability option. There is asynchronous replication with a 10-minute RPO that mirrors transaction logs to a second physical server.

Traditionally, I would specify a high-end HP ProLiant server with the top CPU stepping for this application. I need to go to SSD, and I'm able to acquire Fusion-io at a lower price than enterprise SAS SSDs for the required capacity.

  • Do I need to run two ioDrive2 cards and join them with software RAID (md or ZFS; see the sketch after this list), or is that unnecessary?
  • Should I be concerned about Fusion-io failure any more than I'd be concerned about a RAID controller failure or a motherboard failure?
  • System administrators like RAID. Does this require a different mindset, given the different interface and on-card wear-leveling/error-correction available in this form-factor?
  • What IS the failure rate of these devices?
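
If I do end up mirroring, the software side would likely be ZFS-based. A minimal sketch, assuming the VSL driver exposes the two cards as /dev/fioa and /dev/fiob (an assumption on my part; verify the actual device names on the system):

    # Two-way ZFS mirror across two ioDrive2 cards (illustrative device names)
    # ashift=12 assumes 4K internal sectors; confirm the right value for the ioDrive2
    zpool create -o ashift=12 fiotank mirror /dev/fioa /dev/fiob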

Edit: I just read a Fusion-io reliability whitepaper from Dell, and the takeaway seems to be "Fusion-io cards have lots of internal redundancies... Don't worry about RAID!!".

ewwhite
  • Will software raid and/or the PCIe bus even manage to keep up if you're pushing the IOPS limit? I honestly have no idea, but it's worth checking out. – pauska Jul 15 '13 at 02:01
  • The PCIe bus will keep up. The software RAID (if I *NEED* to) will be ZFS-based, so it's capable. I've ordered two cards for now, but Fusion-io literature seems to say, "one card is good enough". – ewwhite Jul 15 '13 at 02:03
  • Even ZFS needs to use CPU cycles to mirror data, so it does add complexity and latency - but how much it would affect your specific application is impossible to tell. Fusion-io seems to be very proud of their low CPU cost on the internal mirroring/safeguarding. – pauska Jul 15 '13 at 02:11

5 Answers

18

The on-device redundancy should do the job just fine for failures of the flash chips - analogous to RAID among all of the components doing actual data storage.

Should I be concerned about Fusion-io failure any more than I'd be concerned about a RAID controller failure or a motherboard failure?

A failure of the entire device would be pretty much analogous to the loss of a RAID controller or motherboard - I'd be approximately as worried about the Fusion-io card as about these other single-point-of-failure components, though I don't have experience with the devices at large enough scale to compare failure rates with hard data.

Do I need to run two ioDrive2 cards and join them with software RAID (md or ZFS), or is that unnecessary?

Adding redundancy on top of what the device already has (say, software RAID among multiple Fusion-io cards) would be a lot like doing software RAID between two hardware RAID groups on two different RAID controllers: it might be worthwhile for systems warranting extreme redundancy, to remove an additional single point of failure, but not for common deployments (a 10-minute RPO on a mirror should be good enough for most applications?).

Sysadmins like RAID. Does this require a different mindset, given the different interface and on-card wear-leveling/error-correction available in this form-factor?

Yeah, I think so. You're essentially getting a device that's like a RAID controller and a bunch of storage devices behind it, in one package. It's definitely tempting to be worried about putting your sensitive data on a single device, but one needs to have some level of trust in the device's internal redundancy... which should be counter-balanced with a healthy understanding of the "RAID is not a backup" concept: always be prepared, with good backups, for the failure of a redundant component, or for a user to delete the data on it.

Shane Madden
16

Ultimately, it comes down to your failure model. What is the impact of a failure?

Historically, we've always RAIDed everything since the cost of doing so has been negligible. Another $500 for a drive for mirroring? Totally worth the cost without even considering it.

When you're talking about another $10K+ to turn on mirroring, it needs a bit more consideration.


No, you do not need to mirror

The Fusion-io cards do have quite good internal redundancy. This isn't the kind of hardware where your disk is a single chip. In most of the situations where I've observed failure, it's been a firmware problem that affected both members of a mirror, so RAID would not have mattered.

Think of a Fusion-io card as a RAID controller with disks behind it. Are you fine with a single-controller setup? Probably. Treat it like that.

In many setups where you would deploy Fusion-io drives, you'll have other safeguards built in (redundancy at the node level), so card-level mirroring doesn't make as much sense.


Yes, you need to mirror

RAID increases your availability. Do you need absolute maximum availability despite the cost? Would a failure and the resulting downtime be expensive? Go ahead and mirror the drives. In a statistically large setup, you will have failures of drives despite the internal safeguards.
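
Either way, trust but verify: the card reports its own internal health, so you can watch the flash reserves deplete before it becomes an outage. Something like the following, using Fusion-io's VSL status utility (exact output fields vary by driver version, so treat this as a sketch):

    # Show all status for attached ioDrives, including media health and reserve space
    fio-status -a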

MikeyB
  • **Update:** I've mirrored the Fusion-io cards in the installations where the client was okay with the additional spend (and to ease myself into deploying the product). I've deployed a number of single card installations in other situations. Everything has been fine so far... – ewwhite Nov 14 '13 at 08:21
13

As you know, we've used their kit for a while, in both RAID and non-RAID setups - I wish I had some failure experience to give you, but I've not. We've had no failures that RAID would have helped with, and their on-board resilience features are only getting better. Also, the main function we use them for is now horizontally scaled/clustered, so we have even less reason to RAID them. Great cards though; highly recommend them.

Chopper3
  • A good data point. However, I just can't tell if adding a RAID layer is overkill or not. – ewwhite Jul 15 '13 at 12:27
  • I see the technology as not being the defining point here - either your data needs the ability to survive the loss of a single FusionIO card or it doesn't - just think of them as fast, spendy disks - that doesn't change whether you can live without RAID or not, right? – Chopper3 Jul 15 '13 at 15:45
  • Slightly... A traditional approach would be to use enterprise SAS SSDs in a RAID 1+0. That's just applying the same standard used for spinning disks to SSDs. But that also assumes hot-swappability, which doesn't apply to a PCIe-based card, especially when I'd be forced to use software RAID to accomplish this. Since Fusion-io also has the benefit of better wear-leveling and monitoring, I'm trying to understand the realistic failure modes involved here. Do I treat the Fusion-io like a disk or a controller? You wouldn't put two Smart Array cards in a ProLiant to serve internal disks, right? – ewwhite Jul 15 '13 at 15:55
  • @ewwhite You might have two RAID controllers with different disks, and RAID1 between the controllers' disks, if you need to be able to handle the failure of a RAID controller. I'd say treat the Fusion-io card like a RAID controller in terms of your redundancy planning. – Shane Madden Jul 15 '13 at 16:19
  • @ewwhite If you look at the wording I used, you may read between the lines that we did have one outage on a single FusionIO-equipped server - we had a mobo go pop on a DL580 G6 (we have very few of them): something happened to one whole bank of memory and it took out the whole board. In this scenario the card was RAID 1'ed, but obviously that didn't matter. Of course, do bear in mind that PCIe *can* be hot-swappable; it can be a massive faff, but it can work fine. – Chopper3 Jul 15 '13 at 17:58
9

I'm not familiar with Fusion directly, but I do have some PCIe SSD experience to work from.

The ones I work with present four different LUNs to the OS and treat the PCIe card like an HBA. If I want RAID, I mirror two LUNs together using the OS, which gives me a one-card solution for redundancy. Though if the card outright fails, I'm still up a creek. I don't know if the ioDrive does the same thing.
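
On Linux, that OS-level mirror is a one-liner with md. A sketch with hypothetical device names (my cards present as ordinary block devices; substitute whatever the ioDrive actually exposes):

    # Mirror two LUNs presented by the same PCIe SSD - protects against a
    # failed LUN/flash region, but not against the whole card dying
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdx /dev/sdy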

sysadmin1138
  • This particular unit will present one block device. – ewwhite Jul 15 '13 at 02:19
  • ioDrives present one or two independent devices. If it presents two, they are physically two separate devices on one card, each with its own internal protection. Compare to, say, the Intel 910, which presents four devices, each of which should be treated as an SSD. – MikeyB Jul 16 '13 at 18:47
5

I bought six of the 1.2TB cards in the last couple of months. One of them has already failed, so I would absolutely RAID them. I used Windows' built-in disk mirroring. The drive failed with the message "missing LEB map", and I was told it would need to be swapped out. But to get the RMA approved, I had to take pictures of both sides of the failed card (requiring a production outage to pull the card out), and then they told me the replacement card was out of stock with no ETA. So you might want to think pretty hard before you buy them.

user229000
  • Thanks for sharing your experience. I went ahead and deployed these 1.2TB cards in mirrored pairs. I'm using HP SKUs, so I have my HP support contract to handle RMA/replacement. – ewwhite Jul 02 '14 at 17:48