
We've been running ZFS on top of a single hardware RAID volume on Dell PowerEdge servers for years. I know most people are against this, but ZFS's snapshots/clones, compression, and flexible partitioning have served us very well. Whenever a drive dies, a Dell technician is dispatched (the server is in another state); he confirms that the drive has an amber light and replaces it.

Now that we want to take advantage of ZFS's L2ARC/ZIL caching, we are seriously thinking about running ZFS on bare disks. Current Dell RAID controllers (PERC H730/H330) do support pass-through. My remaining questions are:

  1. If ZFS fails a drive, does it show an amber light on the front panel? This is important because the dispatched Dell technician may need to confirm that the drive is indeed faulty; otherwise we may have problems with Dell.

  2. Does anyone run FreeBSD with ZFS on root? Is it production quality? Any known issues?

John
  • If you're running ZFS atop hardware RAID, it's okay to use the RAID controller's cache for the pool ZIL instead of a dedicated ZIL device. That's a better solution than reworking the entire setup to accommodate raw disks. For L2ARC, it's going to depend highly on your workload, but have you maxed out RAM yet? Have you looked at `arcstat` to see what your hit rates are and whether you'll _need_ an L2ARC? – ewwhite Sep 21 '15 at 14:38
  • Interesting. How can I use the RAID controller's cache for the ZFS pool ZIL? That cache has to appear as a device accessible to the operating system in order to configure that, right? What would its device name be? Is there a pointer for that? We are purchasing new servers, so more memory is indeed planned. – John Sep 21 '15 at 20:24
  • The ZIL is in memory. In the absence of a dedicated ZIL device, in-flight ZIL transactions are flushed to the pool's disks. If you're behind a RAID controller, those writes go to the controller's NVRAM. So you don't have to do anything at this point. – ewwhite Sep 21 '15 at 20:56
  • So, do you mean I now need to enable the cache on that RAID controller? And that has no adverse effect elsewhere? I used to disable that cache when running ZFS on top of the RAID. – John Sep 22 '15 at 01:52
  • Yes, please see: http://serverfault.com/a/545261/13325 – ewwhite Sep 22 '15 at 03:55

3 Answers


You can control the PERC H730 and H330 using LSI's MegaCLI utility, as both of these cards are Dell PERC-badged LSI cards.

There is an excellent article and tutorial on how to do this at https://calomel.org/megacli_lsi_commands.html

I know that zfsonlinux has a ZFS Event Daemon (ZED) which you can use to trigger specific actions on specific events (e.g. use MegaCLI to turn on the amber light for a particular slot when a drive dies), roughly as in the sketch below.
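For illustration, a minimal ZED zedlet might look something like this. Treat it as an assumption-laden outline, not a tested recipe: `map_to_slot` is a hypothetical helper you'd have to write for your own hardware (one possible approach is sketched further down), and MegaCLI's `-PdLocate` lights the slot's locate LED, which may or may not be the exact amber pattern the Dell tech expects.

```
#!/bin/sh
# /etc/zfs/zed.d/statechange-megacli-led.sh -- illustrative sketch only.
# ZED invokes zedlets with ZEVENT_* variables in the environment; on a
# vdev state change, light the locate LED on the matching PERC slot.

[ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] || exit 0

# map_to_slot is a hypothetical helper that translates the vdev's device
# path into MegaCLI's enclosure:slot notation.
slot="$(map_to_slot "${ZEVENT_VDEV_PATH}")" || exit 1

# Start blinking the locate LED on that slot.
MegaCli -PdLocate -start -physdrv "[${slot}]" -aALL
```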

IIRC, FreeBSD has a zfsd which can do similar things, but I am not an expert on FreeBSD so I cannot point you to more information, other than to say that the FreeBSD forums are full of useful advice and helpful people.

I suspect that the hardest part of doing this will be figuring out what the MegaCLI "slot" number is for a given drive, because ZFS only knows about the device node/name and doesn't have any LSI- or PERC-specific information. If the device node name is directly related to the card and slot number, it may be a trivial transformation; otherwise, it may be quite difficult. One possible approach is sketched below.
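One workable (if clunky) way to build that mapping is to match drive serial numbers: `smartctl` reports the serial for an OS device node, and `MegaCli -PDList` reports each drive's enclosure ID, slot number, and inquiry data (which usually embeds the serial). A rough sketch, assuming both tools are installed and that the serial really does appear in the `Inquiry Data` field; on FreeBSD behind mfi(4) you may also need a `-d` device-type flag for smartctl:

```
#!/bin/sh
# find_slot.sh -- rough sketch: print the MegaCLI enclosure:slot for a
# given device node by matching serial numbers. Usage: ./find_slot.sh /dev/da3
dev="$1"
serial="$(smartctl -i "$dev" | awk -F: '/Serial Number/ {gsub(/ /,"",$2); print $2}')"

# MegaCli -PDList prints, per drive, "Enclosure Device ID", "Slot Number"
# and "Inquiry Data"; emit enc:slot for the drive whose inquiry data
# contains our serial.
MegaCli -PDList -aALL | awk -v s="$serial" '
    /Enclosure Device ID/          { enc  = $NF }
    /Slot Number/                  { slot = $NF }
    /Inquiry Data/ && index($0, s) { print enc ":" slot; exit }
'
```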

Even if you have to manually run MegaCLI from the shell to turn on the amber light when a drive dies, to satisfy the Dell tech's procedural expectations, you're still better off giving ZFS raw drives rather than overlaying ZFS on top of hardware raid. You lose the most important features of ZFS that way (e.g. error detection and correction for your data).
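With raw drives those features are easy to exercise, e.g. (using a placeholder pool name `tank`):

```
# Kick off a scrub, then inspect per-device error counters.
zpool scrub tank
zpool status -v tank   # the CKSUM column counts corruption ZFS detected
                       # (and, given redundancy, silently repaired)
```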

cas
  • "you're still better off giving ZFS raw drives rather than overlaying ZFS on top of hardware raid" - Except replacing a failed drive becomes much, much harder than just popping a new drive in. How often does ZFS find a data error compared to how often an entire drive fails? – Andrew Henle Sep 21 '15 at 11:09
  • Even with HW RAID, it won't turn green as soon as the drive is swapped; it'll only turn green when the RAID array has finished syncing the new drive. That, BTW, is also an event ZFS can detect, triggering a script that turns the light back to green. The implementation details may differ, but it's exactly what the HW RAID does. – cas Sep 21 '15 at 12:05
  • Ok, then to convince the Dell technician I would have to program the lights to behave exactly as they do under RAID: when a drive dies (amber), when a new drive is inserted (blinking?), when rebuilding (?), etc. FreeBSD does have the MegaCli utility; possible command arguments are at http://things.maths.cam.ac.uk/computing/docs/public/megacli_raid_lsi.html . It's not clear what sets the amber light. It looks like this alone will be an involved project. What do other people do in this situation? We have to rely on a Dell technician to do remote repairs. – John Sep 21 '15 at 12:51
  • Found ZFSd, at http://svnweb.freebsd.org/base?view=revision&revision=222836 . But it looks like it never made it into FreeBSD 10, the latest release. – John Sep 21 '15 at 13:05
  • @John What did we do on 50+ Solaris machines with ZFS boot drives? Run hardware RAID, because you can't be sure the tech (HP in our case) sent out to replace the failed drive is any good at doing anything other than replacing the drive whose failure light is lit. How many hard drives have you seen fail? How many times have you seen ZFS find data corruption? – Andrew Henle Sep 21 '15 at 16:55
  • Almost all of our drives get replaced once within the five-year support period, so we worry about this. Obviously, we have no idea whether ZFS could find corruption, since it only sees a single RAID volume. – John Sep 21 '15 at 20:28
  • Unless you disabled it, ZFS still checksums data, and is quite likely to detect data corruption even if there isn't a correct ZFS-controlled copy of the corrupted data available. Have you ever had ZFS report such corruption? – Andrew Henle Sep 21 '15 at 20:36
  • I don't remember us ever having had a corrupted ZFS pool. – John Sep 22 '15 at 05:47

The Dell PERC H330 and H730 aren't suitable cards for ZFS on FreeBSD. There are a lot of misunderstandings about the "passthrough" mode of those cards; they simply don't implement it the way FreeBSD needs. It may work fine on Windows, but that's not the case with FreeBSD.

If you try to use those controllers on FreeBSD 10.2, for example, they will attach to the mfi(4) driver, which isn't actually a supported way to give raw disks to ZFS. For example: through this driver, SMART info is unavailable to the operating system, and this compromises the reliability of your array.
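For what it's worth, FreeBSD does have a loader tunable that lets mrsas(4) claim these adapters instead of mfi(4); whether you want that is another question, given the stability problems described next. A minimal sketch, assuming FreeBSD 10.x or later:

```
# /boot/loader.conf -- let mrsas(4), not mfi(4), attach to supported
# controllers at the next boot:
hw.mfi.mrsas_enable="1"
```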

There is work in progress on the mrsas(4) driver, but it appears to be unreliable at the moment and unusable for anything serious. Disks dropping out isn't acceptable on a storage system. There are some reports of this behavior here: https://bugs.freenas.org/issues/11764

So my recommendation at this moment is to stick with your current setup. I know, this sucks, but it's the best option with ZFS at this time. Keep in mind that ZFS is really temperamental about the hardware you give it; it expects proper hardware to work as it should.

If you really want to use ZFS to its full power, get a proper HBA card (or a RAID controller that can be flashed with IT, Initiator Target, firmware) and you're good to go.
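If you do go the crossflash route on a supported card, you can sanity-check the result with Broadcom/Avago's flash utility; a hedged example, assuming a SAS3-generation card and the `sas3flash` tool:

```
# List adapters and firmware; the product/firmware ID should report IT mode.
sas3flash -listall
sas3flash -c 0 -list   # full details for controller 0
```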

Although the PERC H330 is based on the LSI/Avago SAS3008 chipset and is basically a clone of the LSI/Avago 9300-8i HBA, it's not possible to crossflash the H330 to IT mode, and the stock Dell firmware does not implement the proper (needed) IT mode for FreeBSD.

At this moment I'm facing a similar issue. I don't care about the LEDs on the system or whatever, but I'm stuck with a PowerEdge R730 with an H330 and unable to properly run ZFS storage on the machine.

Vinícius Ferrão
  • Thanks for your explanation, I just saw it. For the record, we ended up keeping ZFS on top of Dell's PERC, with great satisfaction. We learned to choose the PERC option with the most cache, which significantly boosts I/O performance. Our benchmarks with this setup are no worse than those of other commercial systems built with native ZFS plus cache. – John May 03 '17 at 21:48

I was able to figure out how to do it. Wish I had thought to do it sooner.

I flashed a Dell H330 RAID card to HBA IT firmware.

See Here: https://forums.servethehome.com/index.php?threads/crossflash-dell-h330-raid-card-to-12gbps-hba-it-firmware.25498/

Sleyk