20

I have a new server and am planning to upgrade the paltry 2 GB of memory to the maximum of 16 GB. (Theoretically 8 GB is the limit, but empirically 16 GB has been shown to work.) Some guides advise that ECC memory is not that important, but I'm not so sure I believe this.

I've installed FreeNAS and am planning to add ZFS volumes as soon as my new hard drives arrive. Would it be stupid to skimp and get non-ECC memory for a ZFS-based NAS? If it's necessary, then I'll bite the bullet, but if it's just paranoia, then I'll probably skip it.

Is there any reason ZFS or FreeNAS specifically would require ECC memory, or suffer especially when running on a system using non-ECC memory?

Mark Henderson
iconoclast
  • 13
    Generally speaking for any kind of production/server application you want to pay for the ECC RAM. The guides that suggest ECC memory is "not that important" are suspect at best - I would venture to say that they're written by someone who has never had a single-bit error trash a production system. – voretaq7 Dec 03 '12 at 22:21
  • 1
    What would you be doing with a microserver that needs 16GB of RAM? – tombull89 Dec 04 '12 at 18:25
  • ZFS is RAM-hungry to begin with, and I plan to install ESXi and run FreeNAS on top of that. This way when I need some other server, I just create a new VM, avoiding a sprawl of boxes & cords. (If there's some home automation solution that doesn't suck like X-10, I've got a box for it. If I wanna use GitLab for private repos, I've got a box for it. Etc.) – iconoclast Dec 04 '12 at 21:31
  • 2
    I think if he removed the context about his mini-tower rig, which might be a bit of an insane build in production, the question as to whether or not to use ECC memory for a ZFS install is really the important part. – Kent Fredric Dec 09 '12 at 21:30
  • 2
    Matt Ahrens, who co-founded ZFS in 2001, [says](https://arstechnica.com/civis/viewtopic.php?f=2&t=1235679&p=26303271#p26303271): `There's nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem.` – Janus Troelsen Feb 19 '17 at 10:21

3 Answers

13

ZFS only protects your investment in the data on the disk. If the server is to be in production then you want the highest possible uptime and ECC helps this by allowing the server to tolerate a ONE BIT error in failing memory. This can give you time to schedule and replace failing memory without a panic.
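To illustrate what "tolerating a one-bit error" means, here is a minimal sketch of a Hamming(7,4) code in Python. It is a toy example under stated assumptions: real ECC modules do this in hardware with a wider code over whole memory words, and the function names below are hypothetical, chosen only for this illustration.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, syndrome); a nonzero syndrome is the flipped position."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)

# Simulate a single-bit memory error at position 5 (list index 4).
faulty = list(stored)
faulty[4] ^= 1

recovered, syndrome = hamming74_correct(faulty)
print("recovered:", recovered, "error position:", syndrome)
```

The nonzero syndrome is also what lets a system report that a correction happened, which is the early warning that makes a scheduled replacement possible. A code this small cannot repair two flipped bits in the same word.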

mdpc
  • @iconoclast Depends on which bit gets flipped. If it happens to be encrypted data, or the encryption key, then you just lost everything... – Michael Hampton Dec 04 '12 at 00:37
  • 1
    @MichaelHampton: so, in other words, encrypting the data on my server actually increases the chances of it being lost to a memory failure. – iconoclast Dec 04 '12 at 16:47
  • 2
    @iconoclast Encryption is no substitute for backups. Though if you encrypt your disks, you almost certainly need to encrypt your backups as well. – Michael Hampton Dec 04 '12 at 16:49
  • I don't think the uptime matters... especially when using such a low-end server. If uptime and resiliency were a concern, a different server type would be in use. Again, there will be many other areas of exposure or vulnerability that will come into play before the ECC RAM issue in this case. – ewwhite Dec 09 '12 at 23:01
  • 3
    @ewwhite Having a single power supply is an availability issue. Non-ECC RAM could affect both availability *and integrity*. It's not hard to imagine scenarios wherein integrity is more important than availability. – Skyhawk Dec 09 '12 at 23:12
  • 2
    As I noted earlier. This server *comes* with ECC RAM. This entire argument is silly because there's no reason to use something other than the [manufacturer-blessed RAM kits](http://h18004.www1.hp.com/products/quickspecs/13716_na/13716_na.HTML#Memory) with it. – ewwhite Dec 09 '12 at 23:25
12

ECC RAM is a good thing, but let's look at the context...

For your intended use, a ProLiant Microserver is a nice small form-factor low-impact server. It lacks some of the attributes commonly associated with production-quality systems (only four drive bays, single power supply, weaker CPU). So, I think you'll run into problems associated with those deficiencies far sooner than the effects of not having error-correcting RAM. The guides you've read are correct... ECC RAM is not going to be that important in that particular system...

This does not hold true for higher-end production-quality systems.

I'll add: The Microserver is spec'd with ECC RAM. Why wouldn't you use it?

ewwhite
  • 1
    I assume by "production quality" you mean *enterprise quality*? I'll have this *in production* (it's not for QA, UAT, or development), just on a very very small scale. But the data on it will be *real*, not garbage data generated for development or cloned from a production server. It will be *real production* data. (By the way, thanks for the very useful answer to help put things in context!) – iconoclast Dec 04 '12 at 16:42
  • 1
    @iconoclast no, production quality is still production quality. Single PSU is not suited for any kind of server that's important to keep up, unless you want to buy a spare PSU to keep around - which would be stupid since you could just plug that spare PSU in and have dual PSU's bla bla bla. Staying safe is not "enterprise" – pauska Dec 04 '12 at 16:54
  • @iconoclast It's all semantics at this point, but just because a server is used in production doesn't make it a production grade server. I could set up my company website on a Raspberry Pi, but it doesn't make the Pi a production grade server. – Dan Dec 04 '12 at 16:57
  • 3
    @iconoclast People *generally* think of a production server as being 24/7 and highly available. The latter is certainly a scale of cost / benefit ranging from simply having two PSU's right up to datacenter grade redundancy. Your setup, however, has none of these things – Dan Dec 04 '12 at 17:00
  • @Dan, yes I mostly/almost agree, *if* we qualify the statement: within the context of an enterprise, and for most purposes, a Pi is not production-quality. But we can't really decontextualize, can we? If you're building appliances that are perfectly suited for the Pi, then you might have a dev server, a QA server, and a production server. (I don't know, for controlling a security camera or something.) And so you might use a Pi in production. Not every server is an enterprise-level NAS or web server. – iconoclast Dec 04 '12 at 17:01
  • 1
    This PSU talk is garbage, with all respect. My network is anchored on two servers that are custom built. DNS, DHCP, Active Directory. Running a Micro-ATX board in a corresponding case, 8 SAS discs + 2 SSD, RAID controller, SINGLE PSU. You would call that non-HA? Well, do it - I still have an HPC and virtualization grid hanging off that as anchor points (i.e. one of them MUST be on). – TomTom Dec 09 '12 at 13:52
  • 2
    Some companies have servers which they turn off when they go home at the end of the day! I wouldn't do that on my home network, but some companies don't really seem to care /that/ much about availability of in-house resources. – Kent Fredric Dec 09 '12 at 21:33
12

I would argue that running FreeNAS with non-ECC RAM is a stupid idea, as is running it as a virtualized guest, when the data stored on the ZFS volume is important.

Joshua Paetzel, one of the FreeNAS developers, has a good write-up on this topic: http://www.freenas.org/whats-new/2015/02/a-complete-guide-to-freenas-hardware-design-part-i-purpose-and-best-practices.html.

TL;DR

ZFS does something no other filesystem you’ll have available to you does: it checksums your data, and it checksums the metadata used by ZFS, and it checksums the checksums. If your data is corrupted in memory before it is written, ZFS will happily write (and checksum) the corrupted data. Additionally, ZFS has no pre-mount consistency checker or tool that can repair filesystem damage. [...] If a non-ECC memory module goes haywire, it can cause irreparable damage to your ZFS pool that can cause complete loss of the storage.
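The point about corrupted data being "happily" checksummed can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ZFS is implemented: `write_block`, `read_block`, and `flip_bit` are hypothetical helpers, and SHA-256 stands in for whatever checksum the pool is configured to use.

```python
import hashlib


def write_block(buffer: bytes) -> dict:
    """Hypothetical write path: checksum whatever is in RAM at write time."""
    return {"data": bytes(buffer), "checksum": hashlib.sha256(buffer).digest()}


def read_block(block: dict) -> bool:
    """Hypothetical read path: True if the stored data matches its checksum."""
    return hashlib.sha256(block["data"]).digest() == block["checksum"]


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


original = b"important production data"

# Case 1: the bit flips on disk, after the checksum was computed.
on_disk = write_block(original)
on_disk["data"] = flip_bit(on_disk["data"], 3)
detected_on_disk = not read_block(on_disk)

# Case 2: the bit flips in RAM, before the checksum is computed.
in_ram = write_block(flip_bit(original, 3))
detected_in_ram = not read_block(in_ram)

print("disk corruption detected:", detected_on_disk)
print("RAM corruption detected:", detected_in_ram)
```

In the second case the checksum verifies because it was computed over the already damaged buffer, so the filesystem has no way of knowing the stored bytes differ from what the application intended to write.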

Ronald