6

Supposedly (see, e.g., a question about it here), with NCQ-enabled drives the drive write cache is safe, in the sense that it doesn't lie to the OS about data being committed to the platters when it isn't. I'm trying to figure out what settings are required to make this a reality.

I'm using diskchecker.pl to confirm whether all blocks survive a pull of the power plug. The server is configured like this:

  • 4x ST3500514NS running in Linux MD RAID10. Intel 3420 chipset. In AHCI mode.
  • LVM running on RAID10.
  • Tested filesystem is ext4 (with barrier=1,data=ordered) on a logical volume. I also tried testing directly on a logical volume (block device); that didn't help.
  • Debian 6.0 (squeeze); kernel 2.6.32-5-amd64
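For readers unfamiliar with this kind of test, here is a minimal sketch of the idea behind it, not diskchecker.pl itself: write a block, call fsync, and only then record the block as committed. After the power pull, every recorded block must still read back intact. The block layout, `BLOCK_SIZE`, and the function names are assumptions made for illustration. In a real test the list of acknowledged blocks has to be kept on a second machine, since a local copy would be lost along with the power.

```python
import os
import struct
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch
HEADER = struct.Struct("<QI")  # 8-byte block index + 4-byte CRC32 of the payload


def make_block(index: int) -> bytes:
    """Build one block: header (index, CRC32) followed by a random payload."""
    payload = os.urandom(BLOCK_SIZE - HEADER.size)
    return HEADER.pack(index, zlib.crc32(payload)) + payload


def write_blocks(path: str, count: int) -> list:
    """Write blocks one at a time, fsync each, and return the acknowledged indices."""
    acked = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for i in range(count):
            os.pwrite(fd, make_block(i), i * BLOCK_SIZE)
            os.fsync(fd)  # the block counts as committed only after this returns
            acked.append(i)
    finally:
        os.close(fd)
    return acked


def verify_blocks(path: str, acked: list) -> list:
    """Return the acknowledged indices that are missing or corrupt on disk."""
    bad = []
    with open(path, "rb") as f:
        for i in acked:
            f.seek(i * BLOCK_SIZE)
            data = f.read(BLOCK_SIZE)
            if len(data) != BLOCK_SIZE:
                bad.append(i)
                continue
            index, crc = HEADER.unpack(data[:HEADER.size])
            if index != i or zlib.crc32(data[HEADER.size:]) != crc:
                bad.append(i)
    return bad
```

If `verify_blocks` reports any index after a power pull, some layer between fsync and the platters acknowledged a write it had not made durable.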

If I turn off the write cache (hdparm -W0), then it works (at a huge performance penalty). So it seems the upper layers are capable.

I've tried enabling FUA in libata (by passing fua=1 when loading the module, and confirming via dmesg), but that did not help.
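Besides dmesg, the module parameter can usually be read back from sysfs. The sketch below assumes libata exposes it at `/sys/module/libata/parameters/fua` and that the value reads as `1` or `0`; both the path and the helper name `libata_fua_enabled` are assumptions to check on your own kernel.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the libata "fua" module parameter.
DEFAULT_FUA_PARAM = "/sys/module/libata/parameters/fua"


def libata_fua_enabled(param_path: str = DEFAULT_FUA_PARAM) -> Optional[bool]:
    """Return True/False for the fua parameter, or None if the file is absent."""
    p = Path(param_path)
    if not p.exists():
        return None  # libata not loaded as a module, or parameter not exposed
    return p.read_text().strip() in ("1", "Y")


if __name__ == "__main__":
    print("libata fua:", libata_fua_enabled())
```

This only confirms what libata was told to do; whether the drive and the layers above it honor FUA is a separate question, which is what the power-pull test checks.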

Any suggestions on how to make this work?

edit: found the reason (see my answer); any suggestions on how to get at least some of the performance back?

derobert

3 Answers

3

Upgrading to kernel 2.6.38-2-amd64 (from sid) fixes the problem, at the cost of a huge performance penalty (very similar to just turning off the write caches).

Doing some research into this, it seems that MD didn't support I/O barriers (except on RAID1) until 2.6.33-rc1 (commit a2826aa92e2e14db372eda01d333267258944033).
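As a rough sketch of that rule, the check below encodes only what is stated above: RAID1 had barrier support earlier, other MD levels from 2.6.33 on. The function names and the `raid_level` strings are hypothetical, and distribution kernels may carry backports that this simple version comparison cannot see.

```python
import re


def kernel_version(release: str) -> tuple:
    """Parse the leading 'X.Y[.Z]' of a kernel release string into a tuple of ints."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(g or 0) for g in m.groups())


def md_has_barriers(release: str, raid_level: str) -> bool:
    """Apply the rule from this answer: RAID1 always, other levels from 2.6.33."""
    if raid_level == "raid1":
        return True
    return kernel_version(release) >= (2, 6, 33)


if __name__ == "__main__":
    import platform
    print(md_has_barriers(platform.release(), "raid10"))
```

For the two kernels in this thread, `md_has_barriers("2.6.32-5-amd64", "raid10")` is False and `md_has_barriers("2.6.38-2-amd64", "raid10")` is True.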

derobert
  • 1
    There is more about write barriers, write caching, etc in this answer about LVM risks: http://serverfault.com/questions/279571/lvm-dangers-and-caveats/279577 – RichVel Feb 29 '12 at 12:13
3

Yes, as far as I know this is the cost of being safe. The PostgreSQL mailing lists have many threads about data safety and its speed cost at every filesystem and storage layer. Lately they have been discussing SSD safety, for example: only the Vertex 2 Pro and the latest Intel SSD series, which have a small memory attached (like the battery-backed cache in a RAID controller), are safe for database use, and the problem with SSDs can't be fixed by disabling the write cache.

Here are two links, but there are many more examples in the mailing list archives if you search.

http://archives.postgresql.org/pgsql-performance/2010-06/msg00076.php

http://archives.postgresql.org/pgsql-general/2011-04/msg00709.php

skuda21
  • ST3500514NS are most definitely not SSDs. But that is interesting. – derobert May 05 '11 at 16:11
  • I know the disks you are talking about are not SSDs, but I wanted to point you to one example of the interesting threads about data safety on the PostgreSQL mailing list. They have discussed mechanical disks, storage layer safety, barriers, write caches, battery-backed units in hardware RAID, and the speed cost of being safe many times. – skuda21 May 06 '11 at 07:08
1

That's why you really should be using a hardware RAID controller with a BBU (battery backup unit). Then you can have your write cache on and still be safe.

wazoox
  • Yeah. I have a 3ware one on a different box. The data is indeed safe, in the sense that had Sony bought some of those, PSN data would be safe from crackers. Every time a single drive timed out, it'd discard cache on *all* drives, leading to massive corruption. I turned off the cache. – derobert May 05 '11 at 16:37
  • you should use professional grade drives, too. Particularly NOT WD desktop drives. Life is hard. – wazoox May 05 '11 at 20:58
  • Ummm, these are Seagate Constellation ES drives. The other server (the one with the 3ware card) has VelociRaptor drives. No cheap desktop drives in sight. Though, honestly, I've got other machines with cheap desktop drives, and they've proved only a little less reliable. – derobert May 06 '11 at 21:02
  • Then the problem must lie elsewhere. I have set up 220 servers with 3ware RAID adapters and up to 48 drives each, and I have almost never had a corruption problem. The one time I did, it was bad RAM on a 9550 controller. Maybe your controller is bad. – wazoox May 07 '11 at 21:08
  • 1
    Quite possible. Soon I hope to pull that machine, and do extensive testing on it. – derobert May 09 '11 at 17:18