Supposedly (see, e.g., a question about it here), with NCQ-enabled drives the drive write cache is safe, in that it doesn't lie to the OS about data being committed to the platters when it isn't. I'm trying to figure out what settings are required to make this a reality.
I'm using diskchecker.pl to confirm whether all blocks survive a pull of the power plug. The server is configured like this:
- 4x ST3500514NS running in Linux MD RAID10. Intel 3420 chipset. In AHCI mode.
- LVM running on RAID10.
- Tested filesystem is ext4 (with barrier=1,data=ordered) on a logical volume. I also tried testing directly on a logical volume (block device); that didn't help.
- Debian 6.0 (squeeze); kernel 2.6.32-5-amd64
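For anyone unfamiliar with the test, the idea is roughly the following. This is only a minimal Python sketch of the principle, not diskchecker.pl itself: the function names (write_blocks, verify_blocks) and the block layout are made up for illustration. As far as I understand it, the real tool reports each acknowledged write to a second machine over the network, so the list survives the power cut; here it is simply returned to the caller.

```python
import os
import struct
import zlib

# Hypothetical block layout for this sketch: block index + CRC32 of the payload.
HEADER = struct.Struct("<QI")


def make_block(index, block_size):
    """Build one block: header followed by random payload bytes."""
    payload = os.urandom(block_size - HEADER.size)
    return HEADER.pack(index, zlib.crc32(payload)) + payload


def write_blocks(path, count, block_size=4096):
    """Write `count` blocks, calling fsync after each one.

    Returns the indices whose fsync returned, i.e. the writes the OS
    claimed were durable. In a real test this list has to be kept
    somewhere that survives the power cut (another machine).
    """
    acked = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for index in range(count):
            os.pwrite(fd, make_block(index, block_size), index * block_size)
            os.fsync(fd)  # only count the block once fsync has returned
            acked.append(index)
    finally:
        os.close(fd)
    return acked


def verify_blocks(path, acked, block_size=4096):
    """Return the acknowledged block indices that are missing or corrupt."""
    bad = []
    with open(path, "rb") as f:
        for index in acked:
            f.seek(index * block_size)
            block = f.read(block_size)
            if len(block) < block_size:
                bad.append(index)  # block was never fully written
                continue
            stored_index, stored_crc = HEADER.unpack_from(block)
            payload = block[HEADER.size:]
            if stored_index != index or zlib.crc32(payload) != stored_crc:
                bad.append(index)
    return bad
```

The intended use is to run write_blocks, pull the plug partway through, reboot, and run verify_blocks against the list of acknowledged indices. Any index it returns is a write that was reported as committed but did not actually reach the platters.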
If I turn off the write cache (hdparm -W0), then it works (at a huge performance penalty). So it seems like the upper layers are capable.
I've tried enabling FUA in libata (by passing fua=1 to the module at load time, and confirming via dmesg), but that did not help.
Any suggestions on how to make this work?
edit: found the reason (see my answer); any suggestions on how to get at least some of the performance back?