Making SD card corruption-proof

Question

My embedded Linux device uses an SD card to store diagnostic data that is far too copious for the internal flash.

The problem is that if the device is switched off unexpectedly, the filesystem (FAT32) on the card gets corrupted.

There is no way to prevent unexpected power outages or users switching the device off like that, and the device should be relatively maintenance-free. Worse, the data is written continuously, so corruption is very frequent, and when Linux detects a faulty filesystem it silently remounts it read-only.

What methods would you suggest to mitigate this? Will running fsck.vfat automatically on startup suffice?

Some more info:

The card is not to be considered removable by user. It's to be thought of as internal disk. Any data stored on it will be accessible for download over the network or to a usb drive, and the system automatically purges oldest entries. That means it does not need to be readable in your average PC.
The system currently supports FAT, yaffs and jffs2. Adding other filesystems to the kernel is possible but if other avenues exist, we'd prefer them first.
Writing can be suspended on demand even for several minutes without data loss.
Partial data loss or minor corruption is acceptable. Complete stopping of logging is not.
The power-off events are completely unpredictable most of the time.
The system runs on an ARM9 at 200 MHz with 64 MB RAM and 32 MB internal flash, and uses most of its CPU power for its primary role. Take this into consideration when thinking of fancy resource-heavy solutions.

You've probably already considered it, but it's worth mentioning for others wandering across this question: *Most* flash cards (SD, CF, etc.) only have a write tolerance of a few thousand cycles (at best). Using normal cards for data logging or similar tasks will kill them eventually (and commonly in less time than people think). — Chris S, Jan 11 '13 at 13:44
@ChrisS: Being mostly append-only, and replacing the oldest entries with the newest, this workload inherently balances writes very well, especially since it takes months to fill the card. The problem may be with the FAT entry itself, but I trust the controller does something sensible about it. — SF., Jan 11 '13 at 13:50
What is the cost if your device is powered off and does not write this data to the card? Like if the diagnostic data isn't written will you lose a lot of time or money or just not have some log files? — Freiheit, Jan 11 '13 at 14:17
@Freiheit: A rather obscure though not entirely unimportant feature marketed to customers would be missing, and additionally, in case someone else screws up really badly and seeks a scapegoat, we lose one of our avenues of defense in court. The thing is, the data just prior to a likely failure is the most valuable: proof that the device worked correctly until the last moment, and that it was not its own fault that made the events escalate into a disaster. — SF., Jan 11 '13 at 14:39
Noted. You're clearly capturing data for something important! — Freiheit, Jan 11 '13 at 17:56
@SF. At every write (or nearly as often), the FAT almost certainly gets written to as well. That the data is written to the file in an append-only mode doesn't help a whole lot with that. The FAT is concentrated and not possible to relocate. So unless the card itself has wear-levelling technology (I don't know how common that is), you are hitting some parts of it very heavily. That, not appending log data across the card, is going to cause you to soon hit the write cycle limit Chris mentions. — user, Jan 11 '13 at 18:50
You've probably already considered it, but can't you use the network connection you use to download the data to write the data to a network drive instead of locally? — 20c, Jan 17 '13 at 20:26
@20c: Yes, whenever the customer decides to bear the cost of continuous GPRS data transfer. One small packet per second takes much more than a gzipped file grabbed all at once; the file is also downloaded only when/if needed and only for the required scope, rather than continuously. — SF., Jan 18 '13 at 02:07
See http://unix.stackexchange.com/questions/26516/safely-use-sd-cards-when-power-can-go-out-at-any-time — derobert, Jan 18 '13 at 15:55

score 8 · Accepted Answer · answered Jan 11 '13 at 13:41

You could use the block2mtd driver to run the jffs2 or yaffs(2) filesystems, which you already seem to be employing elsewhere, on your SD card. Both are log-structured and designed to survive sudden power loss, which would solve your problem of data loss or filesystem corruption at poweroff.

Doing so might incur other problems, though. As the SD card is likely to have its own mechanisms for wear levelling and sector remapping in place, these could interfere with jffs2's and yaffs' implementations of the very same thing, decreasing the lifespan or the performance of your SD card. If this is not an issue, it should be worth trying.

With a month or two needed to fill a 2 GB SD card, reaching the wear limit should not be a problem, even with entirely randomized load balancing. — SF., Jan 11 '13 at 13:55
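The arithmetic behind that comment can be sketched roughly as follows. The fill times come from the thread; the endurance figure of 3000 cycles is only an assumed stand-in for the "few thousand cycles" mentioned in the question's comments, and the calculation assumes writes are spread evenly over the whole card. It says nothing about hot spots such as the FAT, which only matter if a FAT filesystem is kept.

```python
def full_card_rewrites_per_year(days_to_fill):
    """How many times per year the whole card gets overwritten."""
    return 365.0 / days_to_fill


def years_to_wear_limit(endurance_cycles, days_to_fill):
    """Years until every block has used up its cycles, assuming ideal wear levelling."""
    return endurance_cycles / full_card_rewrites_per_year(days_to_fill)


# Fill times from the thread: one to two months.
# ASSUMED_ENDURANCE is an assumption, not a datasheet value.
ASSUMED_ENDURANCE = 3000

fastest_fill = years_to_wear_limit(ASSUMED_ENDURANCE, days_to_fill=30)
slowest_fill = years_to_wear_limit(ASSUMED_ENDURANCE, days_to_fill=60)
```

Under these assumptions the card is rewritten about 12 times a year at the fastest fill rate, so evenly spread wear would take centuries to reach the limit.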

score 5 · Answer 2 · answered Jan 11 '13 at 13:41

Check whether the kernel you use supports the flush and/or sync mount options for vfat (it seems some versions ignore them, so be careful!).
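A minimal sketch of one such check, assuming the card is mounted at a hypothetical /mnt/sdcard: it parses the text of /proc/mounts and reports whether the vfat mount lists sync or flush among its options. This only shows that the option was accepted at mount time; it does not prove the kernel honours it, so a pull-the-plug test is still needed.

```python
def vfat_mount_options(mounts_text, mount_point):
    """Return the option set of the vfat mount at mount_point, or None if absent.

    mounts_text is the content of /proc/mounts; each line has the form
    "<device> <mount point> <fstype> <options> <dump> <pass>".
    Mount points containing spaces appear escaped (\\040) in that file.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, where, fstype, options = fields[:4]
        if where == mount_point and fstype == "vfat":
            return set(options.split(","))
    return None


def has_sync_or_flush(mounts_text, mount_point):
    """True if the vfat mount exists and lists 'sync' or 'flush'."""
    options = vfat_mount_options(mounts_text, mount_point)
    return options is not None and bool(options & {"sync", "flush"})
```

On the device this would be called as `has_sync_or_flush(open("/proc/mounts").read(), "/mnt/sdcard")`, with the path replaced by the real mount point.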

Or just do away with the filesystem altogether if everything can go into one file (as would be the case with a raw log stream!) or into a few fixed-size files (use partitions ;)
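A rough sketch of the no-filesystem idea, under assumptions of my own: the partition is treated as a ring of fixed 512-byte records, each carrying a sequence number and a CRC32. The record layout, the `LOG1` marker and the class name are hypothetical, not an established format. A record torn by power loss fails its CRC and is simply skipped on the next start, and the oldest records are overwritten automatically when the ring wraps.

```python
import os
import struct
import zlib

RECORD_SIZE = 512                       # one record per sector (assumed size)
HEADER = struct.Struct("<4sQHI")        # magic, sequence, payload length, CRC32
MAGIC = b"LOG1"                         # hypothetical marker
MAX_PAYLOAD = RECORD_SIZE - HEADER.size


def pack_record(seq, payload):
    """Build one zero-padded record; the CRC covers sequence number and payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for one record")
    crc = zlib.crc32(struct.pack("<Q", seq) + payload)
    return (HEADER.pack(MAGIC, seq, len(payload), crc) + payload).ljust(RECORD_SIZE, b"\0")


def unpack_record(record):
    """Return (seq, payload), or None for an empty, short, or torn record."""
    if len(record) < RECORD_SIZE:
        return None
    magic, seq, length, crc = HEADER.unpack_from(record)
    if magic != MAGIC or length > MAX_PAYLOAD:
        return None
    payload = record[HEADER.size:HEADER.size + length]
    if zlib.crc32(struct.pack("<Q", seq) + payload) != crc:
        return None
    return seq, payload


class RawRingLog:
    """Append-only ring of records on a raw partition (or a plain file for testing)."""

    def __init__(self, path, slots):
        self.fd = os.open(path, os.O_RDWR)
        self.slots = slots
        self.next_seq = self._recover()

    def _recover(self):
        # Linear scan for the newest valid record; invalid slots are ignored.
        newest = -1
        for slot in range(self.slots):
            rec = unpack_record(os.pread(self.fd, RECORD_SIZE, slot * RECORD_SIZE))
            if rec is not None:
                newest = max(newest, rec[0])
        return newest + 1

    def append(self, payload):
        slot = self.next_seq % self.slots   # wrapping overwrites the oldest record
        os.pwrite(self.fd, pack_record(self.next_seq, payload), slot * RECORD_SIZE)
        os.fsync(self.fd)                   # ask the kernel to push the write to the card
        self.next_seq += 1

    def close(self):
        os.close(self.fd)
```

On the device, `path` would be the partition's device node and `slots` its size divided by 512. The startup scan is linear in the number of slots, so on a large card with a slow CPU a binary search over sequence numbers, or a coarser record size, may be preferable.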
