1

I've been thinking about ways of speeding up disk I/O, and one of the bottlenecks I keep coming back to is the journal. There's an obvious benefit to putting the journal on an SSD, over and above plain write caching, unless of course I disable the journal altogether and rely on the write cache (after all, devicemapper doesn't seem to support barriers). To get the full benefit of a battery-backed (BB) write cache on the controller I'd need to disable journalling, but then the OS would have to fsck the filesystem after an outage. Of course, if the OS knew what was in the battery-backed memory it could use that memory as the journal, but that means the memory must be exposed as a block device and be under the sole control of the operating system.

However, I've not been able to find a suitable low-cost device (no, wear-levelling for flash is not adequate for a journal, at least not for one which uses SmartMedia).

While there's no end of flash devices and disk/array controllers with BB write caches, so far I've not found anything which simply gives me non-volatile memory addressable as a block storage device.
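For context, here is a minimal sketch of how an ext4 filesystem could be pointed at a journal on a separate device, assuming such a non-volatile block device exists. It only builds the two `mke2fs` command lines and does not run them. The device paths `/dev/nvram0` and `/dev/sdb1` are hypothetical placeholders, and the function name is made up for illustration.

```python
import shlex


def external_journal_commands(journal_dev, data_dev, block_size=4096):
    """Return the two mke2fs invocations (as argv lists) for an external journal.

    Nothing is executed here; the device paths passed in are placeholders.
    """
    # Step 1: format the fast, non-volatile device as a dedicated journal device.
    make_journal = ["mke2fs", "-b", str(block_size), "-O", "journal_dev", journal_dev]
    # Step 2: create the ext4 filesystem and point its journal at that device.
    # The journal device and the filesystem must use the same block size.
    make_fs = ["mke2fs", "-t", "ext4", "-b", str(block_size),
               "-J", f"device={journal_dev}", data_dev]
    return [make_journal, make_fs]


if __name__ == "__main__":
    # Hypothetical devices: /dev/nvram0 as the journal, /dev/sdb1 as the data disk.
    for argv in external_journal_commands("/dev/nvram0", "/dev/sdb1"):
        print(shlex.join(argv))
```

Whether this helps depends entirely on the journal device having better synchronous write latency than the data disks, which is the open question here.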

symcbean
  • This is exactly what the [FusionIO](http://www.fusionio.com/) devices do, correct? – EEAA Feb 06 '12 at 22:47
  • @ErikA: That still uses flash. It *might* be using a more effective wear-levelling algorithm, but the cost is still exorbitant. I was thinking more of something like DDRdrive, but even this is 5x the price of a laptop with the same DRAM capacity. – symcbean Feb 07 '12 at 09:12
  • (or 10X the price of BBRAM write cache) – symcbean Feb 07 '12 at 09:14
  • It seems Gigabyte used to make exactly what I've described here, marketed as an iRAM drive - http://www.anandtech.com/show/1742 – symcbean Feb 08 '12 at 13:31
  • And there's something called a hyperdrive 5 - http://www.hyperossystems.co.uk/07042003/hardware.htm – symcbean Feb 08 '12 at 13:40
  • Symcbean, from what I heard the hyperdrive uses the "acard" hardware. Might be worth checking out. – 3molo Jun 22 '12 at 11:00

3 Answers

2

Could you explain why there's an obvious advantage to using SSDs for journals? All filesystems implement journals as some kind of ring buffer where the access is sequential anyway. Just disable barriers on BBWCs and it's as good as it gets.
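To illustrate the ring-buffer point, here is a toy sketch of a fixed-size circular log. It is not any real filesystem's on-disk format (there is no checkpointing or tail tracking), and the class name `RingJournal` is made up for the example. It only shows that successive appends land on consecutive block offsets and wrap around at the end.

```python
class RingJournal:
    """Toy fixed-size circular log; not a real journal format."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.head = 0  # index of the next block to be written

    def append(self, count):
        """Record `count` blocks and return the block offsets written, in order."""
        offsets = [(self.head + i) % self.num_blocks for i in range(count)]
        self.head = (self.head + count) % self.num_blocks
        return offsets


journal = RingJournal(num_blocks=8)
print(journal.append(3))  # [0, 1, 2]
print(journal.append(3))  # [3, 4, 5]
print(journal.append(3))  # [6, 7, 0]  (wraps around to the start)
```

Apart from the wrap-around, every write follows the previous one, which is the access pattern spinning disks handle best.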

EXT3 and EXT4 filesystems used to mount with the journal in 'ordered' mode. In newer kernels the default is 'write-back' mode. In 'ordered' mode data blocks are forced out to disk before the metadata describing them is committed to the journal. In 'write-back' mode no such ordering is enforced: journal updates are written to disk according to the normal IO scheduler policies, and data writes are not held up by journal updates. From a performance POV you basically don't see that there's a journal involved at all.
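To see which mode a mounted filesystem is actually using, the `data=` mount option can be read from a `/proc/mounts`-style line. The helper below is a small sketch; its name and the sample lines are made up for illustration. If the option is not listed, the kernel default for that filesystem applies.

```python
def journal_data_mode(mounts_line):
    """Return the data= journalling mode from one /proc/mounts-style line, or None."""
    fields = mounts_line.split()
    # Expected fields: device, mount point, filesystem type, options, dump, pass.
    if len(fields) < 4 or fields[2] not in ("ext3", "ext4"):
        return None
    for option in fields[3].split(","):
        if option.startswith("data="):
            return option[len("data="):]
    return None  # not listed: the kernel default for that filesystem applies


# Made-up sample lines in the /proc/mounts layout.
print(journal_data_mode("/dev/sda1 / ext4 rw,relatime,data=writeback 0 0"))
print(journal_data_mode("/dev/sdb1 /data ext3 rw,data=ordered 0 0"))
```

On a live system the same function can be applied to each line of `/proc/mounts`.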

What you want is to disable barriers (and probably mask the FUA bit and SYNCHRONIZE CACHE SCSI commands) on anything that can protect your writes with a battery; otherwise your performance will suffer. With a BBWC you still get the semantics of barriers (or of anything that flushes to disk and waits for acknowledgment) for your journal and data.
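As a sketch of the barrier setting, the helper below builds an `/etc/fstab` line using the `barrier=0` / `barrier=1` mount option that ext3 and ext4 accept. The function name and the device and mount point in the example are hypothetical, and turning barriers off is only reasonable when the write cache really is protected against power loss.

```python
def fstab_entry(device, mountpoint, fstype="ext4", barriers=True, extra=("defaults",)):
    """Build one tab-separated /etc/fstab line with an explicit barrier option."""
    options = list(extra)
    # barrier=0 disables write barriers; barrier=1 enables them.
    options.append("barrier=1" if barriers else "barrier=0")
    return f"{device}\t{mountpoint}\t{fstype}\t{','.join(options)}\t0\t2"


# Hypothetical data volume sitting behind a battery-backed controller.
print(fstab_entry("/dev/sdb1", "/data", barriers=False))
```

Nothing here touches the system; the line still has to be reviewed and added to `/etc/fstab` by hand.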

An NVRAM device without proper support from the OS (VFS) isn't going to help you with anything that a BBWC (which is a kind of NVRAM anyway) can't solve. In Linux 2.4 there was support for certain NVRAM devices for accelerating NFSv2/3 running synchronously, but BBWCs have made them obsolete.

The proper thing to do with flash devices is to not think of them as block devices and to use them as what they are: NAND flash. But this would need a redesign of how we think a file system should talk to the underlying storage in a more generic way. There's also no proper API in place in the kernel to make use of NAND flash at the moment.

pfo
  • Not really. The barriers are there for a **reason**. Certainly the checksums on ext4 make the late barrier pretty much unnecessary, but this is not true for the early barrier, nor for anything other than ext4. Also, most SATA disks have a habit of being less than honest about when blocks are committed from their cache (reordering still occurs). Using write-back on jbd2 seriously compromises the integrity of the journal. And yes, there are device drivers that make proper use of flash memory; that's kind of the point of JFFS2. – symcbean Jun 25 '12 at 22:38
  • Yes, JFFS2 is one example. I wasn't considering that. But again, I don't understand what you want barriers for, unless you don't trust your BBWC or have drives that lie to you about the status of their caches, which an HBA with a BBWC should disable. – pfo Jun 25 '12 at 22:51
1

The only answer so far is the one from Nils but, as per my comments, there are the iRAM and hyperdrive devices. So I'll close this off.

symcbean
-1

SSDs should not be used for frequent write operations. Why not simply put the journal on a separate fault-tolerant RAID array (5 or 10) consisting of ordinary disks (at least 4 "small" ones)?

Nils
  • No - flash drives are write limited, so not suitable for frequent writes, but they are not the only type of non-volatile solid state storage. – symcbean Feb 29 '12 at 09:35
  • Right. This is what I wrote. Use a separate RAID of classical harddisks. – Nils Feb 29 '12 at 13:31