
I would like to use Linux SSD caching (dm-cache or bcache) on Debian Jessie production servers (kernel 3.16).

My question: are the dm-cache and bcache modules reliable in Linux 3.16? Do I need to upgrade my kernel to a more recent version?

I also found this worrying message about bcache: https://lkml.org/lkml/2015/12/22/154

Note that I fully understand what the caching mode choice (write-back vs. write-through) implies in terms of reliability and data loss; my question is more about software bugs in these modules.


February 2018 follow-up, after more than 1 year of bcache on a continuous integration server (a Jenkins instance running lots of intensive jobs!)

Configuration of the server (essentially the storage stack)

Hardware:

  • 2 x 480GB SSD (Samsung SM863 enterprise grade MLC)
  • 2 x 4TB HDD (Seagate Constellation ES.3 SATA)
  • Dell R730 - dual Xeon E5-2670 - 128GB RAM
  • NO hardware RAID and no battery/flash-backed hardware write cache; that's where bcache's writeback feature becomes interesting.

Software:

  • configured in September 2016, never rebooted
  • Debian Jessie with a 4.6 kernel (from the official jessie-backports repository at the time of the last update)
  • software MD RAID10
    • 1 RAID10 device for the 2 SSDs
    • 1 RAID10 device for the 2 HDDs
  • 2 LVM VGs on top of the 2 RAID devices
  • a bcache "caching" device created on a logical volume on the SSD_RAID10 VG
  • a bcache "backing" device created on a logical volume on the HDD_RAID10 VG
  • the bcache cache mode configured as writeback (a sketch of how such a stack can be assembled follows this list)
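For reference, the commands below are a minimal sketch of how such a stack can be assembled; the device names (/dev/sd[a-d]), LV names and sizes are placeholders, not the exact commands used on this server.

    # 1) MD RAID10 arrays (md accepts RAID10 with only 2 members, behaving like a mirror)
    mdadm --create /dev/md0 --level=10 --raid-devices=2 /dev/sda /dev/sdb   # the 2 SSDs
    mdadm --create /dev/md1 --level=10 --raid-devices=2 /dev/sdc /dev/sdd   # the 2 HDDs

    # 2) one LVM VG per array, with one LV for the cache and one for the backing store
    pvcreate /dev/md0 /dev/md1
    vgcreate SSD_RAID10 /dev/md0
    vgcreate HDD_RAID10 /dev/md1
    lvcreate -L 400G -n bcache_cache SSD_RAID10
    lvcreate -l 100%FREE -n bcache_backing HDD_RAID10

    # 3) bcache: format the caching and backing devices, attach them, switch to writeback
    make-bcache -C /dev/SSD_RAID10/bcache_cache
    make-bcache -B /dev/HDD_RAID10/bcache_backing
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach   # UUID from bcache-super-show
    echo writeback > /sys/block/bcache0/bcache/cache_mode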

Workload

  • many Jenkins jobs (continuous integration)
  • CPU-intensive jobs mixed with periods of intensive I/O
    • before using bcache, such periods regularly pushed the average I/O latency above 5 seconds (!!!)
  • real workload on this server started only 1 year ago (~Feb 2017)

I/O issued on the bcache device (according to /proc/diskstats)

  • 350TB written
  • 6TB read (I double-checked that; I think the large amount of RAM helps a lot to cache the reads at the VFS/page-cache layer). See the sketch after this list for how these totals can be read.
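These totals can be derived directly from /proc/diskstats; a rough sketch, assuming the cached device shows up as bcache0:

    # field 6 = sectors read, field 10 = sectors written (512-byte sectors),
    # counted since boot, which on this never-rebooted machine means since September 2016
    awk '$3 == "bcache0" {
        printf "read: %.1f TB  written: %.1f TB\n", $6*512/1e12, $10*512/1e12
    }' /proc/diskstats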

Result

  • Rock stable! The machine never had to be rebooted (uptime 525 days) and no corruption has been detected.
  • The hit rate is high: 78% as an all-time average, and rising (above 80% in the last months).
  • Writeback helps a lot: disk latency is now an order of magnitude lower. Sadly I have no accurate measurements for that, but computations are no longer stalled by write bursts. The amount of dirty data rises above 5GB, whereas a hardware RAID write cache usually has a size between 512MB and 1GB. (The sysfs paths these figures come from are sketched below.)
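For the record, a sketch of where these figures can be read in bcache's sysfs interface, again assuming the device is bcache0:

    cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio   # all-time hit rate, in percent
    cat /sys/block/bcache0/bcache/stats_day/cache_hit_ratio     # hit rate over the last day
    cat /sys/block/bcache0/bcache/dirty_data                    # data cached but not yet written back
    cat /sys/block/bcache0/bcache/writeback_percent             # writeback throttling knob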

Conclusion

  • bcache is rock stable on this configuration (but 1 machine, 1 configuration and 1 machine-year are not sufficient to generalize; it is a good start though!)
  • bcache is very performant on this workload, and the writeback mode seems to efficiently replace a hardware RAID write cache (but keep in mind that reliability on power loss has not been tested)
  • in my personal opinion bcache is underrated, and interesting solutions could be packaged around it; note, however, that the original author now develops bcachefs (a filesystem based on his bcache work) and no longer improves bcache
sligor
  • As far as I understand from the linux-bcache ML (linux-bcache@vger.kernel.org), bcache is still not really ready for production, and some stability patches applied to recent kernels are surely not backported to the official Debian 3.16 kernel. But this project seems to be advancing well, so I hope it will be production-ready in future releases. – sligor Feb 03 '16 at 15:24
  • Thank you for the update and the experience + setup info shared in your update. However, is there a reason why the bcache was not constructed directly on the MD devices, but was instead built on two LVM logical volumes? It could have been done with the MD RAID devices as well, right? – humanityANDpeace Oct 09 '20 at 20:16

2 Answers


I looked at your link and went through all the patches, and manually verified that each one was merged into vanilla kernel 4.9.0, with the last patch merged on 2016-10-27 04:31:17 UTC. That last patch appeared in 4.9.0, released 2016-12-11 19:17:54 UTC. All of them also appear in the 4.4 kernel available on Ubuntu 14.04, backported from 16.04 (linux-lts-xenial_4.4.0-67.88).
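This kind of check can be reproduced against a kernel git tree; a sketch, where <commit-sha> stands in for the SHA of one of the bcache patches (not listed here):

    git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    cd linux
    # exits 0 if the fix is an ancestor of the v4.9 release tag
    git merge-base --is-ancestor <commit-sha> v4.9 && echo "included in 4.9"
    # or list every release tag that already contains the commit
    git tag --contains <commit-sha>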

And I would not focus too much on the "decreasing cost of SSD storage", since the cost of HDD storage is decreasing too. You can still use both together to save money. Or, instead of SATA SSDs, you could get NVMe drives, which are even faster.

And the rate of corruption from bugs might still not be zero, but even if there are bugs left, the rate is low enough that you don't have to worry if you have backups, which you should have regardless of whether you use caching or RAID.

Peter

I think that the decreasing cost of SSD storage and the increasing capacity and range of options available make a good case for using solid-state storage where you need it and forgoing the idea of selective (and potentially buggy) caching.

If you fill in some details about the environment, the capacity needs and anything else, it may help with a better answer.

ewwhite
  • It's mainly for user workspaces: compilation and scripts generating lots of I/O, Eclipse CDT generating lots of I/O on big workspaces, "git status" generating lots of I/O on big repositories. I think I will 1) improve my budget to buy bigger SSDs, 2) give my users a secondary storage space using HDDs, and 3) set quotas for users on both spaces and let my users decide where they want to put their files (small quota on fast SSD vs. big quota on slow HDD). – sligor Feb 03 '16 at 15:18