6

My server infrastructure is growing fast, so I decided to create a distributed storage cluster. I've been looking for a filesystem that meets my requirements, but none of the ones I found supports a local disk cache. Each of my servers has two 600 GB SAS hard drives, and I'd like to use them as cache storage for the most frequently accessed files from the distributed storage.

Does any open-source filesystem support this functionality? I'd like to use Ceph or GlusterFS, but I have not found anything about a local disk cache. I think it is one of the basic features that a distributed filesystem should support.

Galmi
  • 111
  • 1
  • 7
  • 2
    More detail is needed on what you're doing. Why is caching needed? Are you compensating for lacking infrastructure? Can you outline the details of the networking interconnects and your application's I/O requirements? – ewwhite Dec 01 '13 at 22:52
  • looks like ceph gets cachefs support http://www.redhat.com/archives/linux-cachefs/2013-September/msg00022.html – kofemann Dec 04 '13 at 20:38

9 Answers

8

Check out OpenAFS; it has a local disk cache. See: http://docs.openafs.org/Reference/5/afs_cache.html

Stone
  • 6,941
  • 1
  • 19
  • 33
6

Another contender is XtreemFS. Its feature set includes:

In addition to full replicas that contain a complete copy, XtreemFS also supports partial replicas. These replicas are filled on demand when a client accesses data.
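To illustrate what "filled on demand" means in the quote above, here is a minimal Python sketch of a partial replica. It is not XtreemFS code; the `PartialReplica` class, the `fetch_block` callback and the in-memory `full_replica` dictionary are all hypothetical stand-ins chosen for this example.

```python
from typing import Callable, Dict


class PartialReplica:
    """Toy on-demand replica: a block is copied locally the first time it is read."""

    def __init__(self, fetch_block: Callable[[int], bytes]) -> None:
        self._fetch_block = fetch_block      # hypothetical call to a remote full replica
        self._blocks: Dict[int, bytes] = {}  # blocks already held locally, by index
        self.remote_reads = 0                # how many reads had to go to the remote side

    def read_block(self, index: int) -> bytes:
        if index not in self._blocks:
            # First access: fetch from the full replica and keep a local copy.
            self._blocks[index] = self._fetch_block(index)
            self.remote_reads += 1
        return self._blocks[index]


# Stand-in for a full replica on another node (assumption for the example).
full_replica = {0: b"alpha", 1: b"beta", 2: b"gamma"}
replica = PartialReplica(lambda i: full_replica[i])

replica.read_block(1)  # fetched remotely, then stored locally
replica.read_block(1)  # served from the local copy
print(replica.remote_reads)
```

Only the blocks a client actually touches end up on the local node, which is roughly the behaviour the question asks for.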

Andrew
  • 7,772
  • 3
  • 34
  • 43
3

As per comments elsewhere, it would be possible to use local disk storage for caching Gluster I/O, albeit at the cost of VFS cache; AFS seems to be appropriate. But the big omission from your question is whether you are trying to achieve fault tolerance or performance, and whether the replicated storage should support transactions or frequent writes.

Other options include

  • using a replicating nosql database
  • bcache (which will provide performance improvements but not resilience improvements, and poses problems with frequent writes / cache consistency)
  • NAS/SAN
symcbean
  • 19,931
  • 1
  • 29
  • 49
1

OpenAFS does have a local file cache, but so does NFSv4 with the appropriate configuration.

http://www.cyberciti.biz/faq/centos-redhat-install-configure-cachefilesd-for-nfs/

However, unless your file access is largely read-only, caching may buy you much less performance than you might expect. In situations with many clients attempting to write to the same servers, it can actually decrease performance.
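A rough back-of-the-envelope model shows why. The Python sketch below is only an illustration: the function name, the `miss_overhead_ms` parameter (extra work the cache adds on every miss, such as revalidation) and all the numbers are assumptions, not measurements of NFS or cachefilesd.

```python
def effective_latency_ms(hit_ratio: float, local_ms: float, remote_ms: float,
                         miss_overhead_ms: float = 0.0) -> float:
    """Average access time for a cache with the given hit ratio.

    Hits cost local_ms; misses cost the remote access plus any overhead
    the cache layer itself adds on a miss.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_cost = remote_ms + miss_overhead_ms
    return hit_ratio * local_ms + (1.0 - hit_ratio) * miss_cost


# Illustrative numbers only: 1 ms local disk, 10 ms remote, 2 ms overhead per miss.
read_mostly = effective_latency_ms(0.90, 1.0, 10.0, miss_overhead_ms=2.0)
write_heavy = effective_latency_ms(0.05, 1.0, 10.0, miss_overhead_ms=2.0)
uncached = 10.0

print(read_mostly < uncached)  # a high hit ratio helps
print(write_heavy > uncached)  # frequent invalidation can make things worse
```

When writes from other clients keep invalidating cached data, the hit ratio drops, and the per-miss overhead can push the average above the cost of not caching at all.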

1

What about flashcache and ceph?

http://www.sebastien-han.fr/blog/2012/11/15/make-your-rbd-fly-with-flashcache/

hookenz
  • 14,132
  • 22
  • 86
  • 142
1

IPFS is worth looking into, even though it's still relatively young and performance isn't on par with Ceph or GlusterFS yet.

I recommend it because the design is exactly what you need for an efficient local cache. All content (including the directory structure) is immutable and addressed by a cryptographically verifiable hash. This means content can be retrieved from anywhere - in memory, on disk, or even an untrusted peer. Plus, you get deduplication for free.

When looking up a file by hash, there is no need to contact a remote server to learn of updates, and no need to handle cache invalidation for anything other than freeing up disk space. Mutable addresses can be had with IPNS, but those are just pointers to file hashes, meaning that only a single request is needed to find out about an updated file tree.
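The idea can be sketched in a few lines of Python. This is not how IPFS is implemented (IPFS uses its own content identifiers rather than bare SHA-256 hex digests); the `ContentAddressedCache` class and its on-disk layout are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class ContentAddressedCache:
    """Toy local disk cache where the file name is the SHA-256 of the content."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / digest
        if not path.exists():  # identical content is stored only once
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        path = self.root / digest
        if not path.exists():
            return None  # cache miss: the caller would fetch from a peer
        data = path.read_bytes()
        if hashlib.sha256(data).hexdigest() != digest:
            path.unlink()  # corrupted entry: drop it and report a miss
            return None
        return data


cache = ContentAddressedCache(Path(tempfile.mkdtemp()))
key = cache.put(b"hello")
print(cache.get(key) == b"hello")
```

Because the key is derived from the content, an entry can never become stale; the only reason to remove one is to reclaim disk space.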

slang
  • 115
  • 5
1

You can try the MooseFS distributed file system. On the MooseFS master machine, the full file system structure is cached in RAM for better speed.

TechGeek
  • 161
  • 2
0

In the open-source field, the Ceph cache tier is close, even though it does not really care about locality. LizardFS tried to prefer local chunkservers (via a flag), but AFAIK there were caveats. In the commercial field, Amplidata, for example, had this kind of functionality from the start, but with the rise of open-source SDS it has (to my impression) been unable to gain traction, even though the alternatives were subpar in this aspect.

Florian Heigl
  • 1,440
  • 12
  • 19
0

It looks like gluster performs local file caching. Some of the tunable values are

| Option | Description | Default Value | Available Options |
| --- | --- | --- | --- |
| performance.cache-size | Size of the read cache. | 32 MB | size in bytes |
| performance.cache-max-file-size | Sets the maximum file size cached by the io-cache translator. Can use the normal size descriptors of KB, MB, GB, TB or PB (e.g. 6GB). Maximum size uint64. | 2^64 - 1 bytes | size in bytes |
| performance.cache-min-file-size | Sets the minimum file size cached by the io-cache translator. Values same as "max" above. | 0B | size in bytes |
| performance.cache-refresh-timeout | The cached data for a file will be retained till 'cache-refresh-timeout' seconds, after which data re-validation is performed. | 1 sec | 0 < cache-timeout < 61 |
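These options are normally changed with `gluster volume set`. As a hedged sketch, the Python helper below only builds the command lines and does not run them. The volume name `myvol`, the chosen values and the helper itself are assumptions for illustration; the range check simply mirrors the table above.

```python
from typing import Dict, List


def gluster_set_commands(volume: str, options: Dict[str, str]) -> List[List[str]]:
    """Build `gluster volume set` argument lists for the given options."""
    commands = []
    for key, value in options.items():
        # Range taken from the table above: 0 < cache-timeout < 61.
        if key == "performance.cache-refresh-timeout" and not 0 < int(value) < 61:
            raise ValueError("cache-refresh-timeout must be between 1 and 60 seconds")
        commands.append(["gluster", "volume", "set", volume, key, value])
    return commands


# Hypothetical volume name and values; adjust to your own setup.
commands = gluster_set_commands("myvol", {
    "performance.cache-size": "256MB",
    "performance.cache-refresh-timeout": "2",
})
for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to `subprocess.run` on a node where the gluster CLI is installed.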
sciurus
  • 12,493
  • 2
  • 30
  • 49
  • 1
    I am afraid this is a kind of in-memory cache. The default value of 32 MB points to that. – Veniamin Dec 02 '13 at 19:05
  • So? It'll get paged to the (local?) swap space if it gets too large to fit in RAM (albeit that this will still force the VFS cache to shrink). (But based on the very limited amount of information provided I'd say AFS would be top of my list of things to consider) – symcbean Dec 03 '13 at 14:37