
Background: The Linux VFS cache keeps an in-memory copy of every file that is read from disk. This continues until free RAM is exhausted, at which point the least-recently-used data is evicted from the cache.
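
For reference, this behaviour is visible in /proc/meminfo. Here's a minimal Python sketch (Linux-only; it assumes nothing beyond the standard kernel field names) that prints the relevant counters -- read a few large files and run it again to watch "Cached" grow:

    #!/usr/bin/env python3
    # Print the free-memory and page-cache counters from /proc/meminfo.
    def meminfo_kib(field):
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith(field + ':'):
                    return int(line.split()[1])  # values are reported in KiB
        raise KeyError(field)

    print('MemFree: %d KiB' % meminfo_kib('MemFree'))
    print('Cached:  %d KiB' % meminfo_kib('Cached'))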

I am running XenServer with network-attached storage. My servers have their root filesystems on the network-attached storage. I also have a local disk in each XenServer host, and my servers have their swap partitions on this local storage. In my specific case the network-attached storage is heavily loaded and disk I/O can be pretty slow. I have things set up like this because the local disks are not using RAID or protected in any way; my system can tolerate a local disk failure because all I will lose is my swap partitions.

Does anybody know how to instruct Linux to fill up the swap partition (in addition to RAM) with cached files? On a physical server using all local disks this would give no speed benefit, but for my servers it makes a lot of sense.

benathon
  • I think this is unlikely to be possible, although I can see the value in it and why you'd want to do it. – womble Aug 24 '11 at 21:45
  • Great, thanks for your input. Maybe there's a program or daemon that provides extra file caching capabilities under Linux? – benathon Aug 24 '11 at 22:10
  • I don't like your chances -- it's really a kernel-level issue. With other virtualisation technologies I could see ways of hacking this in, but Xen is its own little world. I think if you want this, you're going to have to do some (pretty serious) kernel modifications. – womble Aug 24 '11 at 22:12
  • Out of curiosity, can you elaborate on your statement "ways of hacking this in?" What if I were using VMware? – benathon Aug 24 '11 at 22:19
  • No, VMware wouldn't help. You'd want to use one of the pseudo-virtualisation systems, where the host has the ability to overcommit memory and swap guest "physical memory" into its own swap space. With enough tuning that'd kinda-sorta work, as long as there's no real memory pressure from actual processes (at which point the whole thing would swap itself into a stupor). – womble Aug 24 '11 at 22:22
  • OK, great! If you make an actual post (and not a comment) I will accept it. Thanks for the info :) – benathon Aug 24 '11 at 22:37

1 Answer


The problem with what you're trying to do is that the VFS cache is controlled entirely in-kernel, and your problem space is a very niche one -- in general, putting cache into swap entirely defeats the purpose of a cache (although I agree your use case is a valid one). My point is simply that it's very unlikely that what you want is currently supported in the kernel (and I've certainly never heard any rumblings about it being possible).

If you were running a more "pseudo" virtualisation technology, such as qemu, you would be able to "overcommit" the memory used by the VMs. In this way, the memory used by the VM would be more visible to the host as "regular" process memory, and you could then use the host's swap space to page that out when it wasn't needed. This runs the risk of swapping the machine to death if processes in the VMs actually needed all that memory, or if cache pressure in the VMs was strong, but it could work with some careful tweaking.
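
To make the host-side view concrete, here's a toy Python sketch of the overcommit idea: under a pseudo-virtualised system, a guest's "physical RAM" is just anonymous memory in a host process, which the host kernel is free to page out to its own swap once it goes cold. The 2 GiB figure is an arbitrary assumption -- shrink it before trying this on a small machine.

    #!/usr/bin/env python3
    # Toy stand-in for a guest's RAM: one big anonymous mapping.
    # Because it's ordinary process memory, the host can evict it
    # to swap when it stops being touched.
    import mmap, os, time

    GUEST_RAM = 2 * 1024**3         # pretend 2 GiB guest (assumed size)
    mem = mmap.mmap(-1, GUEST_RAM)  # anonymous, swappable mapping

    for off in range(0, GUEST_RAM, mmap.PAGESIZE):
        mem[off] = 1                # touch each page so it's really allocated

    print('pid %d is holding %d MiB of cold "guest RAM"'
          % (os.getpid(), GUEST_RAM >> 20))
    time.sleep(600)                 # leave it idle; watch swap usage climb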

Any attempt to manage this sort of thing in userspace is unlikely to work, because the VFS caching is all kernel-level, and (again) the use cases for managing it in userspace are very niche. If it were application data you were trying to cache, you could provide a userspace cache for the data (if you needed it presented as a filesystem you could use FUSE, but an application-specific datastore would work better), but that's both a lot of work (caching isn't simple to get right), and it won't work when it's the root filesystem you need cached.
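
For what it's worth, here's a rough sketch of the shape such a FUSE-based userspace cache might take in Python, assuming the third-party fusepy package (pip install fusepy). It's read-only, caches whole files in memory, and has no invalidation or eviction -- a toy, and, as above, no help for a root filesystem.

    #!/usr/bin/env python3
    # Sketch of a read-caching FUSE passthrough over a slow backing
    # directory. Usage: ./cachefs.py <slow-backing-dir> <mountpoint>
    import errno
    import os
    import sys

    from fuse import FUSE, FuseOSError, Operations

    class CachingPassthrough(Operations):
        def __init__(self, backing_root):
            self.root = backing_root  # the slow NAS/SAN-backed directory
            self.cache = {}           # path -> file contents

        def _real(self, path):
            return os.path.join(self.root, path.lstrip('/'))

        def getattr(self, path, fh=None):
            try:
                st = os.lstat(self._real(path))
            except OSError:
                raise FuseOSError(errno.ENOENT)
            return {key: getattr(st, key) for key in (
                'st_mode', 'st_nlink', 'st_size', 'st_uid', 'st_gid',
                'st_atime', 'st_mtime', 'st_ctime')}

        def readdir(self, path, fh):
            return ['.', '..'] + os.listdir(self._real(path))

        def read(self, path, size, offset, fh):
            # First read pulls the whole file from the slow backend;
            # repeat reads are served from the in-process cache.
            if path not in self.cache:
                with open(self._real(path), 'rb') as f:
                    self.cache[path] = f.read()
            return self.cache[path][offset:offset + size]

    if __name__ == '__main__':
        backing, mountpoint = sys.argv[1], sys.argv[2]
        FUSE(CachingPassthrough(backing), mountpoint, foreground=True, ro=True)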

If you do decide this is worth your while, I think you're going to spend a lot of time writing and debugging your own kernel-level code to support this use-case. Rather than generalising the problem to "store cache in swap" (which is going to get a lot of people's defenses up straight away), what might be easier is some sort of "SAN device caching" mechanism which uses swap space rather than VFS in-memory cache. Note I say "easier", and not "easy" -- it's still going to be a lot of work.

I'd be willing to spend a lot of effort (and money) on improving the performance of my NAS/SAN before I looked into modifying the kernel -- because, quite honestly, it'll provide more bang-for-buck than caching. With caching, your initial access is always going to be as slow as the underlying access mechanism, and if you can make that quicker, it'll probably improve perceived performance more than having slow initial access with fast (infrequent) repeated access. Also consider the cost of just giving all your VMs a whole bucketload more RAM -- you can buy a lot of RAM for the cost of a month or two's kernel hacking.

womble