The problem with what you're trying to do is that the VFS cache is controlled entirely in-kernel, and your problem space is a very niche one -- in general, putting cache into swap entirely defeats the purpose of cache (although I agree your use case is a valid one). My point is simply that it's very unlikely that what you want is currently supported in the kernel (and I've certainly never heard any rumblings about it becoming possible).
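For what it's worth, the knobs the kernel does expose to userspace make the point: they tune how aggressively cache is reclaimed, not where it lives. A quick sketch (standard Linux /proc paths, run on the host):

```python
# Minimal sketch: the knobs Linux exposes for the page/VFS cache.
# None of these place cache in swap; they only tune reclaim behaviour.
from pathlib import Path

KNOBS = [
    "/proc/sys/vm/swappiness",          # how eagerly anon pages get swapped vs cache dropped
    "/proc/sys/vm/vfs_cache_pressure",  # how eagerly dentry/inode caches are reclaimed
]

for knob in KNOBS:
    print(knob, "=", Path(knob).read_text().strip())

# /proc/meminfo draws the same distinction the kernel does:
# "Cached" is page cache, "SwapCached" is pages sitting in both swap and RAM.
for line in Path("/proc/meminfo").read_text().splitlines():
    if line.startswith(("Cached:", "SwapCached:")):
        print(line)
```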
If you were running a more "pseudo" virtualisation technology, such as qemu, you would be able to "overcommit" the memory used by the VMs. That way, the memory used by each VM would appear to the host as "regular" process memory, and you could use the host's swap space to page it out when it wasn't needed. This runs the risk of swapping the machine to death if the processes in the VMs actually need all that memory, or if cache pressure inside the VMs is strong, but it could work with some careful tweaking.
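As a rough sketch of that setup (the image path and sizes are placeholders, and I'm assuming qemu with KVM on the host):

```python
# Launch a qemu guest whose RAM is ordinary, pageable host process memory.
# The virtio balloon device lets the host reclaim guest memory on demand.
import subprocess

cmd = [
    "qemu-system-x86_64",
    "-enable-kvm",
    "-m", "8192",                     # guest believes it has 8 GiB...
    "-device", "virtio-balloon-pci",  # ...but the balloon can claw some back
    "-drive", "file=guest.img,format=qcow2",  # placeholder disk image
]
subprocess.run(cmd, check=True)
```

The overcommit itself happens simply by starting more guests than you have physical RAM for; the host's ordinary paging then does the rest.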
Any attempt to manage this sort of thing in userspace is unlikely to work, because VFS caching is all kernel-level, and (again) use-cases for managing it in userspace are very niche. If it were application data you were trying to cache, you could provide a userspace cache for the data (if you needed it in a filesystem-y way you could use FUSE, although an application-specific datastore would work better), but that's both a lot of work (caching isn't simple to get right) and won't work when it's the root filesystem you need cached.
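To give a flavour of the FUSE route (purely a toy -- it assumes the third-party fusepy package, is read-only, and never invalidates or evicts, which is to say it leaves out exactly the parts that make caching hard):

```python
# Naive read-through cache exposed as a FUSE filesystem over a slow
# backing directory (e.g. an NFS mount). Illustrative only.
import os
import sys
from fuse import FUSE, Operations  # pip install fusepy (assumed dependency)

class ReadCache(Operations):
    def __init__(self, backing_root):
        self.root = backing_root
        self.cache = {}  # (path, offset, size) -> bytes; unbounded, never invalidated

    def _real(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def getattr(self, path, fh=None):
        st = os.lstat(self._real(path))
        return {key: getattr(st, key) for key in (
            "st_mode", "st_nlink", "st_size", "st_uid", "st_gid",
            "st_atime", "st_mtime", "st_ctime")}

    def readdir(self, path, fh):
        return [".", ".."] + os.listdir(self._real(path))

    def read(self, path, size, offset, fh):
        key = (path, offset, size)
        if key not in self.cache:  # miss: pay the slow backing-store price once
            with open(self._real(path), "rb") as f:
                f.seek(offset)
                self.cache[key] = f.read(size)
        return self.cache[key]

if __name__ == "__main__":
    backing, mountpoint = sys.argv[1], sys.argv[2]
    FUSE(ReadCache(backing), mountpoint, foreground=True, ro=True)
```

There's a certain irony here: since that cache dict is ordinary process memory, the host can swap it -- which is roughly the behaviour you're after, but only for data you can route through such a mount, and the root filesystem isn't that.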
If you do decide this is worth your while, I think you're going to spend a lot of time writing and debugging your own kernel-level code to support this use-case. Rather than generalising the problem to "store cache in swap" (which is going to get a lot of people's defences up straight away), what might be easier is some sort of "SAN device caching" mechanism which uses swap space rather than the in-memory VFS cache. Note I say "easier", not "easy" -- it's still going to be a lot of work.
I'd be willing to spend a lot of effort (and money) on improving the performance of my NAS/SAN before I looked into modifying the kernel -- because, quite honestly, that'll provide more bang-for-buck than caching. With caching, your initial access is always going to be as slow as the underlying access mechanism, and if you can make that quicker, it'll probably improve perceived performance more than slow initial access followed by fast (but infrequent) repeated access would. Also consider the cost of just giving all your VMs a whole bucketload more RAM -- you can buy a lot of RAM for the cost of a month or two's kernel hacking.
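To put rough numbers on that last point (every figure below is an assumption -- substitute your own rates and prices):

```python
# Back-of-envelope: kernel-hacking time vs. just buying RAM.
weeks_of_kernel_hacking = 8        # "a month or two", assumed
cost_per_engineer_week = 2_000     # assumed fully-loaded cost, your currency
cost_per_gib_of_ram = 5            # assumed commodity server RAM price

budget = weeks_of_kernel_hacking * cost_per_engineer_week
print(f"Same spend buys roughly {budget // cost_per_gib_of_ram} GiB of RAM")
# -> Same spend buys roughly 3200 GiB of RAM
```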