
I'm using Ubuntu 20.04 with the Xen hypervisor. My machine has an SSD that hosts my VM images and four SATA drives that hold my data. My current setup is to mount the data in dom0 and then serve it to the other VMs over a network file server.

This seems inefficient, as all the VMs would have to go through my NIC to access the data. Am I correct in assuming this is a significant bottleneck?

What's the industry standard for sharing data that lives on the same physical machine? Any advice or improvements to this setup?

Is there any harm in mounting the data LVM volume directly in each of the VMs? My concern with this approach is what would happen if two VMs tried to access the same data simultaneously. Is this setup vulnerable to data corruption?

[diagram: data map of server setup]

curios

2 Answers


In general, no, unless you meet one of two very specific constraints. Either:

  • The device needs to be exposed read-only (this MUST be at the device level, not the filesystem level) to all the VMs, and MUST NOT be written to from anywhere at runtime (see the sketch just after this list).

or:

  • The volume must be formatted using a cluster-aware filesystem, and all VMs must be part of the cluster (and the host system too if it needs access to the data).
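
For the read-only option under Xen, a minimal sketch of what the xl guest configuration might look like, assuming a hypothetical logical volume /dev/vg_data/shared presented to the guest as xvdb (adjust all names to your own layout):

    # In each domU's xl config: attach the data LV with access 'ro'.
    # /dev/ssd_vg/vm1-root, /dev/vg_data/shared and xvdb are placeholder names.
    disk = [
        '/dev/ssd_vg/vm1-root,raw,xvda,rw',
        '/dev/vg_data/shared,raw,xvdb,ro',
    ]

Inside each guest you would also mount /dev/xvdb with -o ro, and nothing (dom0 included) may write to the volume while any guest has it attached.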

In general, filesystems that are not cluster-aware are designed to assume they have exclusive access to their backing storage, namely that its contents will not change unless they do something to change them. Violating this assumption obviously causes caching issues, but it's actually far worse than that, because the assumption extends to the filesystem's internal structures, not just the file data. This means that you can quite easily destroy a filesystem outright by mounting it on multiple nodes at the same time.

Cluster-aware filesystems are the traditional solution to this: they use either network-based locking or a special form of synchronization on the shared storage itself to ensure consistency. On Linux, your options are pretty much OCFS2 and GFS2 (I recommend OCFS2 over GFS2 for this type of thing based on personal experience, but YMMV). However, they demand a lot more from all the nodes in the cluster to keep things in sync. As a general rule, they have significant performance limitations on many workloads due to the locking and cache-invalidation requirements they enforce, they tend to involve a lot of disk and network traffic, and they may not be feature-complete compared to traditional single-node filesystems.
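
To give a sense of what the cluster-aware route involves, here is a rough OCFS2 sketch, assuming the shared LV shows up as /dev/xvdb inside each VM and that /etc/ocfs2/cluster.conf has already been written (it must list every node's name, IP, and node number, identically on all nodes; paths and labels here are placeholders):

    # On every node that needs access (each VM, plus dom0 if applicable):
    apt install ocfs2-tools
    # On Ubuntu, also set O2CB_ENABLED=true in /etc/default/o2cb, then:
    systemctl enable --now o2cb ocfs2    # bring up the cluster stack

    # On ONE node only: create the filesystem with enough node slots.
    mkfs.ocfs2 -L shared-data -N 4 /dev/xvdb

    # On every node: mount it; concurrent access is now coordinated.
    mount -t ocfs2 /dev/xvdb /mnt/shared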


I would like to point out that NFS over a local network bridge (the ‘easy’ option to do what you want) is actually rather efficient. Unless you use a rather strange setup or insist on each VM being on its own VLAN, the NFS traffic will never even touch your NIC, which means it all happens in memory and thus has very little in the way of efficiency issues (especially if you are using paravirtualized networking for the VMs).
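
As a point of reference, that NFS setup is only a few lines; the export path, bridge subnet, and dom0 address below are assumptions you would substitute with your own:

    # dom0: export the data directory to the VM bridge network only.
    apt install nfs-kernel-server
    echo '/srv/data 192.168.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
    exportfs -ra

    # In each VM: mount over the internal bridge; traffic never leaves the host.
    mount -t nfs 192.168.1.1:/srv/data /mnt/data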

In theory, if you set up 9P, you could probably get better performance than NFS, but the effort involved is probably not worth it (the performance difference is not likely to be much).
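
For comparison, the QEMU/KVM flavour of 9P (the one discussed in the comments below) looks roughly like this; the share path and mount tag are illustrative, and a Xen guest using QEMU as its device model would need different plumbing:

    # Host side: export /srv/data into the guest over virtio-9p.
    qemu-system-x86_64 ... \
      -virtfs local,path=/srv/data,mount_tag=datashare,security_model=mapped-xattr,id=data0

    # Guest side: mount the tag via the virtio transport.
    mount -t 9p -o trans=virtio,version=9p2000.L datashare /mnt/data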


In addition to all of this, there is a third option, but it's overkill for use on a single machine. You could set up a distributed filesystem like GlusterFS or Ceph. This is actually probably the best option if your data is not inherently colocated with your VMs (that is, you may be running VMs on nodes other than the ones the data is on): while it's not as efficient as NFS or 9P, it will give you much more flexibility in terms of infrastructure.
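
For completeness, a GlusterFS sketch with two hypothetical storage nodes (node1, node2), each contributing one brick; any client, VM or otherwise, then mounts the volume over the network:

    # On node1: pool the peers and create a 2-way replicated volume.
    # (gluster will warn about split-brain risk for replica 2 and prompt to confirm)
    gluster peer probe node2
    gluster volume create shared-data replica 2 node1:/bricks/data node2:/bricks/data
    gluster volume start shared-data

    # On a client (e.g. a VM): mount it with the GlusterFS FUSE client.
    mount -t glusterfs node1:/shared-data /mnt/shared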

Austin Hemmelgarn
  • In my experience, while 9p (plan 9 FS) does solve the shared-FS problem with VMs, it is not worth the extra effort vs. setting up NFS. But if the VMs can't be networked, 9pfs is probably the only option. – zymhan Nov 02 '20 at 17:35
  • @zymhan agreed, no matter whether it’s over a network layer, or VirtIO, or even something else, 9P is usually not worth the effort to set up unless you need every last ounce of performance possible and can’t bake the distributed aspect directly into your application. I simply mentioned it in my answer for the sake of completeness. – Austin Hemmelgarn Nov 02 '20 at 17:38
  • 9p's built-in support in qemu these days may mean it's (vastly) less of a pain to set up than this implies. If you're using qemu for your VMs, it's arguably _easier_ to use 9p than anything else. See the examples @ https://www.linux-kvm.org/page/9p_virtio and, when it's back up later, https://wiki.qemu.org/Documentation/9psetup – Charles Duffy Nov 05 '20 at 00:30
  • @CharlesDuffy But only if you’re using Linux, and the kernel has 9P support, and the 9P support is built with VirtIO transport support. Even then though, there are limits due to needing to keep user IDs sanely mapped between the systems if you plan to share across multiple VMs. Yes, it’s easier than trying to run 9P over the network, but NFS is even easier in many cases, and a lot more flexible. – Austin Hemmelgarn Nov 05 '20 at 02:53

If you don’t run a cluster-aware file system inside your VMs, you’ll simply destroy your shared volume immediately after the first metadata update. The full clarification story is here:

https://forums.starwindsoftware.com/viewtopic.php?f=5&t=1392

BaronSamedi1958