
My title probably doesn't encompass the full scope of what I'm trying to do, so I'll lay out what I want to accomplish.

I have two Linux servers with large drive arrays, multiple CPUs, and a large amount of RAM. In each server, what will be the primary file storage array sits on a separate RAID card from the OS. The servers also have a 40Gbps InfiniBand card for connecting to each other and a 4-port 1Gbps NIC for connecting virtual machines to the network.

My goal is this:
I want to eliminate some older physical servers by virtualizing their functions (with QEMU-KVM). I want the two large servers to handle the virtual machines, but I also want to be able to fail them over; it doesn't have to be automatic. If ServerA has a hardware failure, I want to be able to go to ServerB, start the virtual servers on that machine, and keep going, something like the sketch below.
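For example, I imagine a manual failover being roughly this simple (just a sketch; it assumes the VM's disk is already replicated to ServerB and its libvirt XML definition has been copied over, and "fileserver1" is a made-up domain name):

    # on ServerB, after ServerA fails
    virsh define /etc/libvirt/qemu/fileserver1.xml   # register the copied VM definition
    virsh start fileserver1                          # boot the VM from the replicated storage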

Several of the physical servers I want to virtualize are file servers. I would like the storage arrays on the large servers to act as a sort of SAN for the virtual machines so I don't have to create a virtual disk image to store the files in. I would, of course, like the storage array on ServerA to be mirrored to ServerB, again to provide failover.

My thought was to use something like Gluster or Ceph to handle the file storage and the mirroring of the virtual machine images from ServerA to ServerB. My confusion comes from information overload. How would the virtual machines access the distributed filesystem on the same host they run on? Would there be a bottleneck? Would the virtual host have to go out through the 1Gbps NIC and loop back, or could they communicate internally somehow? Do I have this all backwards?

I don't expect a step by step answer, but a general recommendation with links to point me in the right direction would be greatly appreciated.

Brent

1 Answer


For just a pair of nodes, neither Ceph nor GlusterFS makes any sense. Stick with DRBD and you'll be good.

https://www.linbit.com/en/products-and-services/drbd/

https://www.digitalocean.com/community/tutorials/how-to-create-a-high-availability-setup-with-heartbeat-and-floating-ips-on-ubuntu-14-04
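A minimal DRBD resource definition for mirroring the storage array between the two boxes could look roughly like this (a sketch only; the resource name, hostnames, backing device, and IPoIB addresses are assumptions to adjust for your kit):

    # /etc/drbd.d/vmstore.res -- example names and addresses
    resource vmstore {
        net {
            protocol C;                      # synchronous replication
        }
        on servera {
            device    /dev/drbd0;
            disk      /dev/sdb1;             # backing LUN on the storage RAID card
            address   192.168.100.1:7789;    # IPoIB address on the 40Gbps link
            meta-disk internal;
        }
        on serverb {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.100.2:7789;
            meta-disk internal;
        }
    }

Bring it up with drbdadm create-md vmstore and drbdadm up vmstore on both nodes, then drbdadm primary --force vmstore on the node you want active. The VMs sit on /dev/drbd0 (or on LVM / a filesystem layered on top of it); on failure you promote the surviving node and start the guests there.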

Ceph makes sense with more nodes because you need multiple OSDs to wide-stripe your data across many relatively slow "building blocks". GlusterFS is a little bit "better" from this point of view, but not dramatically so.

BaronSamedi1958