
I'm getting started with virtualization, so bear with me.

In a virtual environment, applications run inside virtual machines that sit on a hypervisor layer. So a single physical machine could host many virtual machines running multiple applications.

So far so good?

So what happens when a physical machine fails? Wouldn't that make many applications fail, all because of a single machine?

I'm looking into building a private cloud with OpenStack, but I want to fully understand virtualization first.

Sherif

4 Answers


The specifics depend on which exact virtualization solution you use, but the idea is that you have a virtual farm: a number of physical hosts, each running several virtual machines. You then give back some of the efficiency you gained by not needing a physical host for every VM, so that you have enough spare capacity left to cover the case where a physical machine goes down.
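As a back-of-the-envelope illustration of that spare capacity (my numbers, not the answer's): if you size the farm so that any one host can fail, each of your N hosts has to run below (N-1)/N of full load.

```python
# Rough N+1 sizing check: how loaded can each host be if the farm must
# absorb the failure of any single host? (Illustrative numbers only.)
def max_safe_load(num_hosts: int) -> float:
    """Highest usable fraction of each host that still leaves room
    for one failed host's VMs to be redistributed."""
    return (num_hosts - 1) / num_hosts

for n in (2, 4, 8):
    print(f"{n} hosts: keep each under {max_safe_load(n):.0%} load")
# 2 hosts: keep each under 50% load
# 4 hosts: keep each under 75% load
# 8 hosts: keep each under 88% load
```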

Additionally, you can locate the VHDs for each VM on a common (redundant) SAN. The hypervisors on each physical host can be set to talk with each other and share memory from different VMs. There is some latency, and much of the memory will be backed by disk, but if one of the physical hosts goes down you're not even waiting for the VMs from that host to boot back up. Instead, those VMs will be automatically distributed among the remaining hosts. The ultimate goal is that these machines pick up almost where they left off, with little to no downtime at all. In a sense, all of your VMs are already running on at least two physical hosts. In practice, right now hypervisors can only do this kind of migration one machine at a time, when they know it's coming before the host fails... but make no mistake: instant migration on hardware failure is the ultimate goal for all of the major hypervisors.
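To make the "planned, one machine at a time" part concrete, here is a minimal sketch of a live migration using the libvirt Python bindings (KVM/libvirt chosen only as an example; the host URIs and VM name are hypothetical, and the VM's disks are assumed to live on the shared SAN described above).

```python
# Minimal planned live-migration sketch with the libvirt Python bindings.
# Assumes the 'libvirt-python' package and a VM whose disks sit on shared
# storage reachable from both hosts (hypothetical names throughout).
import libvirt

src = libvirt.open("qemu+ssh://host1.example.com/system")
dst = libvirt.open("qemu+ssh://host2.example.com/system")

dom = src.lookupByName("web-vm-01")

# VIR_MIGRATE_LIVE copies memory while the guest keeps running;
# VIR_MIGRATE_PERSIST_DEST registers the VM permanently on the target.
flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PERSIST_DEST
dom.migrate(dst, flags)

src.close()
dst.close()
```

Note this only works while the source host is still healthy; as the comments below point out, after an unplanned failure the VMs are restarted on another host, not migrated.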

This is why you sometimes see a server virtualized to a single physical host in a farm. You may not gain any hardware efficiency (you may even lose some performance), but you make up for it in terms of management consistency and built-in high-availability.

Joel Coel
  • Thanks for your answer, Joel. I have two questions: does the virtual environment treat the physical machines as a single resource pool, and does that help satisfy on-demand self-service? Also, does virtualization help utilize resources better? – Sherif Aug 22 '15 at 22:11
  • @Sherif: Basically, yes, and yes. If you want to understand this in more detail, have a look at the [Wikipedia article](https://en.wikipedia.org/wiki/Virtualization); it briefly addresses VM migration and failover. If you still have questions, ask a more specific question. – sleske Aug 23 '15 at 11:04
  • Are you sure about the shared-memory part? From my understanding, a VM failing due to hardware failure will be **restarted** on another host. This can be viewed as a full reboot or a checkpoint restore, depending on the hypervisor configuration, but the original memory state cannot be recovered. For vSphere: http://www.vmware.com/products/vsphere/features/high-availability As a side note, some projects were started for KVM to enable _true shared, redundant memory among a collection of hardware hosts_, but they were abandoned. – shodanshok Aug 23 '15 at 18:55
  • VM migration can only happen if the physical machine has the chance to transfer control before failing. If the physical machine fails unceremoniously, then the virtual machine will have to be restarted on a different machine. If you have stateless servers, this transfer process is trivial, because you can just spin up another machine. For machines with persistent state, you need a scheme that can recover the persistent data from the failing physical machine. – Lie Ryan Aug 23 '15 at 21:50

All virtual servers running on a physical host will go offline if the host has any sort of failure.

That said, most platforms offer a high-availability solution for a single VM. Other times a system is built with multiple nodes to prevent service disruption in the event that one node goes down.

If two VM nodes make up a highly available service, it is possible to configure the hypervisor to ensure that the two nodes are not reliant on the same physical infrastructure (fault tolerance). This can cover more than just physical server fault tolerance, including separate network paths, all the way up to geographically disparate locations.
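The usual way to express that rule in practice is an anti-affinity placement policy. Here is a sketch using the OpenStack SDK, since the question mentions OpenStack (all names are hypothetical, and the exact field for the policy, `policies` vs. `policy`, varies with the SDK release and Nova microversion).

```python
# Sketch: keep both nodes of an HA pair on different physical hosts via an
# OpenStack anti-affinity server group. Names, flavor, image, and network
# are hypothetical; requires the 'openstacksdk' package and a clouds.yaml.
import openstack

conn = openstack.connect(cloud="mycloud")

group = conn.compute.create_server_group(
    name="ha-pair", policies=["anti-affinity"]
)

for name in ("node-a", "node-b"):
    conn.compute.create_server(
        name=name,
        flavor_id="m1.small",
        image_id="ubuntu-22.04",
        networks=[{"uuid": "PRIVATE_NET_ID"}],
        # The scheduler refuses to co-locate members of the group; how the
        # hint is passed differs slightly between SDK releases.
        scheduler_hints={"group": group.id},
    )
```

With the group in place, the scheduler places node-a and node-b on different hypervisors, or fails the boot if it cannot.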

blaughw
  • AWS, for instance, depending on the service, replicates the service across availability zones (separate physical areas) in case a natural disaster in one area disrupts the physical machines. – Michael Bailey Aug 22 '15 at 21:51
  • Does the virtual environment treat the physical machines as a single resource pool? Does that help satisfy on-demand self-service? Also, does virtualization help utilize resources better? And thanks a lot for your efforts. – Sherif Aug 22 '15 at 22:31

You are right in your assumption that if the physical machine fails, the VMs on it become unavailable too.

But OpenStack can take care of that and start the VMs of the failed physical server on another server, or you can use a hypervisor system that is already distributed; I think vSphere can do that.

You should read the OpenStack documentation on high availability (HA) for more information.
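The concrete mechanism is Nova's `evacuate` action, which rebuilds instances from a dead compute node onto a healthy one (with shared storage the disks survive; without it, instances are rebuilt from their images). A minimal sketch with python-novaclient, assuming admin credentials and hypothetical endpoint and host names:

```python
# Sketch: rebuild all instances off a failed compute node with "evacuate".
# Requires python-novaclient; credentials, endpoint, and host names are
# hypothetical. The source node must actually be down before evacuating.
from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader("password")
auth = loader.load_from_options(
    auth_url="http://controller:5000/v3",
    username="admin",
    password="secret",
    project_name="admin",
    user_domain_name="Default",
    project_domain_name="Default",
)
nova = client.Client("2.29", session=session.Session(auth=auth))

failed_host = "compute-03"
for server in nova.servers.list(
        search_opts={"host": failed_host, "all_tenants": 1}):
    # With no target host given, the scheduler picks a healthy node.
    nova.servers.evacuate(server)
    print(f"evacuating {server.name} off {failed_host}")
```

The OpenStack HA documentation covers how to automate this detect-and-evacuate loop rather than running it by hand.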

Henrik Pingel

Regarding your question: yes, you will lose access to all machines within this physical host. Of course, it depends on which component failed. If it is a disk, that is something of a problem; if it is the motherboard, recovery is much easier. In general, hardware recovery is straightforward because the hypervisor is hardware-agnostic. At this point in time there are a lot of vendor-specific technologies you can use to build highly available services.

Resource pools (VMware) are NOT able to aggregate the resources of multiple physical hosts (CPU, memory, etc.), as somebody mentioned above. So if you have two physical hosts (say, one quad-core CPU without hyper-threading and 8 GB RAM each), it will NOT be possible to run a 5-vCPU/12 GB VM there. Resource pools are logical; they cannot create supercomputing systems. Right now, they are a way of controlling resource utilization.
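A quick way to see why (an illustrative check, not any vendor's API): the scheduler has to find a single host that fits the whole VM, so pooled totals are irrelevant.

```python
# Illustration: a VM must fit on ONE host; pooled totals do not help.
hosts = [
    {"name": "esx1", "cores": 4, "ram_gb": 8},
    {"name": "esx2", "cores": 4, "ram_gb": 8},
]
vm = {"vcpus": 5, "ram_gb": 12}

print("pool total:",
      sum(h["cores"] for h in hosts), "cores,",
      sum(h["ram_gb"] for h in hosts), "GB")  # 8 cores, 16 GB -- looks big enough

fits = [h["name"] for h in hosts
        if h["cores"] >= vm["vcpus"] and h["ram_gb"] >= vm["ram_gb"]]
print(fits if fits else "no single host can run this VM")
```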

Availability (VMware): it is possible to use technologies like High Availability (HA), which give you automated recovery (in my experience, within 1-2 minutes) of all VMs in a cluster, IF you are using a storage array (NAS, iSCSI, FC) and keep all VM files there. Moreover, HA only covers CPU, RAM, or motherboard failure; obviously it will not work if the storage array goes down. To protect against RAID/controller failures, people use replication, storage LUN mirroring, etc.

If recovery within 1-2 minutes is not an option, there are technologies like Fault Tolerance (FT), which allow you to achieve ZERO downtime for a VM in case of failure by keeping a shadow (running) copy of the configured VM. But this technology also has a lot of restrictions: the problem of fault-tolerating VMs with multiple vCPUs is not fully solved.

Overall, the right solution depends on your goal.

Dmitry S