How does migrating a running Virtual-Machine from one Hypervisor to another actually work?

Question

How does migrating a running Virtual-Machine from one Hypervisor to another in a cloud environment actually work?

I know that it is possible and that this feature exists. But what actually happens when a running Virtual-Machine is being migrated from one Hypervisor to another? I have a hard time imagining that this is possible without the slightest interruption.

While I can imagine that it might somehow be achieved if the bare metal is physically very close, I struggle with the scenario of migrating a running virtual machine from a US data center to an EU data center, for example.

Can anyone explain what is actually happening, and how it is achieved with no perceivable downtime while the VM's state remains consistent?

For example here: http://www.vmware.com/products/vsphere/features-vmotion I understand it high-level but would like to deeper understand how it is technically achieved. — binaryanomaly, Apr 27 '14 at 21:59

score 6 · Accepted Answer · answered Apr 27 '14 at 22:51

Conceptually, the process is simple: start copying all the RAM for a VM from one physical host to another over the network, keeping track of which memory pages have been modified since you copied them. Repeat the cycle for the changed RAM until the change set is small, then pause the VM, copy the last bits of RAM (and the CPU register state), and begin running the VM on the new physical host. Send a gratuitous ARP so the network sees the new location of the VM, and away you go. It's not 100% transparent -- between the pause and the ARP delay, the VM will be offline for about a second.
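To make the copy loop concrete, here is a toy simulation of it. This is only a sketch under stated assumptions, not how any particular hypervisor implements it: memory is modeled as a dict of pages, and `precopy_migrate`, `dirty_schedule`, `stop_threshold` and `max_rounds` are made-up names for illustration.

```python
def precopy_migrate(source, dirty_schedule, stop_threshold=2, max_rounds=10):
    """Toy model of iterative pre-copy.

    source         -- dict page_number -> contents (the running VM's RAM)
    dirty_schedule -- dirty_schedule[i] is a dict of the writes the guest
                      makes while copy round i is in flight (hypothetical input)
    Returns (dest, rounds, pages_copied_while_paused).
    """
    dest = {}
    to_copy = set(source)  # the first round copies every page
    rounds = 0
    while len(to_copy) > stop_threshold and rounds < max_rounds:
        # Read the pages for this round while the VM keeps running.
        in_flight = {page: source[page] for page in to_copy}
        # Meanwhile the guest writes to memory, so some copies go stale.
        writes = dirty_schedule[rounds] if rounds < len(dirty_schedule) else {}
        source.update(writes)
        dest.update(in_flight)
        # Only the pages dirtied during this round need to be sent again.
        to_copy = set(writes)
        rounds += 1
    # The VM is paused here: copy whatever is still dirty, then resume on dest.
    final = {page: source[page] for page in to_copy}
    dest.update(final)
    return dest, rounds, len(final)


ram = {page: f"v{page}" for page in range(8)}
guest_writes = [{0: "a", 1: "b", 2: "c", 3: "d"}, {1: "e", 2: "f"}]
dest, rounds, paused_pages = precopy_migrate(ram, guest_writes)
print(dest == ram, rounds, paused_pages)  # True 2 2
```

The `max_rounds` cap stands in for the case where the guest keeps dirtying memory as fast as it is copied: the loop gives up and pauses the VM with a larger final set, which means a longer pause.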

In order for this to work, the rate at which the VM modifies memory has to be less than the speed of the link between hosts (which is why it's preferred to use a direct 10GigE link for VMotion, without routers between the hosts), and the latency between hosts needs to be pretty short -- VMware requires a round-trip time of no more than 5ms.
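A back-of-the-envelope model shows why the dirty rate has to stay below the link speed. It rests on simplifying assumptions that are mine: a constant dirty rate, every write landing on a fresh page, and no protocol or latency overhead. `estimate_precopy` and its parameters are hypothetical names. Under those assumptions each round shrinks the remaining data by the factor dirty rate / link rate.

```python
def estimate_precopy(ram_gb, link_gbps, dirty_gbps, stop_mb=64, max_rounds=30):
    """Estimate (rounds, copy_seconds, pause_seconds) for a pre-copy migration.

    Returns None when memory is dirtied at least as fast as the link can
    carry it, because the change set then never shrinks.
    """
    if dirty_gbps >= link_gbps:
        return None
    link = link_gbps / 8.0    # gigabits/s -> gigabytes/s
    dirty = dirty_gbps / 8.0
    remaining_gb = float(ram_gb)
    copy_seconds = 0.0
    rounds = 0
    while remaining_gb * 1024 > stop_mb and rounds < max_rounds:
        round_seconds = remaining_gb / link    # time to send this round
        copy_seconds += round_seconds
        remaining_gb = dirty * round_seconds   # dirtied while we were sending
        rounds += 1
    pause_seconds = remaining_gb / link        # final copy with the VM paused
    return rounds, copy_seconds, pause_seconds


# 16 GB VM, 10 Gbit/s link, guest dirtying memory at 1 Gbit/s:
print(estimate_precopy(16, 10, 1))
# Guest dirtying memory faster than a 1 Gbit/s link can move it:
print(estimate_precopy(16, 1, 2))  # None
```

With the first set of numbers the remaining data goes 16 GB, 1.6 GB, 0.16 GB, 0.016 GB, so three rounds (about 14 seconds of copying) are followed by a pause of roughly 13 ms for the memory copy itself. Real pauses are longer because of the device state, the switch-over and the ARP update.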

The origin and destination hosts need to have the same underlying disk storage, which generally limits you to hosts in the same room. You can do synchronous storage replication to go farther, but normally at the expense of slower performance (as each disk operation has to go to the far end and receive confirmation).
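To put a rough number on that cost, here is a small sketch. The figures and names (`sync_write_latency_ms`, a 0.5 ms device write, an 80 ms intercontinental round trip) are assumptions for illustration, not measurements, and the model only covers one outstanding write at a time.

```python
def sync_write_latency_ms(local_ms, remote_ms, rtt_ms):
    """A synchronous write completes only once both copies have acknowledged.

    The local and remote writes proceed in parallel, so the slower one wins.
    """
    return max(local_ms, rtt_ms + remote_ms)


def serial_write_iops(latency_ms):
    """Writes per second when each write waits for the previous one to finish."""
    return 1000.0 / latency_ms


same_room = sync_write_latency_ms(local_ms=0.5, remote_ms=0.5, rtt_ms=0.0)
far_away = sync_write_latency_ms(local_ms=0.5, remote_ms=0.5, rtt_ms=80.0)
print(same_room, serial_write_iops(same_room))
print(far_away, round(serial_write_iops(far_away), 1))
```

Under these assumptions a workload that issues its writes one after another drops from about 2000 writes per second to about 12, which is why stretching shared storage across long distances hurts so much.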

There are ways to migrate services over longer distances (even between continents), but those are approaches to switch where the active instance of an application is (and how you reach it), not live-migrating the VM itself.

Thanks a lot for your answer. While I understand that it's possible to do this there are also a lot of limiting factors that restrict this process to "ideal" conditions. Which means for example that this undertaking might be rather difficult to impossible for a VM that is under constant high load. Your explanations helped a lot to point that out, thanks. — binaryanomaly, Apr 28 '14 at 05:32
If you didn't have live migration possible, is the alternative a rolling switch? — CMCDragonkai, Jun 12 '15 at 17:41
