I am running KVM with some Ubuntu VMs as guest machines. The guests contain an application that does not need to run most of the time, but once every few months, unexpected, random triggers require it to run immediately (<5 second delay) for just a few hours.

If I keep the VM always running, I waste a lot of CPU resources, because the VM is inactive 99.99% of the year.

If I hibernate the VM state to disk, starting the application requires booting the VM back up, which takes too long on my machine (minutes).

I'd like to pause/suspend the VMs in memory instead, because resuming a paused VM seems instantaneous. While the VM is inactive, I can reuse the CPU resources elsewhere (although I understand that I cannot reuse the memory).

Is it recommended to pause guest VMs for long periods of time (months or years)? Will resuming be reliable? What are best practices to make sure a VM resumes normally when I need it months later?

I was thinking of buying ECC RAM for the host machine to protect against random bit flips. But is there anything else I should be doing?

user3667125
  • This app sounds like a good candidate for containerization. – Michael Hampton Jul 17 '20 at 00:15
  • It does, but unfortunately the guest applications are monolithic and have not been containerized; and also some of them require kernel-level modification that VMs support but containers do not. – user3667125 Jul 17 '20 at 00:22
  • @user3667125 I understand what you're trying to achieve, but if the machines are mostly idle, are you really wasting a lot of CPU resources compared to what's available to you? In most cases I would accept this as the cost of running the system in question, unless as Michael Hampton suggested the load can be containerized. – Mikael H Jul 17 '20 at 06:55

1 Answer

No, leave the VM running.

While the VM is paused, you cannot maintain the application or the OS instance. At a minimum, it needs security updates every couple of months.

A guest that is already running will respond faster than one that has to be resumed. A target of under 5 seconds does not leave much room for delay.
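One way to check the 5 second target rather than guess at it is to time the resume call. The following is only a minimal sketch: it assumes the guest is managed through libvirt's `virsh` CLI, and the domain name `appvm` used in the note below is hypothetical. It measures the `virsh resume` call itself, not how long the application inside the guest takes to become responsive.

```python
import subprocess
import time

BUDGET_SECONDS = 5.0  # response-time target stated in the question


def resume_command(domain):
    """Build the virsh command that resumes a paused guest."""
    return ["virsh", "resume", domain]


def timed_run(command, runner=subprocess.run):
    """Run a command and return the wall-clock seconds it took.

    `runner` is injectable so the timing logic can be exercised
    without a libvirt host (an assumption made for this sketch).
    """
    start = time.monotonic()
    runner(command, check=True)
    return time.monotonic() - start


def within_budget(elapsed, budget=BUDGET_SECONDS):
    """Return True if the measured delay is strictly under the target."""
    return elapsed < budget
```

On a host with libvirt, `within_budget(timed_run(resume_command("appvm")))` would report whether the resume call alone fits the target; any application warm-up inside the guest has to be measured separately.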

Speaking of time, the clock in the guest will probably be wrong after a resume. It is not obvious how to address this for the resume case; see How to keep time on resumed KVM guest with libvirt?
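One possible approach to the clock problem, sketched here under assumptions rather than taken from the linked question, is to reset the guest clock right after resuming it. `virsh domtime <domain> --now` sets the guest time to the host's current time, but it relies on the QEMU guest agent running inside the guest. The function names and the injectable `runner` are illustrative choices for this sketch.

```python
import subprocess


def resume_and_sync_commands(domain):
    """Return the virsh commands to resume a guest, then reset its clock."""
    return [
        ["virsh", "resume", domain],
        # Set the guest clock to the host's current time.
        # Assumes the QEMU guest agent is installed and running in the guest.
        ["virsh", "domtime", domain, "--now"],
    ]


def resume_and_sync(domain, runner=subprocess.run):
    """Run the commands in order, stopping at the first failure.

    With the default runner, check=True makes a failed command raise
    subprocess.CalledProcessError, so the clock step is skipped if
    the resume itself fails.
    """
    for command in resume_and_sync_commands(domain):
        runner(command, check=True)
```

If the guest agent is not available, the second command will fail, and the guest would have to correct its own clock, for example through an NTP client running inside it.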

Pausing and resuming does not save you resources. Storage and RAM are already spent. CPU you can overcommit a little. In other words, assume the idle CPU of this guest, which idles most of the time, is available to other guests on the host.

Consider peak use in your capacity planning: what happens when it runs on top of typical workload? Buy CPU for your compute hosts when necessary. Sometimes that is the price for maintaining a fast response time.

John Mahowald
  • I disagree with the given reasons. You can always unpause to upgrade the guest then pause it again (perhaps after a restart, if such is needed for the upgrade). On the other side, guests are never fully idle: they process timer (and other) interrupts, keep the clock, there’s usually cron running, and VMs with a GUI booted, especially with a desktop environment, are real resource hogs even when “idle”. – mirabilos Feb 14 '22 at 20:05
  • Please make your own answer to this question, in particular how you will meet the 5 second response time objective. – John Mahowald Feb 14 '22 at 20:22