25

I've been using VMware for many years, running dozens of production servers with very few issues, but I've never tried hosting more than 20 VMs on a single physical host. Here is the idea:

  1. A stripped down version of Windows XP can live with 512MB of RAM and 4GB disk space.
  2. $5,000 gets me an 8-core server class machine with 64GB of RAM and four SAS mirrors.
  3. Since 100 of the above-mentioned VMs fit into this server, my hardware cost is only $50 per VM, which is super nice (cheaper than renting VMs at GoDaddy or any other hosting shop); a quick sanity check of these numbers is sketched below.
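
Here's a rough back-of-the-envelope check of those numbers (the per-VM figures are just the assumptions from the list above):

```python
# Rough capacity and cost check for 100 stripped-down XP guests.
# Per-VM figures are assumptions taken from the list above.
server_cost_usd = 5000
host_ram_gb = 64
vm_count = 100
vm_ram_gb = 0.5      # 512 MB per guest
vm_disk_gb = 4       # 4 GB per guest

print(f"Hardware cost per VM: ${server_cost_usd / vm_count:.0f}")                   # $50
print(f"Total guest RAM:      {vm_count * vm_ram_gb:.0f} GB of {host_ram_gb} GB")   # 50 GB
print(f"Total guest disk:     {vm_count * vm_disk_gb:.0f} GB")                      # 400 GB
```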

I'd like to see if anybody has been able to achieve this kind of scalability with VMware. I've done a few tests and bumped into a weird issue: VM performance starts degrading dramatically once you start up 20 VMs. At the same time, the host server does not show any resource bottlenecks (the disks are 99% idle, CPU utilization is under 15%, and there is plenty of free RAM).

I'd appreciate it if you can share your success stories around scaling VMware or any other virtualization technology!

user9517
  • 114,104
  • 20
  • 206
  • 289
  • 4
    What VMware product are you planning on using? ESX? ESXi? Server? – wzzrd Jun 22 '09 at 20:45
  • 2
    You can run XP with 256 without much difficulty, especially if it is light duty tasks. Microsoft requires 64 but 128 is "sufficient" http://technet.microsoft.com/en-us/library/bb457057.aspx – Matt Rogish Jun 22 '09 at 20:56
  • 1
    where are you buying your servers from? I want one :) – warren Jul 01 '09 at 13:14
  • 1
    5000 USD only, can you sell me two? :) – Taras Chuhay Aug 04 '09 at 16:08
  • You have "this amount of CPU" in your host server, and each VM gets a share of it. Plus ESXi has overhead: "switch to this VM, manage it, switch to the next, etc.", many times per second. It means each VM gets only a fraction of the total CPU. The more VMs, the more you divide your CPU (and the more overhead you add, so the effective load is quite a bit more than 100 VMs' worth). – Olivier Dulac Sep 24 '15 at 17:15

10 Answers

15

Yes, you can. For some Windows 2003 workloads as little as 384 MiB suffices, so 512 MiB is a pretty good estimate, if a little on the high side. RAM should not be a problem, and neither should CPU.

A hundred VMs is a bit steep, but it is doable, especially if the VMs are not going to be very busy. We easily run 60 servers (Windows 2003 and RHEL) on a single ESX server.

Assuming you are talking about VMware ESX, you should also know that it is able to overcommit memory. VMs hardly ever use their full allotted memory, so ESX can commit more RAM to VMs than is physically available and run more VMs than it 'officially' has RAM for.
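
A minimal sketch of what that overcommit arithmetic could look like (the "active" fraction below is an illustrative assumption, not a measured value):

```python
# Memory overcommit sketch: ESX can promise guests more RAM than the host has,
# because idle guests rarely touch their full allocation.
host_ram_gb = 64
vm_allocated_gb = 0.5     # 512 MB promised to each XP guest
active_fraction = 0.4     # assumed share of that allocation a guest actually touches

max_vms_fully_backed = host_ram_gb / vm_allocated_gb                       # 128 guests
max_vms_overcommitted = host_ram_gb / (vm_allocated_gb * active_fraction)  # 320 guests

print(f"Fully backed:    {max_vms_fully_backed:.0f} guests")
print(f"With overcommit: {max_vms_overcommitted:.0f} guests (if only 40% of RAM is active)")
```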

Most likely your bottleneck will not be CPU or RAM, but I/O. VMware boasts huge numbers of IOPS in their marketing, but when push comes to shove, SCSI reservation conflicts and limited bandwidth will stop you dead long before you come close to the IOPS VMware brags about.
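
To get a feel for the I/O budget, here is a rough sketch (the per-spindle and per-VM IOPS figures are assumptions, not measurements):

```python
# Rough I/O budget for four SAS mirrors versus 100 XP guests.
# Spindle and per-VM figures are illustrative assumptions.
mirrors = 4                 # four RAID 1 pairs
iops_per_spindle = 150      # assumed for a 15k SAS disk
vm_count = 100
iops_per_idle_vm = 3        # assumed steady-state load of a lightly used guest
iops_per_booting_vm = 80    # assumed burst while a guest boots or patches

read_budget = mirrors * 2 * iops_per_spindle   # reads can hit either side of a mirror
write_budget = mirrors * iops_per_spindle      # writes land on both disks

print(f"Idle fleet demand:    {vm_count * iops_per_idle_vm} IOPS")
print(f"10 guests booting:    {10 * iops_per_booting_vm} IOPS")
print(f"Read / write budget:  {read_budget} / {write_budget} IOPS")
```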

Anyway, we are not experiencing the 20 VM performance degradation. What version of ESX are you using?

wzzrd
  • 10,269
  • 2
  • 32
  • 47
  • Thanks Wzzrd! I am currently using VMware Server 2.0, but planning to try ESX very soon. I've been watching I/O on all host arrays very carefully, and the only way I was able to max it out was by rebooting multiple guests at a time. When the guests are doing light work or staying idle, the host disks are 99% idle. So I suspect that something other than CPU and I/O is causing all the VMs to slow down. By the way, they slow down dramatically - it takes 20 seconds to open the Start menu, and if I run Task Manager inside a VM, Task Manager itself takes 90% CPU - weird! – Dennis Kashkin Jun 22 '09 at 22:24
  • 2
    That would be because you are using VMware Server. VMware Server is a virtualization platform on top of another platform (Linux, most often), while ESX is a bare-metal virtualization platform. Very different, both in concept and in the way it performs. – wzzrd Jun 23 '09 at 06:50
  • Sadly, when patch day comes with 100 VMs you WILL be rebooting a lot of them at the same time ;) And patching itself is hard. Beware a service pack - that is when the real pain starts ;) – TomTom Feb 09 '12 at 06:07
  • Stop fooling yourselves into thinking bare metal is something special. ESXi is just a stripped-down Linux. Yes, Linux. – dresende Jun 11 '12 at 22:47
  • 2
    @dresende. No, it isn't. Trust me. – wzzrd Jun 12 '12 at 08:57
  • I'll trust you if you explain host.system.kernel.kmanaged.LinuxTaskMemPool here: http://d.pr/i/q4vG – dresende Jun 13 '12 at 15:47
11

One major problem with a large environment like that would be disaster prevention and data protection. If the server dies, then 100 VMs die with it.

You need to plan for some sort of failover of the VMs, and for some sort of "extra-VM" management that will protect your VMs in case of failure. Of course, this sort of redundancy means increased cost - which is probably why such an outlay often is not approved until its benefits have been demonstrated in practice (by its absence).

Remember, too, that the VM host itself is only one of several single points of failure:

  • Network - what if the VM host's networking card goes down?
  • Memory - what if a chunk of the VM host's memory goes bad?
  • CPU - if a CPU core dies, then what happens to the VMs?
  • Power - is there only one - or two - power cables?
  • Management port - suppose you can't get to the VM host's management interface?

These are just a few: a massive VM infrastructure requires careful attention to the prevention of data loss and the prevention of VM loss.
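
As a concrete example of planning for failover, here is a minimal N+1 sizing sketch (the per-host ceiling is an assumption, borrowed from the 60-guests-per-host figure quoted in another answer; substitute whatever your own testing shows):

```python
import math

# N+1 sizing sketch: after one host fails, the survivors must still carry every guest.
vm_count = 100
vms_per_host_capacity = 60   # assumed comfortable ceiling per host

hosts = 2
while (hosts - 1) * vms_per_host_capacity < vm_count:
    hosts += 1

normal = math.ceil(vm_count / hosts)
degraded = math.ceil(vm_count / (hosts - 1))
print(f"{hosts} hosts: {normal} VMs each normally, {degraded} each after losing one host")
```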

Mei
  • 4,560
  • 8
  • 44
  • 53
  • 2
    Listen to David. You will want an N+1 configuration, meaning you need at least one spare idle machine that is capable of absorbing the entire workload of another machine should it fail. My recommendation is a two-server cluster that distributes the load evenly but could independently handle the whole workload should one machine fail. – Jason Pearce Feb 16 '11 at 01:44
4

No statement on the viability of this in production, but there is a very interesting NetApp demo in which they provision 5,440 XP desktops on 32 ESX hosts (that's 170 per host) in about 30 minutes, using very little disk space thanks to deduplication against the common VM images:

http://www.youtube.com/watch?v=ekoiJX8ye38

My guess is your limitations are coming from the disk subsystem. You seem to have accounted for the memory and CPU usage accordingly.

Kevin Kuphal
  • 9,064
  • 1
  • 34
  • 41
3

Never done it - but I promise you'll spend much more on storage to get enough IOPS to support that many VMs than you will on the server hardware. You'll need a lot of IOPS if all 100 of those are active at the same time. Not to sound negative, but have you also considered that you're putting a lot of eggs in one basket (it sounds like you're after a single-server solution)?

Jeff Hengesbach
  • 1,762
  • 10
  • 10
  • 2
    I would definitely create multiple "baskets" and set up some automated backups. I/O bottlenecks can be easily solved with SSD drives these days. I've been using 160GB Intel MLC drives in production and they are spectacular. You basically get 5 times better random I/O performance than top-of-the-line SAS drives (in simple RAID configurations). – Dennis Kashkin Jun 22 '09 at 22:39
1

I would be most worried about CPU contention with 100 VMs on a single host. You have to remember that the processor is NOT virtualized, so each machine will have to wait for access to the CPU. You can start to see contention by looking at esxtop; I have been told by VMware engineers that anything over 5 in the %RDY field is very bad.
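
For reference, %RDY is just ready time expressed as a fraction of the sampling window; a minimal conversion sketch follows (the 20-second window matches the commonly cited vCenter real-time chart interval and should be treated as an assumption for your particular tool):

```python
# Convert raw CPU "ready" milliseconds per sampling interval into a %RDY figure.
def ready_percent(ready_ms: float, interval_ms: float = 20_000) -> float:
    """Percentage of the interval a vCPU spent runnable but waiting for a physical CPU."""
    return ready_ms / interval_ms * 100

# Example: 1,500 ms of ready time in a 20 s window is 7.5% RDY -- above the ~5%
# threshold quoted above.
print(f"{ready_percent(1500):.1f}% RDY")
```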

In my experience, I've seen about 30-40 servers running on one host (not doing too much).

Zypher
  • 36,995
  • 5
  • 52
  • 95
1

I had 10 guests on VMware Server 1.0.6 (under Windows 2003), and it would run into I/O issues on a regular basis (and if the nightly builds ever overlapped with something else, they would have issues). After upgrading from Windows to ESXi U3, we found that our performance problems went away (nightly builds no longer failed).

Also note that while SSDs have a much higher I/O rate than spinning media, there are cases where that doesn't hold, such as certain write patterns (lots of small writes scattered across the drive will kill performance unless the controller has a smart write-buffering cache that handles scattered writes well).

I'd recommend investigating/testing putting the swap files on different drives if you run into issues.

Walter
  • 1,047
  • 7
  • 14
1

If you're going to do that then I'd strongly urge you to use the new Intel 'Nehalem' Xeon 55xx series processors - they're designed to run VMs, and their extra memory bandwidth will help enormously too. Oh, and if you can, use more, smaller disks rather than a few big ones - that'll help a lot. Use ESX v4 over 3.5U4 too, if you can.

Chopper3
  • 100,240
  • 9
  • 106
  • 238
1

I've got twenty-something XP VMs running with 512 MB of RAM each on a machine with 16 GB of RAM. With less than this they swap to disk, and that becomes the bottleneck. These are always-active XP VMs, though.

VMware and its memory overcommit feature should allow you to push more RAM to each XP machine. Similar machines will share the same pages, which could reduce disk writing. It is something I'd like to look into for our setup to try to add more machines, as our XP VMs are doing 10-20 MB of continuous disk traffic.
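
A rough sketch of what that page sharing could save with near-identical XP guests (the shareable fraction is purely an illustrative assumption):

```python
# Rough estimate of RAM reclaimed by sharing identical pages across similar guests.
vm_count = 20
vm_ram_gb = 0.5           # 512 MB per XP guest
shareable_fraction = 0.6  # assumed share of pages identical across guests

naive_total = vm_count * vm_ram_gb
# Shared pages are stored once instead of vm_count times.
deduped_total = vm_ram_gb * shareable_fraction + vm_count * vm_ram_gb * (1 - shareable_fraction)

print(f"Without sharing: {naive_total:.1f} GB, with sharing: {deduped_total:.1f} GB")
```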

Ryaner
  • 3,027
  • 5
  • 24
  • 32
1

We were unable to achieve 100 happy guests on VMware Server, but then found that ESXi does a much better job. So it appears that 100 XP VMs is not a problem if you use ESXi and a decent server (a few disk mirrors to spread the I/O, a couple of i7 chips and 64GB of RAM). There is no visible delay for end users and the host resources are not maxed out (the hottest one is CPU, but it's typically at least 70% idle).

P.S. I posted this question back when we were struggling with VMware Server.

Dennis Kashkin
  • 391
  • 3
  • 5
0

Last time I checked, VMware recommended no more than 4 VMs per processor core for ESX, assuming one vCPU per VM.

This suggests that management overhead becomes a factor.

I'm very interested to see if you can actually achieve a 4x factor on an 8-core box.
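
For scale, here is the arithmetic behind that concern, using the 4-per-core guideline quoted above (the guideline itself is dated, as the comment below notes):

```python
# How far 100 single-vCPU guests overshoot a 4-VMs-per-core guideline.
cores = 8
recommended_vms_per_core = 4
target_vms = 100

guideline_total = cores * recommended_vms_per_core    # 32 guests
per_core = target_vms / cores                         # 12.5 vCPUs per physical core

print(f"Guideline allows {guideline_total} VMs on {cores} cores; "
      f"{target_vms} VMs is {per_core:.1f} per core, about "
      f"{target_vms / guideline_total:.1f}x the guideline")
```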

Hans Malherbe
  • 725
  • 2
  • 9
  • 11
  • 1
    That's pre ESX 3.5U2 then - the config maximums doc for update 2 says 8 for general purposes but that increases to 11 for VDI workloads. I'm pretty sure I saw something that I can't find off hand that increased that VDI recommendation to 19 with Update 3 or 4. For vSphere that limit is now 20. Search for VMware ESX Configuration Maximums for the official documents from VMware. – Helvick Jun 22 '09 at 22:40
  • My VMs stay idle most of the time. People connect maybe a few times a day to run some lightweight software. I have confirmed that these VMs create very little CPU overhead on the host when they are idle (20 VMs add up to 9% CPU utilization on a dual quad-core system). Do you happen to remember how the four-VMs-per-core limit is justified? Are they thinking about web servers or desktop OS instances? – Dennis Kashkin Jun 22 '09 at 22:42