
We are running an application with about 300 concurrent users. Everything is currently virtualized: 1 VM as a load balancer, 2 VMs as web servers (the ESXi host runs about 25 other VMs as well), and 1 bare-metal server running SQL Server. We have some performance issues and have decided to buy physical hardware to speed things up.

I'm not sure which option would give us better performance:

  • we buy 1 rack server and run ESXi with just the 3 VMs above,

  • we buy one rack server for each of the two web servers and install Windows Server with just the application (leaving the load balancer as a VM, as before), or

  • we buy 3 rack servers: one for the load balancer and one for each of the 2 web servers.

Users connect to the server through a web interface / desktop app.

Thank you for your help, drewo


2 Answers


Some pieces of information you should find the answer to before deciding on a path forward:

CPU usage for the affected VMs

  • From the guest operating system's view, is CPU usage often above 80%, and/or does it show plateaus rather than spikes? If so, your VM is likely CPU-starved. Add more vCPUs (but consider possible licensing implications).
  • Are some vCPUs in your servers significantly less loaded than others? You could have a scaling problem in your application, where simply throwing more vCPUs into a single VM (or into a physical machine) won't help matters.
  • Do the CPU ready times indicate that the host is overcommitted? A common rule of thumb is to want less than 5% average ready time, but in my experience even that is far too much for a system you actually work on. Note that if you use vCenter, the indicated ready time is aggregate milliseconds since the last graph update. In the "realtime" view the graph updates every 20 seconds (= 20,000 ms), so the average ready percentage per vCPU for a VM can be calculated as (indicated_ready_time * 100 / 20000) / number_of_vcpus.
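The ready-time arithmetic above can be sketched as a small helper (the function name and example numbers are illustrative, not anything vCenter itself provides):

```python
def ready_percent(indicated_ready_ms, num_vcpus, interval_ms=20000):
    """Convert vCenter's aggregate CPU ready time (ms) for one graph
    interval into the average ready percentage per vCPU.

    In the "realtime" view the graph updates every 20 s (20,000 ms).
    """
    return (indicated_ready_ms * 100 / interval_ms) / num_vcpus

# Example: a 4-vCPU VM showing 4000 ms of ready time in a 20 s sample
# averages 5% ready per vCPU -- right at the commonly cited threshold.
print(ready_percent(4000, 4))  # 5.0
```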

RAM usage

(Should always be checked from within the guest operating system)

  • Usually above 80%? Add memory.
  • Signs of memory leaks? Fix the application or be prepared to restart/reboot more often.
  • Signs of heavy swapping? Check for configuration issues. Add memory.
  • Do you have key applications/processes that "inexplicably" use less than 4 GB of memory? They may need to be rebuilt or reconfigured to utilize 64-bit addressing.

Also check disk and network performance for latency issues.

Depending on how your application scales it might be an idea to add more web servers rather than to add compute power or memory to the existing ones.

Once you have an idea of where your bottlenecks are and how best to utilize your hardware, you can start making a business case for what to purchase.

The main case for virtual machines is that they are easier to manage, easier to back up, and easier to migrate in case of system failure. They also allow better utilization of your hardware, provided the VMs don't actually need every resource you can throw at them, and with paravirtualized network interfaces, traffic between machines on the same host is as fast as the CPU can manage rather than being limited by physical network interface speeds.

A system running directly on a physical machine will, of course, have no overhead due to resource sharing, but this is only a benefit if you can use the available power.

Mikael H

Without investigating the cause of your performance problems, and without knowledge of your application, you can't tell what the easiest or best remediation will be.

If the problem is indeed a lack of hardware resources, monitoring should make it pretty clear where you're hitting limits now, what to buy (CPU cores or CPU speed, more RAM, faster disks), and where to assign it.

In my experience, more than half of all performance issues are solved more or less easily by proper tuning rather than by throwing more hardware at the problem. Most developers and too many vendors lack the ability or resources to test their application and database back-end with both the same amount of data and a load similar to production, so they make assumptions and design choices that don't scale well in practice.

Careful monitoring will give you an insight in what your bottlenecks are and what you may need to address in the application, database or at the hardware level.

Please note that performance analysis and tuning are as much a black art as a science.

Very common application issues that can be easily addressed and often provide significant benefits are:

  • missing indexes in a database
  • connection pooling and query caching
  • tuning of memory limits, number of connections, sockets and concurrent threads of applications
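As a minimal sketch of the first point, here is what a missing index looks like in practice, using Python's built-in SQLite (the table and column names are made up for illustration; the same principle applies to SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, i * 1.5) for i in range(10000)])

# Without an index, filtering on customer_id scans the whole table.
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchone()
print(before[3])  # plan mentions a full scan of "orders"

# Add the missing index; the same query now does an index lookup.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchone()
print(after[3])  # plan mentions idx_orders_customer
```

On a growing table, that difference turns O(n) work per query into a cheap index lookup, which is often the single biggest win available.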

Flaws in application design are more difficult to address, for example when too much data processing logic sits in the application front-end and database queries are too simple, unbounded, and return too much data (such as a SELECT * FROM GrowingDataSet). In your monitoring, the symptoms of this can be as diverse as high disk I/O load on the database server, high memory consumption on the application server, or saturated network links. Each of these can be used to justify a different hardware purchase (upgrade to SSDs in the database server, increase RAM in the application server, upgrade the network), and probably none of them are needed once the application applies better logic and pagination in its queries.
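A hedged sketch of the pagination point, again with SQLite and a hypothetical GrowingDataSet table: instead of pulling every row into the application, let the database bound the result set per request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GrowingDataSet (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO GrowingDataSet (payload) VALUES (?)",
                 [(f"row {i}",) for i in range(100000)])

# Unbounded: every row crosses the wire and sits in application memory.
all_rows = conn.execute("SELECT * FROM GrowingDataSet").fetchall()

# Keyset pagination: fetch one page at a time, keyed on the last seen id,
# so each request returns a small, bounded result set.
def fetch_page(conn, last_id, page_size=50):
    return conn.execute(
        "SELECT id, payload FROM GrowingDataSet WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size)).fetchall()

page = fetch_page(conn, last_id=0)
print(len(all_rows), len(page))  # 100000 rows vs. 50 per request
```

Keyset pagination (WHERE id > ? ... LIMIT) also stays fast on deep pages, unlike large OFFSET values, which force the database to skip over all preceding rows.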

Bob