I've got mission-critical apps running on my VDS. We're not experiencing any visible lag from the user's perspective. However, I want to be cognizant of when it's time to upgrade. The network load is low, and I don't believe there are any disk I/O (it's SSD RAID-5) or memory bottlenecks. It's a KVM instance with 2 dedicated CPUs (which is why RamNode calls it a VDS and not a VPS) and 8 GB RAM, running CentOS 7 with a SugarCRM install (no more than 6 simultaneous users) and 6 low-traffic WordPress sites.

So in my opinion the main thing I need to watch is CPU usage. Below is the header from the top command, plus the output of a cron job I set up to log CPU load every 15 minutes. I know this is a 2-CPU system and that the CPUs are dedicated, because it's a VDS rather than a VPS. Looking at the 15-minute samples below, do you agree that the CPU load is well within tolerance for the current instance and not posing any significant bottleneck, or do you think the VDS is nearing its CPU limit? I tend to focus on the second and third numbers (the 5-minute and 15-minute averages), since a burst in the 1-minute figure is not as concerning. My thinking is that as long as the 5-minute and 15-minute numbers don't exceed 3.0, I am fine. I know there are other tools (like vmstat) I could use, but for a simple, quick check, is this information good enough to spot obvious server overload?

1.81 1.35 1.61 13/448 4598
0.86 1.20 1.33 12/454 10227 
3.88 1.65 1.14 11/480 15646 
4.40 2.90 1.80 7/460 21584 
1.76 1.37 1.49 14/443 27245 
2.01 1.42 1.28 12/454 32656 
3.98 1.86 1.36 9/465 5890 
4.18 2.81 1.86 7/455 11599 
2.57 1.68 1.58 7/453 16947 
1.59 1.43 1.45 10/443 22651
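If it helps to summarize these samples rather than eyeball them, here is a minimal Python sketch. It assumes each logged line is a copy of /proc/loadavg (the five fields match that format: the 1-, 5- and 15-minute load averages, runnable/total scheduling entities, and the most recently created PID); the names `LoadSample` and `parse_loadavg` are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LoadSample:
    one: float      # 1-minute load average
    five: float     # 5-minute load average
    fifteen: float  # 15-minute load average
    runnable: int   # currently runnable scheduling entities
    total: int      # total scheduling entities
    last_pid: int   # most recently created PID


def parse_loadavg(line: str) -> LoadSample:
    """Parse one line in /proc/loadavg format, e.g. '1.81 1.35 1.61 13/448 4598'."""
    one, five, fifteen, procs, last_pid = line.split()
    runnable, total = procs.split("/")
    return LoadSample(float(one), float(five), float(fifteen),
                      int(runnable), int(total), int(last_pid))


# The ten samples from the question, logged every 15 minutes.
LOG = """\
1.81 1.35 1.61 13/448 4598
0.86 1.20 1.33 12/454 10227
3.88 1.65 1.14 11/480 15646
4.40 2.90 1.80 7/460 21584
1.76 1.37 1.49 14/443 27245
2.01 1.42 1.28 12/454 32656
3.98 1.86 1.36 9/465 5890
4.18 2.81 1.86 7/455 11599
2.57 1.68 1.58 7/453 16947
1.59 1.43 1.45 10/443 22651
"""

samples = [parse_loadavg(line) for line in LOG.splitlines() if line.strip()]
for field in ("one", "five", "fifteen"):
    values = [getattr(s, field) for s in samples]
    print(f"{field:>7}: mean {mean(values):.2f}, max {max(values):.2f}")
```

These are raw load averages; they still have to be compared against the number of CPUs before they say anything about headroom.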

top - 11:54:39 up 20 days, 16:03, 2 users, load average: 0.67, 1.12, 1.31
Tasks: 156 total, 3 running, 153 sleeping, 0 stopped, 0 zombie
%Cpu(s): 41.9 us, 6.5 sy, 0.0 ni, 48.4 id, 3.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8010484 total, 613228 free, 2096892 used, 5300364 buff/cache
KiB Swap: 1048572 total, 72816 free, 975756 used. 4995584 avail Mem
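The %Cpu(s) line breaks the same snapshot down by category. A small sketch (the helper `parse_cpu_line` is hypothetical, and it assumes the top layout shown above) turns it into numbers so the busy share, iowait and steal time could be logged alongside the load averages:

```python
import re


def parse_cpu_line(line: str) -> dict[str, float]:
    """Map top's '%Cpu(s):' fields to numbers, e.g. {'us': 41.9, 'id': 48.4, ...}."""
    _, _, fields = line.partition(":")  # drop the '%Cpu(s)' label
    return {key: float(value)
            for value, key in re.findall(r"([\d.]+)\s+(\w+)", fields)}


cpu = parse_cpu_line(
    "%Cpu(s): 41.9 us, 6.5 sy, 0.0 ni, 48.4 id, 3.2 wa, 0.0 hi, 0.0 si, 0.0 st"
)
busy = 100.0 - cpu["id"]  # everything except idle, iowait and steal included
print(f"busy {busy:.1f}%  iowait {cpu['wa']:.1f}%  steal {cpu['st']:.1f}%")
```

On a KVM guest, a persistently non-zero `st` (steal) value would mean the hypervisor is giving your vCPUs' time to other guests; in this snapshot it is 0.0.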

Gerald Schneider

2 Answers


It's impossible to tell for sure based on just the uptime figures, but if you have really ascertained that disk I/O is not the bottleneck (it usually is), then this server is running near capacity. I say this because load / CPU count is close to 1. In practice this means that, most of the time, there is almost one task per CPU either running or waiting for a resource.
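As a sketch of that ratio, assuming Python is available on the server: `os.getloadavg()` and `os.cpu_count()` are standard-library calls (the latter counts logical CPUs), while `load_per_cpu` is a hypothetical helper name. A value near 1.0 per CPU means the CPUs are roughly fully occupied.

```python
import os


def load_per_cpu(loads: tuple[float, ...], cpus: int) -> tuple[float, ...]:
    """Divide each load average by the number of CPUs the scheduler can use."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return tuple(load / cpus for load in loads)


if hasattr(os, "getloadavg"):  # only available on Unix-like systems
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    one, five, fifteen = load_per_cpu(os.getloadavg(), cpus)
    print(f"per-CPU load over {cpus} CPUs: {one:.2f} {five:.2f} {fifteen:.2f}")
```

For the first sample in the question, `load_per_cpu((1.81, 1.35, 1.61), 2)` gives roughly (0.905, 0.675, 0.805).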

I do note that your system is using a fair amount of swap according to your top output (975756 KiB used of 1048572 KiB), so I would not be certain your system is constrained by CPU rather than by memory/disk I/O.
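To track that swap figure over time, a sketch like the following pulls the numbers out of top's `KiB Swap:` line (`swap_used_fraction` is a hypothetical name, and the regex assumes the layout shown in the question). Swap occupancy only shows how much has been pushed out at some point; how often pages are read back in is what indicates ongoing memory pressure.

```python
import re


def swap_used_fraction(line: str) -> float:
    """Return used/total from a top line such as 'KiB Swap: 1048572 total, ...'."""
    fields = {key: int(value)
              for value, key in re.findall(r"(\d+)\s+(total|free|used)\b", line)}
    return fields["used"] / fields["total"]


top_swap = "KiB Swap: 1048572 total, 72816 free, 975756 used. 4995584 avail Mem"
print(f"swap in use: {swap_used_fraction(top_swap):.1%}")
```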

  • I thought a 100% CPU load is the number of CPUs + 1. So on a 2 CPU system, 100% utilization would be a number of 3. That's the default at least when you run the top command. – TechJaz Oct 16 '19 at 16:40
  • Not sure where you get the +1. (That is not correct). The figure is the number of tasks queued over a time period. I don't understand what you are saying about top. – davidgo Oct 16 '19 at 19:46

Defining a service level objective will be useful for justifying when changes are needed. That quantifies your "not experiencing any visible lag" assertion, regardless of what your infrastructure looks like.

Perhaps you want 99.9% of page loads to complete in under 100 ms, to maintain the perception of a responsive site. You may want a client-side perspective to measure this accurately, such as page speed analytics.
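A minimal sketch of checking such an objective, assuming you already collect per-request page load times in milliseconds from somewhere (the data source and the names `slo_attainment` and `slo_met` are hypothetical); the 100 ms and 99.9% defaults are just the example figures above.

```python
from typing import Sequence


def slo_attainment(latencies_ms: Sequence[float], threshold_ms: float = 100.0) -> float:
    """Fraction of page loads that finished in under threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one measurement")
    good = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return good / len(latencies_ms)


def slo_met(latencies_ms: Sequence[float], threshold_ms: float = 100.0,
            objective: float = 0.999) -> bool:
    """True if at least `objective` of the page loads were fast enough."""
    return slo_attainment(latencies_ms, threshold_ms) >= objective
```

For example, `slo_met([50.0] * 998 + [250.0] * 2)` is False, because only 99.8% of those loads were under 100 ms.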

Capacity planning takes a bit of thought. That host looks to have plenty of spare capacity now, but the trick is planning for the future.

Consider any planned organizational growth or expected load spikes. Understand all the resources of the system, for example with the USE method. Find the root cause of events where the service level objective was not met, and determine whether it was a capacity problem.
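The USE method boils down to asking three questions of every resource (CPU, memory, disks, network): how busy is it (utilization), how much work is queued for it (saturation), and is it reporting errors. A rough Python sketch of that checklist follows; the `ResourceStats` type, the 80% utilization limit, and every sample number except the CPU busy share (51.6% from the top snapshot) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class ResourceStats:
    name: str
    utilization: float  # fraction of time busy, 0.0-1.0
    saturation: float   # queued/extra work; its unit depends on the resource
    errors: int         # errors seen during the interval


def use_review(resources: list[ResourceStats], util_limit: float = 0.8,
               sat_limit: float = 0.0) -> list[str]:
    """Return one finding per utilization, saturation or error problem."""
    findings = []
    for r in resources:
        if r.utilization > util_limit:
            findings.append(f"{r.name}: utilization {r.utilization:.0%}")
        if r.saturation > sat_limit:
            findings.append(f"{r.name}: saturation {r.saturation:g}")
        if r.errors:
            findings.append(f"{r.name}: {r.errors} errors")
    return findings


resources = [
    ResourceStats("cpu", 0.516, 0.0, 0),     # busy share from the top snapshot; saturation assumed 0
    ResourceStats("disk", 0.35, 2.0, 0),     # hypothetical: 2 requests queued on average
    ResourceStats("network", 0.05, 0.0, 3),  # hypothetical: 3 dropped packets
]
for finding in use_review(resources):
    print(finding)
```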

Have a plan to increase capacity if needed. Know how to scale out to more instances, or up to bigger ones. Put in a load balancer before you need it for performance, and you gain high availability as well.

In general, host level metrics of a UNIX/Linux system with capacity left include:

  • Near zero memory page ins
  • ( Load average / CPUs ) less than 1
  • Fast read and write response times to storage devices
  • Near zero drop or overrun network packets
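For the first bullet, Linux exposes a cumulative count of pages swapped in as `pswpin` in /proc/vmstat (vmstat's `si` column reports similar swap-in activity as a rate). A sketch that turns two snapshots into a rate, assuming Linux; `parse_vmstat`, `swap_in_rate` and `sample_swap_in_rate` are hypothetical helper names:

```python
import time


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse 'name value' lines in /proc/vmstat format into a dict."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            counters[key] = int(value)
    return counters


def swap_in_rate(before: str, after: str, seconds: float) -> float:
    """Pages swapped in per second between two /proc/vmstat snapshots."""
    delta = parse_vmstat(after)["pswpin"] - parse_vmstat(before)["pswpin"]
    return delta / seconds


def sample_swap_in_rate(interval: float = 10.0) -> float:
    """Read /proc/vmstat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/vmstat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/vmstat") as f:
        after = f.read()
    return swap_in_rate(before, after, interval)
```

On a host with capacity to spare, `sample_swap_in_rate()` should stay near zero most of the time.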
John Mahowald
  • Oops, I was confusing CPUs with logical processors (cores). On a 2-CPU system with dual cores, 4 would be 100%, not 2. My system is single core (or 2 logical processors), so yes, 2 would be 100%. – TechJaz Oct 17 '19 at 20:10
  • Personally, I count cores and not logical processors in capacity planning. That's not the point of my answer, though: there are more host metrics than just CPU-related ones, and they are not the important thing, which is service level objectives. – John Mahowald Oct 18 '19 at 14:51