
My m4.large instance is reporting a small amount of stolen CPU. If I remember correctly, steal should only occur on T2 or M3 series instances.

```
top - 11:07:53 up 24 min,  2 users,  load average: 1.00, 1.00, 0.80
Tasks:  89 total,   2 running,  87 sleeping,   0 stopped,   0 zombie
Cpu0  : 29.0%us, 62.6%sy,  0.0%ni,  8.0%id,  0.1%wa,  0.0%hi,  0.0%si,  0.3%st
Cpu1  :  0.1%us,  0.0%sy,  0.0%ni, 99.4%id,  0.1%wa,  0.0%hi,  0.0%si,  0.4%st
```
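top derives the `%st` column from the steal counter in `/proc/stat` (the eighth value on each `cpuN` line, in USER_HZ ticks), using the difference between two samples. The following is a minimal sketch of that calculation in Python, assuming the standard Linux `/proc/stat` layout; `parse_proc_stat` and `steal_percent` are hypothetical helper names, not part of any tool.

```python
from __future__ import annotations

# First eight counters of a "cpuN" line in /proc/stat, in kernel order (USER_HZ ticks).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text: str) -> dict[str, list[int]]:
    """Return {"cpu0": [user, nice, ..., steal], ...} for the per-CPU lines."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            # guest/guest_nice are already included in user/nice, so only the
            # first eight counters are kept to avoid double counting.
            cpus[parts[0]] = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return cpus


def steal_percent(before: str, after: str) -> dict[str, float]:
    """Per-CPU steal time between two /proc/stat snapshots, as a percentage."""
    old, new = parse_proc_stat(before), parse_proc_stat(after)
    result = {}
    for cpu, counters in new.items():
        deltas = [n - o for n, o in zip(counters, old[cpu])]
        total = sum(deltas)
        steal = deltas[FIELDS.index("steal")]
        result[cpu] = 100.0 * steal / total if total else 0.0
    return result
```

On a live instance, read `/proc/stat` twice, a second or so apart, and pass both strings to `steal_percent`. This mirrors how top reports `%st`, although top's exact accounting may differ slightly between versions.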

```
$ lscpu -p
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# CPU,Core,Socket,Node,,L1d,L1i,L2,L3
0,0,0,0,,0,0,0,0
1,0,0,0,,0,0,0,0
```
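To see which vCPUs are hyper-thread siblings, the `lscpu -p` rows can be grouped by their Socket and Core columns. A minimal sketch in Python, assuming the default column order shown in the header (CPU,Core,Socket,...); `cores_from_lscpu` is a hypothetical helper name.

```python
from __future__ import annotations

from collections import defaultdict


def cores_from_lscpu(text: str) -> dict[tuple[int, int], list[int]]:
    """Map (socket, core) -> logical CPUs, parsed from `lscpu -p` output.

    Assumes the first three columns are CPU, Core, Socket, as in the
    default `lscpu -p` header.
    """
    cores = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the comment and header lines
        cpu, core, socket = (int(x) for x in line.split(",")[:3])
        cores[(socket, core)].append(cpu)
    return dict(cores)
```

For the m4.large output above this gives `{(0, 0): [0, 1]}`: one physical core exposed as two vCPUs. On a live system the input can come from `subprocess.run(["lscpu", "-p"], capture_output=True, text=True).stdout`.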

I'm using `taskset 1 dd if=/dev/zero of=/dev/null` to generate a CPU spike on vCPU 0.
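The first argument to `taskset` is a hexadecimal CPU bitmask, so `1` (bit 0) pins `dd` to CPU 0 only. A small sketch of that encoding in Python, with hypothetical helper names, which also shows that loading vCPU0 and vCPU2 together would use mask `5` (equivalently `taskset -c 0,2`).

```python
from __future__ import annotations


def mask_to_cpus(mask: str) -> list[int]:
    """Decode a taskset-style hexadecimal affinity mask into CPU numbers."""
    value = int(mask, 16)  # accepts "1", "5" or "0x5"
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def cpus_to_mask(cpus: list[int]) -> str:
    """Encode CPU numbers as a hexadecimal mask for `taskset MASK command`."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu  # set the bit for each requested CPU
    return format(value, "x")
```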

According to the AWS documentation, the underlying hardware uses either 2.3 GHz Intel Xeon® E5-2686 v4 (Broadwell) or 2.4 GHz Intel Xeon® E5-2676 v3 (Haswell) processors, both of which have 2 logical cores (hyper-threads) per physical core.

In addition to m4.large, I noticed that m4.xlarge, which has 2 physical cores and therefore 4 vCPUs, also has this problem. I tested it with taskset as well: a spike on any vCPU, whether on the same physical core or a different one, caused stolen CPU.

For instance, vCPU0 and vCPU2 share Core 0. When I produce a spike on them, I can, strangely, see stolen CPU on vCPU1.

It's quite weird. Please help me figure out the reason. Thanks.

Jepsenwan

1 Answer


CPU steal typically happens on any virtualised infrastructure as the needs of the VMs change. CPU steal is the stealing of cycles from your VM by another VM, not by different cores within your VM.

Given that Amazon uses a (heavily customised) build of Xen, some CPU steal is to be expected; under 1% is entirely normal and should not noticeably change the instance's performance characteristics.

Also, Amazon explicitly states that EC2 instances are given ECUs (EC2 Compute Units), each of which corresponds to a certain amount of CPU capacity when benchmarked against a particular reference CPU (from memory, it used to be an older Xeon). So being given a certain number of CPU cores doesn't mean that you are getting that many physical cores.

Craig Watson
  • Thanks Craig. Do you mean that nearly all EC2 types have the same problem, not just T2 and M3, and it's just a matter of how much CPU is stolen? – Jepsenwan Jul 14 '17 at 15:02
  • Correct, the issue isn't just down to EC2 - any virtualised platform will exhibit the same behaviour under extreme load. – Craig Watson Jul 14 '17 at 15:02
  • If you stop then start your instance in the console (NOT restart) you're moved to different physical hardware, which might get rid of that noisy neighbor. Another instance could be better, or worse. I've read that on AWS each physical machine runs one instance family, so if you're running large VMs, chances are others are running large busy VMs on the same hardware. m4.large is actually their smallest m4 series, so you might be lucky. – Tim Jul 14 '17 at 17:56
  • @Tim, please don't help perpetuate that myth. There are no noisy neighbors in EC2. You aren't sharing CPU. Also, it sounds like you are referring to instance *type* (`m4.large`) or instance *class* (`m4`). The *family* in this case is "General Purpose," which includes both `m` and `t`. – Michael - sqlbot Jul 15 '17 at 00:33
  • @Michael-sqlbot it's difficult to disprove a negative. I've read cases where people claim to have experienced it, but I've never experienced it myself. What I've read suggests only an instance type and size is put on the same hardware - so one server may only have m4.large instances, another m4.xlarge, etc. I can't even remember where I read this so I wouldn't take it as fact - I read a massive amount from all kinds of sources when I did the pro architect cert. – Tim Jul 15 '17 at 01:39
  • @Tim those people who claim to have experienced noisy neighbors are misinformed. EC2 steals CPU when (1) you're provisioned on a more powerful machine than you're paying for, (2) you're beyond your allowed CPU on a `t1`/`t2` or (3) for hypervisor overhead. [*"Amazon EC2 provides each instance with a consistent and predictable amount of CPU capacity, regardless of its underlying hardware. Amazon EC2 dedicates some resources of the host computer, such as CPU, memory, and instance storage, to a particular instance."*](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) – Michael - sqlbot Jul 15 '17 at 19:26
  • Hi Michael, by "(3) for hypervisor overhead", do you mean that the underlying hypervisor might steal CPU cycles from the EC2 instance running on it? Thanks. – Jepsenwan Jul 18 '17 at 06:01
  • @Jepsenwan that's exactly what "hypervisor overhead" is – Craig Watson Jul 18 '17 at 06:03