CPU or Disk bottleneck?

0

Say I have machines A and B, where machine B has a moderately faster disk but a comparable processor to machine A; everything else is the same. I run a large Spark job locally on both machines, where the input dataset is too large to fit into memory, forcing disk usage. While the job runs, I collect system metrics using sysstat/sar. The point of this is to compare the processors.

Machine B finishes the job roughly 10% faster. Using sar, I see that machine B achieves superior sector reads/writes per second (30% more), with lower average I/O request response times (up to 250% better). I jumped to the conclusion that machine B has an unfair advantage over machine A because of its faster disk.

My question is: how can I determine whether machine B's processor is simply more effective at utilizing disk I/O than machine A's? More specifically, how can I make sure that the difference in disk speeds doesn't create an unfair advantage, so that I can make a fair comparison between the processors? Are there any system metrics that would give more information about this?
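One signal worth looking at is how much CPU time is spent idle waiting on I/O: sar reports this as %iowait in `sar -u`, alongside %user and %system, and `sar -d` adds per-device await and tps. As a minimal sketch (assuming Linux; the script and its field names are illustrative, reading the same counters directly from /proc/stat rather than through sysstat):

```shell
#!/bin/bash
# Sample the kernel's aggregate CPU tick counters twice, one second apart,
# and report how the interval split between user work and I/O wait.
# First line of /proc/stat: cpu user nice system idle iowait ...
read -r _ u1 n1 s1 id1 w1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 id2 w2 _ < /proc/stat

total=$(( (u2 + n2 + s2 + id2 + w2) - (u1 + n1 + s1 + id1 + w1) ))
user=$(( u2 - u1 ))
iowait=$(( w2 - w1 ))

echo "user ticks:   $user / $total"
echo "iowait ticks: $iowait / $total"
```

A high iowait share while the job runs suggests the disk is the limiting factor on that machine; if both machines show near-zero iowait, the disk difference likely didn't decide the result.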

cbass

Posted 2017-06-27T14:27:38.743

Reputation: 1

Set up a "LiveCD" style install and use a single temporary disk for any reading/writing in each machine? e.g. for the tests, remove the hard drives from each and use a single separate special hard drive for both tests. – Yorik – 2017-06-27T14:47:53.510

Would swapping the HDD be out of the question, and then running the same processes? Then you could see if machine A finishes faster than machine B. – TiO – 2017-06-27T15:00:08.603

JOC, what exactly are you trying to accomplish? If you are just trying to compare CPUs, there are other ways to do that that don't introduce the disk as a variable factor. Most benchmarking utilities would fit the bill better. – Frank Thomas – 2017-06-27T15:20:31.983

Answers

1

If you think the disk I/O bottleneck is unfair, then you should take it out of the equation. An easy way of doing so is to do all the work on RAM disks (of course you will need enough RAM, and space will be limited). But then, if the RAM technology of the two machines is not the same, you will have another unfair scenario.
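As a sketch of the RAM-disk route (assuming Linux, where /dev/shm is a tmpfs mount usable without root, and using Spark's spark.local.dir setting, which controls where shuffle and spill files go; the path and job name are placeholders):

```shell
#!/bin/bash
# Create a scratch directory on tmpfs so Spark's spill/shuffle I/O
# stays in RAM instead of hitting the physical disk.
SCRATCH=/dev/shm/spark-scratch
mkdir -p "$SCRATCH"

# Illustrative invocation; substitute your actual job for your_job.py.
echo spark-submit --conf "spark.local.dir=$SCRATCH" your_job.py
```

The caveat above still applies: the scratch data must fit in RAM, and different RAM speeds reintroduce an unfairness of their own.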

Likewise, you could use a central NFS server, but then the bottleneck would be the network.

So if your baseline is that Spark job, and the whole idea is to compare rather than to find the faster configuration, I might advise leveling the playing field by putting the whole dataset on a USB storage device; then disk I/O should match (as long as you use the same type of connector, both USB 2 or both USB 3).

Jorge Gutierrez

Posted 2017-06-27T14:27:38.743

Reputation: 11