Optimal CPI calculation confusion

0

I am reading https://insidehpc.com/2017/07/cycles-per-instruction-matters/:

For example, if a certain part of the code takes 1200 cycles and executes 600 instructions, then the CPI would be 1200/600 = 2. However, a core in this case should have a CPI equal to 0.5, this means that not enough work is being sent to the core, as only ¼ of the capacity is being used.

Either I have missed something on this link (I can't see it if I have) or the calculation of 0.5 hasn't been justified. Can someone clarify please?

Wad

Posted 2019-03-07T11:53:21.803

Reputation: 153

Why don't you leave a comment for the author asking him to clarify? – DavidPostill – 2019-03-07T12:00:11.143

I'm very, very sorry that I trust the people who use this site more than some random on the internet who could be writing complete nonsense. I came here as I have a great deal of respect for the users here because some of them are really, really smart. Moreover, my experience on other sites has been that authors simply don't respond to comments, let alone questions. I wanted an answer quickly, so came here. I guess in your book, whatever that is, that's a bad thing. How can a "moderator" have such a strange attitude towards someone coming here for help? I just don't get it. Sigh. – Wad – 2019-03-07T12:17:25.107

My comment was just a suggestion. If someone here can answer, the comment does not stop that. – DavidPostill – 2019-03-07T12:19:17.137

I agree. However I am guessing it was you that downvoted; if not, then I do apologise. If it was, then maybe as a "moderator" you could also get a line of text added to the tooltip that appears over the downvote button stating "; or user has preferred to use a StackExchange site for clarification as opposed to using another resource". – Wad – 2019-03-07T12:23:29.313

Nice idea, but that probably isn't going to happen. In any case, meta would be the correct place to make such a suggestion. – DavidPostill – 2019-03-07T12:25:12.383

Answers

1

The article itself is unclear, but the clarifying phrases appear both before and after the passage you quoted.

In the sentence before your quote:

For the per core case, all of the threads running on a hardware core must be aggregated to arrive at the proper ratio. 

And after it:

On an Intel Xeon Phi processor, there are 72 cores that can each have 4 threads running simultaneously

So to work out a CPI you need to know the number of cycles (1200) and the number of instructions (600). Then, if you want to know whether you are using all of the hardware threads available, and therefore theoretically fully loading the core, you also need to know the number of threads per core (4).

The CPI figure is 2, but the author is then (without explaining it) dividing by the number of threads on each core. This gives you 2/4 = 0.5.
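As a minimal sketch of that arithmetic in Python, using only the figures quoted from the article (the variable names are my own, purely for illustration):

```python
# Figures taken from the article's example.
cycles = 1200
instructions = 600
threads_per_core = 4  # Xeon Phi: 4 hardware threads per core

# Plain CPI for the measured code.
cpi = cycles / instructions

# The article's unexplained step, as I read it: divide by threads per core.
per_core_figure = cpi / threads_per_core

print(cpi, per_core_figure)
```

This only reproduces the division I think the author performed; it does not show that 0.5 is the right target for any particular workload.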

The 0.5 figure is not a CPI, but some other figure that relates your (known) CPI to the number of threads. This metric is only really useful when you have a good idea of how long your instructions take (their CPI) and how multithreaded your application is. If you are running the same thread doing a lot of calculations, you can optimise the number of threads you run on a core or core set.

As a result you can tell that any number below 2 (for this particular code) is not using all the threads available and therefore the core could be given more work. Any number over 2 indicates that the core has more work than it can handle and task switching penalties will occur.

This is where I believe he is getting his 0.5 figure from.

The article is actually quite badly written, but it is trying to emphasise that you need to optimise your multithreading with knowledge of both how instructions execute, which is the CPI (2), and how many threads or cores you have available, so that you can load up the resources appropriately.
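One possible reading of the "aggregated" sentence is that the per-core CPI is the cycle count divided by the instructions from every thread on that core. The sketch below assumes an idealised case that the article does not state: each of the 4 threads gets through 600 instructions in the same 1200 cycles. The function name `per_core_cpi` is hypothetical.

```python
def per_core_cpi(cycles, instructions_per_thread):
    """Cycles divided by the instructions executed by all threads on one core."""
    total = sum(instructions_per_thread)
    if total == 0:
        raise ValueError("no instructions executed")
    return cycles / total

# One thread on the core: the measured case from the article.
one_thread = per_core_cpi(1200, [600])

# Assumption: four identical threads sharing the same 1200 cycles.
four_threads = per_core_cpi(1200, [600] * 4)

# Fraction of that idealised capacity used by the single-thread case.
capacity_used = four_threads / one_thread

print(one_thread, four_threads, capacity_used)
```

Under that assumption the single-thread case comes out at a quarter of the core's capacity, which would match the "¼" in the quoted passage.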

Mokubai

Posted 2019-03-07T11:53:21.803

Reputation: 64 434

Thanks for taking the time to respond positively! I'm sorry, I still don't get it. Author says CPI=2 is not good as we can add more work, thus we want CPI=0.5. But you say CPI < 1 means more work can be added, and CPI > 1 means too busy? I also still don't get what calculation has been done to get 0.5 (again, sorry!). Can you please clarify? – Wad – 2019-03-07T12:51:08.123

@Wad the calculation is to divide the CPI by the number of Core threads, so 2/4 which equals 0.5. That then effectively gives you the "thread saturation" of the core. I'll edit. – Mokubai – 2019-03-07T13:00:51.617

Confusing article. Thanks for doing your best to clarify what is evidently a bit of a wreck (?) – Wad – 2019-03-07T13:48:58.687