Calculate Median based on counts of value

3

1

I have data of how many of a specific CPU we have, and their CPU-marks. Based on this I wish to dynamically calculate Median and Quartile stats.

For example, I have a table that looks like

Name                        Count   Marks
i7-5820k 3,3 GHz 6 cores    11      12,996
i7-950 3,1 GHz 4 cores      4       5595
i7-3820 3,6 GHz 4 cores     7       8998
i5-3570k 3,4 GHz 4 cores    1       7153

What i wish to do with the data, is to count the first row's marks 11 times, the second one 4 times and so on.

This is currently worked with in Google Spreadsheets, but I guess a similar function should be available in most spreadsheeting applications.

Gnutt

Posted 2017-08-14T07:57:45.893

Reputation: 183

Answers

4

Mathematics aside, you can use the following array formula :

=MEDIAN(IF(COLUMN(A:Z)<=B2:B5,C2:C5))

CtrlShiftEnter

Notes:

  • IF(COLUMN(A:Z)<=B2:B5,C2:C5) generates a 2D array of 4 rows, 26 columns where each row repeats the cell in column C as many times as specified in the cell of column B; the remaining entries in the row are FALSE.

  • The MEDIAN function ignores the FALSE cells and will evaluate the actual entries in the generated 2D array.

A.S.H

Posted 2017-08-14T07:57:45.893

Reputation: 430

1Clever solution! +1 – Excellll – 2017-08-14T19:25:52.380

1That IS pretty slick, @A.S.H – Bandersnatch – 2017-08-14T20:52:10.747

I assume this wouldn't work if any of the counts were > 26? – Danny – 2019-07-09T04:27:04.250

@Danny Replace A:Z by A:AZ to extend to 52. You get the idea. – A.S.H – 2019-07-10T04:49:02.073

Sure that can work but you are limited by the number of columns excel supports (which seems to be 16,384 from a quick Google) and it's certainly possible to have a frequency table with counts in excess of that. This is a very good solution for anything below that though! – Danny – 2019-07-11T05:04:00.893

0

Excel has a QUARTILE() function that you can use. But your dataset is small enough that you can get the answer almost by inspection.

First, move the row with the i7-950 CPUs to the bottom of the list. There are 23 CPUs total, so the median is the score of the 11.5th ranked CPU. Similarly, the 1st quartile is the score of the 5.75th ranked CPU.

In your case, these answers both come out to be 8998. That's a little strange, but it has to do with the distribution of your data. Both the first and second quartile fall in the same "bin". Imagine if everyone in the US weighed 150 lbs. Then the median weight and all the %-iles would be 150 lbs.

You COULD argue that your quartile and median values both fall between data points, and then interpolate between the data points. Then you would calculate the quartile as being 3/4 of the way between 5595 and 8998. And the median as halfway between 8998 and 12,996.

I get 8197.25 and 10,997.

To see what the QUARTILE() function thinks, create a column with 11 12,996's, 7 8998's, 1 7153, and 4 5595's.

Then use QUARTILE(range,1) and QUARTILE(range,2) to calculate the 1st quartile and the median. Here "range" is the list of 23 marks.

EDIT: interpolating isn't exactly kosher here because the CPU scores are all exact, and not an average of a range of scores. I THINK. I assume you looked up published values of the benchmark scores for each CPU.

I'm not sure what would happen if you actually ran the benchmark on each CPU. Would the same type of CPUs all produce an identical score, or would there be a range of scores? If there were some variation in the scores for identical CPUs (due perhaps to temperature or ??) then the interpolation would be justified, I think.

Comments are welcome here.

Bandersnatch

Posted 2017-08-14T07:57:45.893

Reputation: 3 430