Find most frequently used value

0

I know that that question has been asked before but my question needs more scalable solution!

I have a column G with States, there are 10,000 States in that column.

I need to find out what is most frequently found state.

Usually the answer is: =INDEX($G:$G,MODE(MATCH($G:$G,$G:$G,0))) but in my case this is unacceptable, this would mean that Excel needs to make 10,000 lookups, accross 10,000 cells (100,000,000 operations) and that is just the best case( because excel is not known for finding out correct used range).

Since my column has a list of states I am thinking that I could modify this formula slightly =INDEX($G:$G,MODE(MATCH(state_list,$G:$G,0))) (where state_list is an array of state names, that would yield 500,000 operations which is a 200 fold improvement).

However when I try to calcuate this I get #N/A error.

I used F9 to check MATCH(state_list,$G:$G,0) result: {197,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A}

Correct me if I am wrong this does proove that I am using an array formula?

If so what am I doing wrong?

How can I get a realtively scalable way finding that most comon value(State) in column G:G

Thank You

sgp667

Posted 2014-10-21T15:11:10.017

Reputation: 563

You are trying to make Excel works like a Database. I suggest you upgrade to Access or a DB related software. Is a common mistake think that Excel has not any limitation. – MacGyver – 2014-10-21T15:15:06.117

I agree that Excel is not the scalable type of software, but my formula actaully does simpler job than the solution usually provided, I jsut don't know where is my mistake. And if it was my choice I already would be using Access, or MySQL. – sgp667 – 2014-10-21T15:27:12.957

Is a solution for you try to solve that in VBA/programmatically ? If yes, you will find more answers in stackoverflow. – MacGyver – 2014-10-21T15:34:24.977

1I prefer to use just formulas but i'll settle for VBA is no formula base solution emerges – sgp667 – 2014-10-21T15:37:39.593

Not sure I understand why you're referencing the entire column. =INDEX($G1:$G10000,MODE(MATCH($G1:$G10000,$G1:$G10000,0))) calculates near-instantaneously for me, and of course does not require CSE. – XOR LX – 2014-10-21T17:02:21.157

Answers

1

I haven’t actually tried this with 10,000 rows, but try

=INDEX($S$1:$S$50, MATCH(MAX(COUNTIF($G$1:$G$10000,$S$1:$S$50)), COUNTIF($G$1:$G$10000,$S$1:$S$50), 0))

where the unique state values are in S1:S50.  (Naturally this is an array formula; so you need to type Ctrl+Shift+Enter when you enter it.)

  • COUNTIF($G$1:$G$10000,$S$1:$S$50) counts how many times each of the states (S1:S50) appears in Column G, yielding a virtual array of 50 numbers that average 200 (because they add up to 10,000).  Note that this sub-expression appears twice, and, yes, in the worst case, it requires 500,000 (50×10,000) comparisons — but, on average, it requires half of that.
  • MAX(COUNTIF($G$1:$G$10000,$S$1:$S$50)) reports the largest number in this virtual array.  It will be a number ≥ 200 that represents the frequency (count) of the most frequently found state.
  • MATCH( max_value , array_of_counts, 0) finds the location in the virtual array of the maximum value.  This will be a number between 1 and 50, identifying the most frequently found state.
  • INDEX($S$1:$S$50, state_number ) gives you the name of the most frequently found state.

Scott

Posted 2014-10-21T15:11:10.017

Reputation: 17 653

2

Your formula is failing because you're referencing an entire column rather than a specific range. This will, naturally, find the most common value to be blank..

You can fix this by using this (extremely volatile) function -

=INDEX(INDIRECT("G1:G"&COUNTA(G:G)),MODE(MATCH(INDIRECT("G1:G"&COUNTA(G:G)),INDIRECT("G1:G"&COUNTA(G:G)),0)))

I don't recommend this.

With the number of items you're mentioning, I'd use a pivot table -

Select your column, insert - pivot table

Rows can be your states with values being their count. You can sort to find most common.

click for full size enter image description here

Raystafarian

Posted 2014-10-21T15:11:10.017

Reputation: 20 384