How to get the count of the most frequent matching columns?

1

1

Need some help figuring out how to do this in Excel.

I have a long list of names split up into 2 columns - the first and the last name. I'd like to figure out what are the 5 most popular full names but - Some last names and first names are reversed, and so I'd like "John | Smith" to also count toward "Smith | John." How can I do this in excel?

I tried putting both columns into a pivot table, and putting First name and Last name into the row labels and the count of First Name into the values, but this gives me a long list of every name mixed with every other name, and the only way to figure out the most popular name is to scroll down through the entire table and count the highest number that I see. When I add a "top 10" value filter to the results, the top values are returned by the first name that has the highest frequency, regardless of what the last name is.

For example. I can have 5 names:

John Smith
Smith John
Frank White
Frank Green
Frank Black

And if I applied a top 10 filter, it would show me that Frank was the most common name, followed by John Smith, even though what I really want to see is John Smith, since it was in there more than any other name.

Hopefully this makes sense. If anyone could provide some insight into how I could accomplish this I would be most appreciative. Please let me know if you have any questions, thanks!

EDIT: Regarding how to determine what a first name is and what a last name is, The order of the names don't actually matter. The only problem I foresee with this is that there will be a duplicate of the most frequent names, so in the example I gave, "Smith John" and "John Smith" are the most popular names, with 2 each. (If that makes sense.) I'm ok with that, because it's very easy to look at a top 10 list and discount every other result.

Matt

Posted 2015-07-10T15:07:24.600

Reputation: 113

How is "John" with 2 occurrences more frequent than "Frank" with 3?? – user1016274 – 2015-07-10T15:14:01.133

@user1016274 the "john smith" and "smith john" = 2, while each "Frank" only has one occurrence. – Raystafarian – 2015-07-10T15:15:03.190

Do you already have a list of all the possible full names? – Raystafarian – 2015-07-10T15:20:10.070

@Rastafarian - Unfortunately I have 1.4 million rows spread across 100 books so it's not really an option to fix the data. I do not have a list of all possible full names. – Matt – 2015-07-10T15:24:06.980

@Raystafarian You can still solve the problem, but the result will have the caveat that distinct names could be grouped together, like Frank Joseph and Joseph Frank. – Excellll – 2015-07-10T15:28:58.743

John can be a last name. I think the worst that could happen is that there would be a duplicate top result - John Smith and Smith John. – Matt – 2015-07-10T15:29:28.587

@Raystafarian I'm not proposing a solution. I'm just saying you don't need to get hung up on the last-name-that-could-be-a-first-name issue. You can still get the result desired, but just take it with a grain of salt. That's all I'm saying. – Excellll – 2015-07-10T15:32:28.093

@Excellll fair enough – Raystafarian – 2015-07-10T15:33:24.523

1Basically the order of the name does not need to be differentiated. Sorry for the confusion, I'm using the names example but my actual situation is a little more complicated to explain and a little bit different. Basically Matt John and John Matt are exactly the same to me. – Matt – 2015-07-10T15:33:48.233

Answers

0

edit: After 3 edits of the question this might work better:

Assume columns A and B hold the name parts. Put this into column C:

=A1&" "&B1

Then enter this formula into a free column:

=(COUNTIF(C$1:C$7;$A1 & " " & $B1) + COUNTIF(C$1:C$7;$B1 & " " & $A1)) * NOT((COUNTIF(C$1:C1;$B1 & " " & $A1) > 0))

This counts name pairs regardless of order, and determines if the reversed pair has been seen before (note the running end index in the 3rd COUNTIF). If yes, the count will be set to zero. This way, only unique pairs will count.

Use a Top10 filter on this column to find the most frequent ones.

user1016274

Posted 2015-07-10T15:07:24.600

Reputation: 1 423

1This won't work - it's the reversal of names that's the issue. John Smith vs Smith John. – Raystafarian – 2015-07-10T15:13:23.840

Is it? I asked for more information on this. – user1016274 – 2015-07-10T15:15:00.053

Thanks for answering - I'm not sure what you mean by "how do I determine if it's reversed?" The name is reversed if the first name is in the last name column, and the last name is in the first name column. There may be spelling errors too but I'm not going to worry about counting "Jon Smith" toward "John Smith." – Matt – 2015-07-10T15:18:51.363

@Matt: and on: how do you determine that "John" is a first name? What about "Princess Lea" - is "Lea Princess" without doubt a lastname/firstname couple? – user1016274 – 2015-07-10T15:23:51.887

The firstname/lastname distinction is actually not important - see edited post or conversation with Raystafarian. – Matt – 2015-07-10T15:35:18.963

Thanks, this works great! I made a small adjustment to the last part: I added a header to the first row so that the list of names starts on row 2, then I added a second COUNTIF because when I sorted by largest to smallest, I was still getting duplicates of First | Last. * NOT(OR((COUNTIF(C$1:C1,$B3 & " " & $A3) > 0), (COUNTIF(C$1:C1,$A3 & " " & $B3) > 0))) – Matt – 2015-07-10T19:41:53.070

0

You can run this code to find all matches and make them look the same, whether it's first,last or last,first

Sub test()

Dim LastRow As Integer
LastRow = Cells(Rows.Count, "A").End(xlUp).Row

Dim fname As String
Dim lname As String

For i = 1 To LastRow
    fname = LCase(Cells(i, 1))
    lname = LCase(Cells(i, 2))
        For j = 1 To LastRow
            If LCase(Cells(j, 1)) = lname And LCase(Cells(j, 2)) = fname Then
                Cells(j, 1) = fname
                Cells(j, 2) = lname
            End If
        Next
Next

End Sub

Then use that data for your pivot table.

Raystafarian

Posted 2015-07-10T15:07:24.600

Reputation: 20 384