Excel function to find and count all duplicates by row, excluding the first instance

1

I'm trying to create a report in excel that counts the number of duplicate rows, excluding the first instance of that row, in an imported sheet. I know that I could use "remove duplicates" in the Data menu for this, and I've also checked this solution: Remove duplicate rows in Excel and count results

But my goal is to be able to fully view what the duplicates are and where they are in the spreadsheet. I also want the report to count the number of original entries and the difference with the duplicates removed, like so:

X1 aaaaaaa
X2 bbbbbbb
X1 aaaaaaa
X3 ccccccc
X4 ddddddd
X1 aaaaaaa

Total Entries: 6
Duplicate Rows: 2
Total Unique Entries: 4

user1022144

Posted 2019-04-15T21:27:06.520

Reputation:

Will you only every have one row that has duplicates? – cybernetic.nomad – 2019-04-15T21:53:02.693

I want to count duplicate rows, not duplicates within the row in different columns. Like in my example, rows 3 and 6 are duplicates, because both "X1" and "aaaaaa". I'm not trying to count just the instances of "X1" and "aaaaaa" separately. – None – 2019-04-15T21:55:58.093

That does not answer my question. Let me reword: Could you have a seventh row with X4 ddddddd, resulting in 7 total entries, 3 duplicate roes and 4 unique entries? Are you willing to accept a solution with a helper column? – cybernetic.nomad – 2019-04-15T21:57:31.593

Helper column(s) seems like a must, as presented. – Alex M – 2019-04-15T21:58:49.313

Yes - that's exactly what I'm looking for. A helper column would be fine. The not-quite-there solution I'm currently using has one. – None – 2019-04-15T21:59:38.137

Answers

0

Assuming your data is in columns A and B, put the following formula in C1:

=IF(COUNTIFS(A$1:A1,A1,B$1:B1,B1)>1, COUNTIFS(A$1:A1,A1,B$1:B1,B1)-1,"")

And populate down.

This will sequentially number duplicates after the first one.

For the reporting cells use the following formulas:

Total Entries:

=COUNTA(A1:A7)

Duplicate Rows:

=MAX(C1:C7)

Total Unique Entries:

=COUNTA(A1:A7)-COUNTIF(C1:C7,">0")

enter image description here

cybernetic.nomad

Posted 2019-04-15T21:27:06.520

Reputation: 4 469

thank you! If I wanted to apply this to a large spreadsheet of indefinite size (for instance I'd want to take all of column A and column B without giving a numerical limit) how would I modify the COUNTIFS equation? – None – 2019-04-15T22:09:00.623

You just keep populating the COUNTIFS down until the last row. The reporting formulas can be changed to =COUNTA(A:A), =MAX(C:C) and =COUNTA(A:A)-MAX(C:C) respectively – cybernetic.nomad – 2019-04-15T22:15:13.040

this may be useful to populate a lot of rows with a formula – cybernetic.nomad – 2019-04-15T22:18:53.630

correct me if I am wrong here but are not the number of unique entries in your example 4? OP asked to ignore first instance. The number of duplicate entries could be taken as either 3 as [er your comment under the question? – Forward Ed – 2019-04-16T02:59:48.050

@Forward Ed - facepalm you're right I'm on a phone right now, will fix the issue when i get to areal computer – cybernetic.nomad – 2019-04-16T13:08:10.160

The edit is done – cybernetic.nomad – 2019-04-16T14:24:31.797

After looking at this more, I have a question about applying the MAX function here for duplicate rows in a larger project. For instance, let's say there's another set of rows that has 4 duplicates, along with the 2 duplicates listed above. Wouldn't MAX return 4 here, instead of 6? – None – 2019-04-16T15:58:40.710

Assuming you have 4 duplicates of a new value(let's say X5 eeeeeee) then the max function will return 4. If you have 4 duplicates of an existing value, it will return whatever the highest count is (ex: 6 for X1 aaaaaaa or 5 for X4 ddddddd) – cybernetic.nomad – 2019-04-16T16:03:30.653

Right, but let's say I wanted to count all of those duplicates - 6 for X1 aaaaaa and 5 for X4 dddddddd to get a final result of 11 total duplicates, how would I go about finding that sum? – None – 2019-04-16T17:50:50.527

=COUNTIF(C1:C7,">0") – cybernetic.nomad – 2019-04-16T17:55:52.470

If that replaces MAX, I don't think that would work, because that would include any instance of the result 1 - meaning the original entry wouldn't be excluded. – None – 2019-04-16T18:18:44.580

It will be because in column C the row of an original entry is blank and therefore >0 will be false and so that row won't be counted – cybernetic.nomad – 2019-04-16T18:31:01.283

Right, the original row is blank, but the second instance of X1 has a result of 1, and the third instance has a result of 2. When totaled, wouldn't that would return 3, which includes the original entry? Or am I misunderstanding COUNTIF here? – None – 2019-04-16T18:38:28.410

COUNTIF only counts the instances where the given condition (>0) is true. You are thinking of SUMIF, which would, in this case, sum the numbers greater than 0 (assuming we were to give it the same condition) – cybernetic.nomad – 2019-04-16T18:41:18.930

Ahhhhh, ok. Thank you! – None – 2019-04-16T18:55:25.777