Using Mac Excel, how can I delete partial duplicate rows?

2

I'm running Mac Office 2011 under Lion.

I have a sheet of many thousands of rows of data, with a good 40 or 50 columns. One column is email address. If any two rows have a duplicate email address, I'd like to keep only one of the rows.

Ideally, when duplicates were found it would default to keeping the row that has a particular value ("Y") in a specific cell, if applicable. But even if it just kept the first or last one, it would be an improvement.

Also, there can easily be more than two instances of a particular email address, so a solution would need to be able to handle any arbitrary number of duplicates.

I have tried the solution suggested here - Remove duplicate rows in excel when not all the columns are the same - but it did nothing for me.

Thanks for any suggestions!

JVC

Posted 2012-09-19T05:55:01.977

Reputation: 309

did the answer below work for you? – JoshP – 2012-09-28T13:51:17.690

For me, this helped me solve the question: In Excel, How can I hide all but 1 row of a set of rows when a column value repeats multiple times? See my comments in the answer by @kurp. – qxotk – 2013-12-09T22:14:24.027

Answers

2

I don't know Mac Excel specifically, but I believe the following approach works for almost all spreadsheet applications:

  1. Go to an empty column.
  2. Use the =COUNTIF(A:A,A2) function. A:A is the column containing e-mails and A2 is the nearest cell containing the e-mail.
  3. Copy mentioned formula to the entire column.
  4. "1" is returned for unique values, anything more than "1" means duplicated value.
  5. Sort by e-mail, if you want to have duplicates in rows in neighboring cells.
  6. Use filter to hide non-duplicated values (i.e. hide 1s) or sort by new column descending to get all duplicates on top of the table.
  7. Additionally, you may use filter on other column (containing "Y" or sth) to separate e-mails you want to leave and ones to be removed.

kurp

Posted 2012-09-19T05:55:01.977

Reputation: 942

Aha... clever. That makes sense... I'll give it a shot, thanks!! – JVC – 2012-09-19T23:55:10.873

Building on top of @kurp's solution using COUNTIF(), I used an IF(COUNTIF(A:A,A2)>1,"Hide","Show"). It enables one to filter on that column to select only rows with "Show" as the value. When there are 5 rows with same value, the COUNTIF() evaluates to 5 for the first appearance, on the next row the COUNTIF() shows 4, etc. on down to the last row that shows 1. Note this requires the rows are sorted by the repeated value column. One can put the constant "Show" or "Hide" literally in any row to force that row to show or hide itself. – qxotk – 2013-12-09T22:09:17.783