Is there a way to extract duplicate lines in Sublime Text?

8

3

I need to perform two operations in Sublime Text: extract unique lines and extract duplicate lines. For example, for the input:

a
b
a

Extract duplicates should result in:

a

and Extract unique should result in:

b

Is there a built-in operation or a plugin to do that?

Poma

Posted 2016-10-22T06:51:19.700

Reputation: 1 267

Answers

12

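You can find duplicate lines by running Edit -> Sort Lines and then searching, with regular-expression mode enabled, for the pattern below. It uses the line anchors ^ and $ and the backreference \1 to match a line that is immediately followed by an identical line.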

^(.+)$\n^\1$

Then use Find All, copy the selection, paste it into a new tab, and run Permute Lines | Unique to collapse the repeats. What remains is the list of duplicated lines.
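
For anyone who would rather script this, here is a minimal Python sketch of the same sort-then-regex idea. It is not Sublime's own implementation: the function name duplicates_via_regex is made up for illustration, and Python's re module stands in for Sublime's regex search.

```python
import re


def duplicates_via_regex(text):
    """Return each line that occurs more than once, one copy per line."""
    # Sort first so that identical lines end up next to each other.
    sorted_text = "\n".join(sorted(text.splitlines()))
    # Same pattern as above; MULTILINE lets ^ and $ match at every line boundary.
    pattern = re.compile(r"^(.+)$\n^\1$", re.MULTILINE)
    # findall returns the captured line for each match; dict.fromkeys drops
    # repeats while keeping order (the "Permute Lines | Unique" step).
    return list(dict.fromkeys(pattern.findall(sorted_text)))


if __name__ == "__main__":
    print(duplicates_via_regex("a\nb\na"))
```

With the question's input (a, b, a) this returns ['a']. Like the pattern itself, it ignores empty lines, because .+ needs at least one character.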

twamley

Posted 2016-10-22T06:51:19.700

Reputation: 221

This is amazing. I made a small addition that helped me:

  1. Run the regex
  2. Replace all the matches with \t$1, which indents the matched values and keeps only one instance of each in the file
  3. Run another regex, ^\d.*$, swapping \d for whatever your own lines start with, and remove the lines it matches
  4. This keeps only the duplicated values

– Oz Radiano – 2018-07-24T06:58:10.790

2

Unfortunately I don't have access to Sublime Text at the moment, so I'm not able to test this, but I believe something like the following might work for you:

  1. Sort the lines via the Edit -> Sort Lines command
  2. Install the Highlight Duplicates plugin, and use it to highlight all the duplicate lines
  3. Cut the highlighted lines to the Clipboard, and paste them into a New File
  4. The lines that remain in the original file are your Extract Unique lines
  5. In the New File, select all the text, and remove duplicate lines via the Edit -> Permute Lines -> Unique command
  6. The lines that remain in the New File are your Extract Duplicates lines

I'm not entirely sure that step #1 is actually necessary, but I included it just in case.
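
Both results from the steps above can also be sketched in a few lines of Python. This is only an illustration of the idea, not what the Highlight Duplicates plugin does internally; the function name split_unique_and_duplicates is hypothetical, and it assumes lines are compared exactly, whitespace included.

```python
from collections import Counter


def split_unique_and_duplicates(text):
    """Return (unique_lines, duplicate_lines) for the given text."""
    lines = text.splitlines()
    # Count every line over the whole input, so no sorting is needed here.
    counts = Counter(lines)
    # "Extract unique": lines that occur exactly once, in original order.
    unique = [line for line in lines if counts[line] == 1]
    # "Extract duplicates": one copy of each line that occurs more than once.
    duplicates = [line for line in counts if counts[line] > 1]
    return unique, duplicates


if __name__ == "__main__":
    print(split_unique_and_duplicates("a\nb\na"))
```

For the input a, b, a from the question this gives (['b'], ['a']).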

MJH

Posted 2016-10-22T06:51:19.700

Reputation: 1 028

I wondered the same thing and just tried it (Sublime 3.0, here)... sorting first is not necessary. (Unlike with Unix 'uniq'.) Nice. – Tom Hundt – 2018-02-25T18:30:08.227

0

This is a slightly modified version of @MJH's answer above. It gets the duplicated lines with Sublime 3 and DiffMerge, without using the Highlight Duplicates plugin.

  1. Sort the lines via Sublime 3 Edit -> Sort Lines command
  2. Save original file as sorted_orig.txt
  3. Select all the text, and remove duplicate lines via Sublime 3 Edit -> Permute Lines -> Unique command
  4. Save modified file as no_dup_sorted.txt
  5. Diff sorted_orig.txt against no_dup_sorted.txt in the DiffMerge tool.
  6. Use Export -> File Diffs in DiffMerge to copy the list of duplicates to the clipboard or save it to another file.
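
The comparison in these steps amounts to subtracting the de-duplicated file from the sorted original. A rough Python sketch of that idea follows; it swaps DiffMerge for a multiset difference with collections.Counter, and the function name duplicates_via_difference is made up for illustration.

```python
from collections import Counter


def duplicates_via_difference(lines):
    """Return the lines left over after removing one copy of every line."""
    sorted_orig = sorted(lines)          # steps 1-2: the sorted original
    no_dup_sorted = sorted(set(lines))   # steps 3-4: one copy of each line
    # Steps 5-6: Counter subtraction keeps only positive counts, so what is
    # left is exactly the extra copies that the Unique command removed.
    extra = Counter(sorted_orig) - Counter(no_dup_sorted)
    return sorted(extra)


if __name__ == "__main__":
    print(duplicates_via_difference(["a", "b", "a"]))
```

Unlike a raw diff, which would show a line once for every extra copy, this lists each duplicated line only once.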

Alex M.

Posted 2016-10-22T06:51:19.700

Reputation: 1

0

Had the same problem (show me the dupes)... didn't find an easy Sublime-based answer and fell back to using Unix commands (my file had the data I wanted to find the duplicates of in columns 11-56):

cut -c 11-56 myfile.dat | sort | uniq -d

Posted here as an FYI to others.
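
If the Unix tools are not available, roughly the same pipeline can be sketched in Python. The function name duplicate_fields and its parameters are hypothetical; the column range is taken from the command above, and the ordering may differ from sort's where the locale matters.

```python
from collections import Counter


def duplicate_fields(lines, start=11, end=56):
    """Return the sorted column slices that occur on more than one line."""
    # cut -c is 1-based and inclusive, so columns 11-56 are the slice [10:56].
    fields = [line.rstrip("\n")[start - 1:end] for line in lines]
    counts = Counter(fields)
    # uniq -d prints one copy of each repeated line; sorted() plays the role
    # of sort, using plain code-point ordering.
    return sorted(field for field, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = ["0123456789KEY1 x\n", "abcdefghijKEY1 x\n", "0123456789KEY2\n"]
    print(duplicate_fields(sample))
```

To run it on a real file, pass the open file object, for example duplicate_fields(open("myfile.dat")).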

Tom Hundt

Posted 2016-10-22T06:51:19.700

Reputation: 135

By the way, I've made a plugin that does that

– Poma – 2018-02-27T05:44:17.457