remove duplicate lines without sorting

5

1

Is it possible to remove duplicated rows in Notepad++, leaving only a single occurrence of a line?

If I have these lines:
1
5
3
9


1
4
3

I want it to be:
1
5
3
9


4

I want to keep the first occurrence of each duplicated line and remove all later occurrences, without sorting.

Could anyone help me please?

Best regards

Taha Fahed

Posted 2017-02-28T02:56:48.003

Reputation: 51

If you have Excel, you can paste the data into Excel and use its "Remove Duplicates" button. – David Dai – 2017-02-28T03:45:29.123

Answers

4

The requirements are a Regex that:

  • Does not sort the lines (disqualifies TextFX).
  • Keeps the first occurrence and removes the later duplicates.
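For comparison, outside Notepad++ the two requirements above can be sketched in a few lines of Python. This is only an illustration, not part of the regex approach: the function name `dedupe_keep_first` is made up, and leaving blank lines untouched is an assumption inferred from the expected output in the question.

```python
def dedupe_keep_first(lines):
    """Return lines in their original order, dropping repeats of earlier lines.

    Assumption: blank lines are never treated as duplicates, because the
    expected output in the question keeps them.
    """
    seen = set()      # non-blank lines already emitted
    result = []
    for line in lines:
        if line == "" or line not in seen:
            result.append(line)
            seen.add(line)
    return result


if __name__ == "__main__":
    sample = ["1", "5", "3", "9", "", "", "1", "4", "3"]
    print("\n".join(dedupe_keep_first(sample)))
```

The set makes each membership check cheap, and appending to a list in input order is what avoids any sorting.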

I'm also having this problem. So far I've got this: ^(.*?)$\s+?^(?=.*^\1$)

  • It only works in Notepad++ if you enable the ". matches newline" option.
  • It removes the first occurrence and keeps the later duplicates.

I used to have a great (but very slow) regex for this that was compatible with JavaScript, Notepad++, and Visual Studio find-and-replace, but I've lost it. If I can figure it out or find it again, I'll update this.
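The limitation described above (the earlier occurrence is removed, the later one kept) can be reproduced with Python's `re` module. This is a rough sketch only: Python's engine is not the one Notepad++ uses, so behaviour may differ in details; `re.DOTALL` stands in for the ". matches newline" option, and the function name is made up for the example.

```python
import re

# The pattern from the answer, unchanged. MULTILINE makes ^ and $ match at
# line boundaries; DOTALL plays the role of ". matches newline".
PATTERN = re.compile(r"^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE | re.DOTALL)


def drop_earlier_duplicates(text):
    """Delete every line that appears again, as a whole line, later in text."""
    return PATTERN.sub("", text)


if __name__ == "__main__":
    sample = "1\n5\n3\n9\n1\n4\n3\n"
    print(drop_earlier_duplicates(sample))
```

On this sample the first `1` and the first `3` are deleted, because the lookahead only asks whether the same line occurs again further down.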

Derek Ziemba

Posted 2017-02-28T02:56:48.003

Reputation: 1 072

This is some powerful regex-fu. For me, it only worked when I disabled the "." matches newline option, but it works perfectly. – pbarney – 2018-01-29T17:49:46.480

0

  • Ctrl+H
  • Find what: ^(.+)(\R)([\s\S]+?)\1\R?
  • Replace with: $1$2$3
  • check Wrap around
  • check Regular expression
  • UNCHECK . matches newline
  • Replace all

Explanation:

^               # beginning of line
  (             # start group 1
    .+          # 1 or more of any character except newline
  )             # end group 1
  (\R)          # group 2, any kind of linebreak
  (             # start group 3
    [\s\S]+?    # 1 or more of any character (including newlines), not greedy
  )             # end group 3
  \1            # same content as group 1
  \R?           # optional linebreak, to handle a last line that has no linebreak

Replacement:

$1          # content of group 1
$2          # content of group 2
$3          # content of group 3

Result for given example:

1
5
3
9


4

NOTICE: You have to hit Replace all as many times as needed; a single pass does not remove all the duplicates.
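The need for repeated passes can be seen in a small Python sketch. It is an approximation under stated assumptions: Python's `re` has no `\R`, so `\n` is used instead, the replacement is written `\1\2\3` rather than `$1$2$3`, and the function name is made up for the example.

```python
import re

# Same structure as the Notepad++ pattern, with \n in place of \R.
PATTERN = re.compile(r"^(.+)(\n)([\s\S]+?)\1\n?", re.MULTILINE)


def replace_all_until_stable(text):
    """Repeat 'Replace all' until a pass changes nothing.

    Returns the final text and the number of passes that made a change.
    """
    passes = 0
    while True:
        text, count = PATTERN.subn(r"\1\2\3", text)
        if count == 0:
            return text, passes
        passes += 1


if __name__ == "__main__":
    result, passes = replace_all_until_stable("1\n5\n3\n9\n1\n4\n3\n")
    print(result, passes)
```

Matches cannot overlap, so while the span from the first `1` to the second `1` is being replaced, the `3` inside that span cannot start a match of its own; its duplicate is only removed on the next pass. Note also that `\1` is not anchored to a whole line here, so a line `1` would also match the start of a later line such as `14`.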

Toto

Posted 2017-02-28T02:56:48.003

Reputation: 7 722

-1

This may be faster than some of the other answers:

  • Find:  (^.*$\r\n)\1*
  • Replace: $1
  • Select the "Regular expression" radio button

Sridhar Sarugu

Posted 2017-02-28T02:56:48.003

Reputation: 1

Does this work for non-consecutive lines (as shown in the question); i.e., if the input is not sorted? – G-Man Says 'Reinstate Monica' – 2019-05-23T18:42:26.800
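A quick way to check the point raised in this comment is to try the pattern in Python. This is only a sketch: the `\r\n` is replaced by `\n` for the test strings, Python's regex engine is not the one used by Notepad++, and the function name is made up for the example.

```python
import re

# The pattern from the answer, with \n in place of \r\n.
PATTERN = re.compile(r"(^.*$\n)\1*", re.MULTILINE)


def collapse_adjacent(text):
    """Collapse runs of identical consecutive lines into a single line."""
    return PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    print(collapse_adjacent("1\n1\n5\n1\n"))              # adjacent repeats
    print(collapse_adjacent("1\n5\n3\n9\n1\n4\n3\n"))     # separated repeats
```

Because `\1*` must follow the captured line immediately, only adjacent repeats are collapsed; in the second sample the repeated `1` and `3` are separated by other lines and are left in place.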