Linux command to remove the duplicate lines but keep the first occurrence

I have a text file. Each line contains a string, and some strings are repeated. I want to remove the repetitions but keep the first occurrence of each string. For example:

line1
line1
line2
line3
line4
line3
line5

Should be

line1
line2
line3
line4
line5

I tried: sort file1 | uniq -u > file2 but this did not help. It removed every repeated string entirely, while I want the first occurrence to stay. I do not need to sort. I just want to remove exact repetitions of a line while keeping everything else as it is.

user9371654

Posted 2018-06-05T09:53:26.210

Reputation: 647

Answers

If you allow sorting anyway, this will work:

sort file1 | uniq > file2

-u was the source of your trouble, because (from man 1 uniq):

-u, --unique
only print unique lines

while by default:

With no options, matching lines are merged to the first occurrence.
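
To make the difference concrete, here is a small sketch that runs both variants on the sample data from the question. The variable names (input, only_unique, deduped) are only illustrative. Note that the sorted result matches the desired output here only because the sample happens to be in sorted order already; in general, sort will reorder the lines.

```shell
# Sample data from the question, one string per line.
input=$(printf '%s\n' line1 line1 line2 line3 line4 line3 line5)

# uniq -u keeps only lines that occur exactly once in the sorted input,
# so line1 and line3 disappear completely.
only_unique=$(printf '%s\n' "$input" | sort | uniq -u)

# Plain uniq collapses each run of identical lines into a single copy.
deduped=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq -u:\n%s\n\nuniq:\n%s\n' "$only_unique" "$deduped"
```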

Kamil Maciorowski

Posted 2018-06-05T09:53:26.210

Reputation: 38 429

If you want to dedup while keeping first occurrence, you can do

awk '!visited[$0]++' "$your_hist_file" > "$your_new_hist_file"
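
The one-liner relies on visited[$0]++ evaluating to the count before the increment: it is 0 the first time a line is seen, so the negation is true and awk prints the line; for later copies the count is nonzero and the line is skipped. As a rough illustration, the same logic can be spelled out in longer form and run on the sample data from the question (the variable names input and result are only illustrative):

```shell
# Sample data from the question, one string per line.
input=$(printf '%s\n' line1 line1 line2 line3 line4 line3 line5)

result=$(printf '%s\n' "$input" | awk '
{
    # An unset array element is treated as 0, so this is true
    # only for the first occurrence of a line.
    if (!visited[$0]) print
    # Count the line so that later copies are skipped.
    visited[$0]++
}')

printf '%s\n' "$result"
```

Input order is preserved, and no sorting is involved. The array grows with the number of distinct lines, so memory use depends on how many different strings the file contains.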

If you want to dedup while keeping last occurrence, you can do

tac "$your_hist_file" | awk '!visited[$0]++' | tac > "$your_new_hist_file"

You can use one awk command and no tac to achieve this too, but it's not as straightforward as using two tacs.
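
For reference, one possible single-awk version is sketched below. This is an assumption about how it could be done, not necessarily what was meant above: it stores every line together with the position of its last occurrence, then prints only the lines sitting at that last position. The array names line and last are arbitrary.

```shell
# Sample data from the question, one string per line.
input=$(printf '%s\n' line1 line1 line2 line3 line4 line3 line5)

result=$(printf '%s\n' "$input" | awk '
{
    line[NR] = $0   # remember every line by its position
    last[$0] = NR   # overwrite with the latest position of this text
}
END {
    # Print a line only at the position of its final occurrence.
    for (i = 1; i <= NR; i++)
        if (last[line[i]] == i) print line[i]
}')

printf '%s\n' "$result"
```

On the sample data this keeps the second line3 rather than the first, so line4 now comes before line3. Unlike the tac pipeline, this version holds the whole file in memory.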

ssppjj

Posted 2018-06-05T09:53:26.210

Reputation: 21

Specialized utilities that print unique lines without sorting:

uq
unique

agc

Posted 2018-06-05T09:53:26.210

Reputation: 587

Asked: 2018-06-05T09:53:26.210

Viewed: 2 843 times

Active: 2019-09-09T20:59:27.117