Linux command to remove the duplicate lines but keep the first occurrence

2

I have a text file. Each line contains a string, and some strings are repeated. I want to remove the repetitions but keep the first occurrence of each string. For example:

line1
line1
line2
line3
line4
line3
line5

Should be

line1
line2
line3
line4
line5

I tried: sort file1 | uniq -u > file2 but this did not help. It removed every repeated string entirely, whereas I want the first occurrence to remain. I do not need sorting. I just want to remove exact repetitions of a line while keeping everything else as it is.

user9371654

Posted 2018-06-05T09:53:26.210

Reputation: 647

Answers

4

If sorting is acceptable after all, this will work (the output then comes out in sorted order rather than the original order):

sort | uniq

-u was the source of your trouble, because (from man 1 uniq):

-u, --unique
only print unique lines

while by default:

With no options, matching lines are merged to the first occurrence.
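
As a rough illustration of the difference, here is a small sketch that runs both variants on the sample input from the question. The temporary file and the variable names only_unique and deduped are assumptions made for the example, not part of the original commands.

```shell
# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' line1 line1 line2 line3 line4 line3 line5 > "$tmp"

# With -u, only lines that occur exactly once survive,
# so line1 and line3 disappear completely.
only_unique=$(sort "$tmp" | uniq -u)

# Without -u, each run of identical adjacent lines collapses to one copy.
deduped=$(sort "$tmp" | uniq)

printf 'sort | uniq -u:\n%s\n\nsort | uniq:\n%s\n' "$only_unique" "$deduped"
rm -f "$tmp"
```

For this particular input the sorted order happens to match the order of first occurrence; in general, sorting rearranges the lines.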

Kamil Maciorowski

Posted 2018-06-05T09:53:26.210

Reputation: 38 429

1

If you want to dedup while keeping first occurrence, you can do

awk '!visited[$0]++' "$your_hist_file" > "$your_new_hist_file"
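
To make the one-liner easier to follow, here is a spelled-out sketch of what it appears to do. The function name dedup_keep_first is hypothetical and chosen only for this example; the array name visited is taken from the command above.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if none.
dedup_keep_first() {
    awk '
        {
            # visited[$0] counts how often this exact line has been seen
            # so far; it is unset (treated as 0) on the first occurrence.
            if (visited[$0] == 0) {
                print
            }
            # Record the line so that later copies are skipped.
            visited[$0]++
        }
    ' "$@"
}

# Usage with the sample input from the question.
printf '%s\n' line1 line1 line2 line3 line4 line3 line5 | dedup_keep_first
```

The short form works because !visited[$0]++ evaluates the count before incrementing it, and awk prints the line whenever a pattern without an action is true. The original order is preserved, at the cost of keeping every distinct line in memory.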

If you want to dedup while keeping last occurrence, you can do

tac "$your_hist_file" | awk '!visited[$0]++' | tac > "$your_new_hist_file"

You can also achieve this with a single awk command and no tac, but it is not as straightforward as using two tacs.
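
One possible single-awk variant is sketched below. It is only an assumption about what such a command could look like, not necessarily what the answer had in mind; the function name dedup_keep_last and the array names last and line are made up for the example.

```shell
# Hypothetical helper: keeps only the last occurrence of each line.
dedup_keep_last() {
    awk '
        {
            last[$0] = NR    # line number of the most recent occurrence
            line[NR] = $0    # remember every line in input order
        }
        END {
            # Print a line only at the position of its final occurrence.
            for (i = 1; i <= NR; i++) {
                if (last[line[i]] == i) {
                    print line[i]
                }
            }
        }
    ' "$@"
}

# Usage: the first "a" and the first "b" are dropped.
printf '%s\n' a b a c b | dedup_keep_last
```

Unlike the two-tac pipeline, this sketch stores the whole input in memory before printing anything, which may matter for very large files.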

ssppjj

Posted 2018-06-05T09:53:26.210

Reputation: 21

0

Specialized utils that print unique lines without sorting:

  1. uq.

  2. unique.

See also: "How to get only the unique results without having to sort data?" and "Unix: removing duplicate lines without sorting".

agc

Posted 2018-06-05T09:53:26.210

Reputation: 587