removing duplicate lines from file with grep

Question

I want to remove all lines where the second column is 05408736032.

0009300|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|asdf|
0009367|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|adff|

Double posted: http://stackoverflow.com/questions/1439816/removing-duplicate-lines-from-file-grep — Dennis Williamson, Sep 17 '09 at 19:19

score 9 · Answer 1 · answered Sep 17 '09 at 16:22

awk -F \| '{if ($2 != "05408736032") print}'

SergeyZh

You can leave out the "if" and the "print": `awk -F \| '$2 != "05408736032"'` – Dennis Williamson Sep 17 '09 at 18:24
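
Since the title asks about grep, the same filter can probably be written with grep -v as well. This is only a sketch: the third line of the sample file is made up for illustration, and the pattern assumes the value has to match the whole second field.

```shell
# Hypothetical sample file: the two lines from the question plus one
# invented line that should survive the filter.
grep_input=$(mktemp)
cat > "$grep_input" <<'EOF'
0009300|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|asdf|
0009367|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|adff|
0009401|05408736099|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|qwer|
EOF

# -v inverts the match. The pattern skips the first field ([^|]*), then
# requires the second field to be exactly 05408736032. In a basic regex
# an unescaped | is a literal character.
grep_result=$(grep -v '^[^|]*|05408736032|' "$grep_input")
printf '%s\n' "$grep_result"
rm -f "$grep_input"
```

Anchoring on the first field keeps the value from matching somewhere else in the line, which a plain `grep -v 05408736032` would not guarantee.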

score 3 · Accepted Answer · answered Sep 17 '09 at 16:30

This might do what you want:

sort -t '|' -k 2,2 -u  foo.dat

However this sorts the input according to your field, which you may not want. If you really only want to remove duplicates, your best option is Perl:

perl -ne '$a=(split "\\|")[1]; next if $h{$a}++; print;' foo.dat
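
If Perl is not at hand, a common awk idiom should give the same order-preserving result as the one-liner above: keep the first line seen for each second field and drop later ones. A minimal sketch, where the array name `seen` is arbitrary and the third sample line is invented for illustration:

```shell
# Hypothetical sample file: the question's two lines plus one invented line.
dedup_input=$(mktemp)
cat > "$dedup_input" <<'EOF'
0009300|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|asdf|
0009367|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|adff|
0009401|05408736099|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|qwer|
EOF

# seen[$2]++ evaluates to 0 the first time a key appears, so !seen[$2]++
# is true only for the first line with that second field. A true pattern
# with no action makes awk print the line.
deduped=$(awk -F'|' '!seen[$2]++' "$dedup_input")
printf '%s\n' "$deduped"
rm -f "$dedup_input"
```

Unlike sort -u, this leaves the surviving lines in their original order.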

score 1 · Answer 3 · answered Sep 17 '09 at 18:50

Pure Bash:

oldIFS=$IFS
while IFS= read -r line   # -r keeps backslashes, IFS= keeps leading/trailing whitespace
do
    IFS=$'|'
    testline=($line)  # make an array split according to $IFS
    IFS=$oldIFS       # put it back as soon as you can or you'll be sooOOoorry
    if [[ ${testline[1]} != "05408736032" ]]
    then
        echo "$line"
    fi
done < datafile

Cian · Answer 4 · 2009-09-17T17:00:37.657

Is it that you want to remove all lines where the second |-separated field is '05408736032'? Will all the lines be formatted the same? If so, this should output the file minus those lines (it's a Perl script that takes the original file as its first argument and the output file as its second).

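#!/usr/bin/perl
use warnings;
use strict;
my ($file1, $file2) = @ARGV;
open my $origin_file, '<', $file1 or die "Cannot read $file1: $!";
open my $newfile, '>', $file2 or die "Cannot write $file2: $!";
while (my $line = <$origin_file>) {
    # split on a literal |, which has to be escaped in a regex
    my @values = split /\|/, $line;
    # eq compares strings; a single = would assign instead
    print $newfile $line unless $values[1] eq '05408736032';
}
close $newfile or die $!;
close $origin_file or die $!;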

(I haven't tested this, so you probably want to back up the original file before you try it)

On reading again, you may be looking to grab only lines with a unique second column. This should do that.

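#!/usr/bin/perl
use warnings;
use strict;
my ($file1, $file2) = @ARGV;
my %unique;   # second-column values that have already been written
open my $origin_file, '<', $file1 or die "Cannot read $file1: $!";
open my $newfile, '>', $file2 or die "Cannot write $file2: $!";
while (my $line = <$origin_file>) {
    # split on a literal |, which has to be escaped in a regex
    my @values = split /\|/, $line;
    print $newfile $line unless defined $unique{$values[1]};
    $unique{$values[1]} += 1;
}
close $newfile or die $!;
close $origin_file or die $!;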

score 0 · Answer 5 · answered Sep 17 '09 at 16:20

You can do something like:

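while IFS= read -r f; do
  val=$(printf '%s\n' "$f" | cut -d'|' -f 2)
  # count the lines whose second field is exactly $val
  if [ "$(cut -d'|' -f 2 "$file" | grep -cxF -- "$val")" -lt 2 ]; then
     printf '%s\n' "$f"
  fi
done < "$file"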

but, like most shell scripts, it's pretty inefficient: it rescans the whole file once for every line. You'd be better off doing it in perl, something like:

@infile = <>;

# first pass: count how often each second field occurs
foreach (@infile) {
  @foo = split(/\|/);
  $found{$foo[1]}++;
}

# second pass: print only the lines whose second field occurs once
foreach (@infile) {
  @foo = split(/\|/);
  if ($found{$foo[1]} < 2) {
    print $_;
  }
}
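
The same two-pass counting idea can also be sketched in awk by handing it the file twice instead of holding every line in memory. This is a sketch under assumptions: the array name `count` is arbitrary, the third sample line is invented, and the `NR == FNR` test relies on the file not being empty.

```shell
# Hypothetical sample file: the question's two lines plus one invented line.
count_input=$(mktemp)
cat > "$count_input" <<'EOF'
0009300|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|asdf|
0009367|05408736032|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|adff|
0009401|05408736099|89|01|001|0|0|0|1|NNNNNNYNNNNNNNNN|qwer|
EOF

# First read of the file (NR == FNR): count each second field, print nothing.
# Second read: print only the lines whose second field was counted once.
unique_only=$(awk -F'|' 'NR == FNR { count[$2]++; next } count[$2] < 2' \
    "$count_input" "$count_input")
printf '%s\n' "$unique_only"
rm -f "$count_input"
```

Note that this drops every line whose second field repeats, not just the later copies.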
