How do I remove similar instances of lines using Unix commands?

2

I have a file that contains lines that look like the following:

14|geauxtigers|90
14|geauxtigers|null

I want to remove all instances in the file with the null as the last term. Is there a way to do this with Unix commands?

I was going to read in the file with Java, look at adjacent lines, and remove any line that has null as the third term when an adjacent line has the same first two terms. Is there a way to do this through Unix tools?

Edit: I don't want to blindly remove all of the lines with null as the third term. I might have the following entry: 15|lsu|null. I'd like to keep it, since it is the only entry. It's just that, if there is another line with the same first two terms and a non-null third term, I would like to keep the non-null value.

egidra

Posted 2011-10-24T20:01:28.957

Reputation: 209

So what does this have to do with Java? – Matt Ball – 2011-10-24T20:03:27.277

That looks like a job for the sed and grep commands, I'll let the experts answer this question – None – 2011-10-24T20:04:45.850

I am curious that, with 5 answers, nobody gave an AWK solution for such a typical "awk question" – None – 2011-10-24T20:25:37.313

This is underspecified. Are "similar" lines always adjacent? Do you want the result in the same order as the input? – ninjalj – 2011-10-24T21:36:43.850

Answers

1

I would like to add one more answer, using awk:

awk -F'|' '{if($3!="null"){a=$1;b=$2;print}else{if(a!=$1 || b!=$2)print}}' yourFile

test

kent$  echo "14|geauxtigers|90
14|geauxtigers|null
foo|bar|blah
x|y|z
x|y|null"|awk -F'|' '{if($3!="null"){a=$1;b=$2;print}else{if(a!=$1 || b!=$2)print}}'    
14|geauxtigers|90
foo|bar|blah
x|y|z

Kent

Posted 2011-10-24T20:01:28.957

Reputation: 791

0

grep -v '|null$' yourfile.txt > filtered.txt

Marc B

Posted 2011-10-24T20:01:28.957

Reputation:

User needs to be selective about removing null lines. – glenn jackman – 2011-10-25T12:51:54.947

0

Assuming lines can come in any order, and the result should be ordered numerically on the first field, here's a Perl solution:

echo -e "2|asd|null
11|bla|asd
14|geauxtigers|90
2|asd|2
15|lsu|null
14|geauxtigers|null" | perl -e '
while(<>) {
  $line=$_;
  s@\|[^|]*$@@;
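  # keep an existing non-null line instead of overwriting it with a null one
  $hash{$_}=$line unless exists $hash{$_} && $line =~ /\|null$/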
}
for $line (sort {$a<=>$b} keys %hash) {
  print $hash{$line}
}'

ninjalj

Posted 2011-10-24T20:01:28.957

Reputation: 511

0

Assuming the lines might appear in any order, scan the file twice, finding the non-null lines on the first pass. I assume the "key" is the first two columns:

awk -F '|' '
  NR == FNR { if ($NF != "null") notnull[$1 FS $2]; next }
  $NF == "null" && $1 FS $2 in notnull {next}
  {print} 
' filename filename > file.nonulls 

If the null line always follows its partner:

awk -F '|' '
  $NF != "null" {seen[$1 FS $2]}
  $NF == "null" && $1 FS $2 in seen {next}
  {print}
' filename > file.nonulls 
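As a rough sketch, the two-pass idea can be wrapped in a shell function and checked against sample data. The function name drop_shadowed_nulls, the temporary file, and the sample lines are assumptions made for this illustration only. The first rule here consumes every line of the first pass, so nothing is printed until the second pass:

```shell
#!/bin/sh
# Hypothetical helper: reads pipe-delimited lines on stdin and drops a
# "null" line only when another line with the same first two fields
# has a non-null last field. Input order is preserved.
drop_shadowed_nulls() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"                      # buffer stdin so awk can read it twice
  awk -F '|' '
    # First pass: remember keys that have a non-null value, print nothing.
    NR == FNR { if ($NF != "null") notnull[$1 FS $2]; next }
    # Second pass: skip null lines whose key has a non-null partner.
    $NF == "null" && (($1 FS $2) in notnull) { next }
    { print }
  ' "$tmp" "$tmp"
  rm -f "$tmp"
}

# The null line for key 14|geauxtigers comes before its partner here.
printf '%s\n' '14|geauxtigers|null' '15|lsu|null' '14|geauxtigers|90' |
  drop_shadowed_nulls
```

With this input, the lone 15|lsu|null entry should survive, while the 14|geauxtigers|null line is dropped in favor of 14|geauxtigers|90.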

glenn jackman

Posted 2011-10-24T20:01:28.957

Reputation: 18 546

-1

cat file | grep -v '|null$' > file2

This pipes the file named file (you can substitute another name after cat) through the grep command, which filters lines by pattern. The -v option inverts the match, so only the lines that do not match the pattern are kept. Finally, the result is written to file2.
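To illustrate what the -v inversion does, here is a small sketch on made-up sample lines (the data is an assumption for illustration). Note that the pattern drops every line ending in |null, including one that has no non-null counterpart:

```shell
# Sample input: one null line with a non-null partner, one without.
printf '%s\n' '14|geauxtigers|90' '14|geauxtigers|null' '15|lsu|null' |
  grep -v '|null$'
# Only 14|geauxtigers|90 is printed; both null lines are filtered out.
```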

Mnementh

Posted 2011-10-24T20:01:28.957

Reputation: 856

User needs to be selective about removing null lines. – glenn jackman – 2011-10-25T12:52:26.740

That was added as an edit to the question. The old question was answered by this, as the similar answers show. I will try to improve my answer later, but I have not enough time now at work. – Mnementh – 2011-10-25T12:59:15.403

-1

grep -Ev 'null' file > newfile.with.nulls.removed

Tim

Posted 2011-10-24T20:01:28.957

Reputation: 162

User needs to be selective about removing null lines. – glenn jackman – 2011-10-25T12:52:06.770

-1

Try using grep -v:

grep -v '|null$' myfile.txt > myfile-fixed.txt

maerics

Posted 2011-10-24T20:01:28.957

Reputation: 101

User needs to be selective about removing null lines. – glenn jackman – 2011-10-25T12:52:00.220

-1

Depending on your Linux flavor, you can try something like:

egrep -v '[|]null$' < file.in > file.out

rsp

Posted 2011-10-24T20:01:28.957

Reputation: 101

User needs to be selective about removing null lines. – glenn jackman – 2011-10-25T12:52:21.107