The following file, fruit_notes.txt, has three pipe-separated columns: fruit, color, and tasting notes. I would like to print all lines that have a duplicated color field. Order is not important.
banana|YELLOW|My turtle likes these.
cherry|RED|Sweet and tasty
grapefruit|YELLOW|Very juicy
grape|PURPLE|Yummy
lemon|YELLOW|Sour!
apple|RED|Makes great pie
orange|ORANGE|Oranges make me laugh.
This works...
> grep -F "$(awk -F'|' '{print $2}' fruit_notes.txt | sort | uniq -d)" fruit_notes.txt
banana|YELLOW|My turtle likes these.
cherry|RED|Sweet and tasty
grapefruit|YELLOW|Very juicy
lemon|YELLOW|Sour!
apple|RED|Makes great pie
However, it seems like an awkward (no pun intended) solution. It reads the file twice: once to find the duplicates in the color field, and again to find the lines matching the duplicated colors. It is also error-prone, because grep -F matches the duplicated color names anywhere on a line, not just in the color field. For example, the following line would be incorrectly printed (GREEN is unique, but the notes mention RED):
jalapeños|GREEN|My face turns RED when I eat these!
Is there a better way to do this, maybe using awk alone?
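One single-pass sketch that seems plausible (untested beyond the sample data, and it buffers the whole file in awk arrays, so it trades memory for the second read): remember each line and its color, count the colors as you go, then print only the lines whose color occurred more than once in the END block.

```shell
# Recreate the sample file from the question
cat > fruit_notes.txt <<'EOF'
banana|YELLOW|My turtle likes these.
cherry|RED|Sweet and tasty
grapefruit|YELLOW|Very juicy
grape|PURPLE|Yummy
lemon|YELLOW|Sour!
apple|RED|Makes great pie
orange|ORANGE|Oranges make me laugh.
EOF

# Single pass: store each line and its color field, count colors,
# then print only the lines whose color appears more than once.
# Because only field $2 is counted, "RED" in the notes field is ignored.
awk -F'|' '
    { line[NR] = $0; color[NR] = $2; count[$2]++ }
    END { for (i = 1; i <= NR; i++) if (count[color[i]] > 1) print line[i] }
' fruit_notes.txt
```

For the sample above this prints the three YELLOW lines and the two RED lines, in file order, and would not be fooled by the jalapeños line since its color field (GREEN) is unique.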
That's brilliant. I wasn't sure if it was possible to do it in one pass. Doing some reading now to understand the magic that makes it work! – Sagebrush Gardener – 2019-06-13T01:05:08.917