Most efficient way to print all lines with duplicated field?

The following file, fruit_notes.txt, has three pipe-separated columns: fruit, color, and tasting notes. I would like to print all lines that have a duplicated color field. Order is not important.

banana|YELLOW|My turtle likes these.
cherry|RED|Sweet and tasty
grapefruit|YELLOW|Very juicy
grape|PURPLE|Yummy
lemon|YELLOW|Sour!
apple|RED|Makes great pie
orange|ORANGE|Oranges make me laugh.

This works...

> grep -F "`awk -F"|" '{print $2}' fruit_notes.txt | sort | uniq -d`" fruit_notes.txt
banana|YELLOW|My turtle likes these.
cherry|RED|Sweet and tasty
grapefruit|YELLOW|Very juicy
lemon|YELLOW|Sour!
apple|RED|Makes great pie

However, it seems like an awkward (no pun intended) solution. It reads the file twice: once to find the duplicates in the color field, and again to find the lines matching the duplicate colors. It is also error-prone. For example, the following line would be incorrectly printed:

jalapeños|GREEN|My face turns RED when I eat these!
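
The false positive can be reproduced with a small sketch like the one below. It assumes a POSIX shell with mktemp available; the names tmpfile, dup_colors, and false_hits are introduced here only for illustration.

```shell
# Sample data from above plus the problematic line, written to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
banana|YELLOW|My turtle likes these.
cherry|RED|Sweet and tasty
grapefruit|YELLOW|Very juicy
grape|PURPLE|Yummy
lemon|YELLOW|Sour!
apple|RED|Makes great pie
orange|ORANGE|Oranges make me laugh.
jalapeños|GREEN|My face turns RED when I eat these!
EOF

# Colors that occur more than once in field 2.
dup_colors=$(awk -F'|' '{print $2}' "$tmpfile" | sort | uniq -d)

# grep -F matches the fixed strings anywhere in a line, not only in field 2,
# so a line whose color is GREEN can still be selected.
false_hits=$(grep -F "$dup_colors" "$tmpfile" | grep -F '|GREEN|')
printf '%s\n' "$false_hits"

rm -f "$tmpfile"
```

The GREEN line is selected only because the word RED appears in its notes column.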

Is there a better way to do this, maybe using awk alone?

Sagebrush Gardener

Posted 2019-06-12T18:01:44.290

Reputation: 123

Answers

This changes the order of the output, but it reads the file only once:

$ awk -F'|' '$2 in a{if(a[$2])print a[$2];a[$2]=""; print; next} {a[$2]=$0}' fruit_notes.txt
banana|YELLOW|My turtle likes these.
grapefruit|YELLOW|Very juicy
lemon|YELLOW|Sour!
cherry|RED|Sweet and tasty
apple|RED|Makes great pie

How it works:

  1. $2 in a{if(a[$2])print a[$2];a[$2]=""; print; next}

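    If $2 is a key in associative array a, then (a) if a[$2] is non-empty, print it (it holds the first line seen with this color, which has not been printed yet), (b) set a[$2] to empty so that line is not printed again, (c) print the current line, and (d) skip the remaining commands and move on to the next line.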

  2. a[$2]=$0

    If this is the first time we have encountered $2, save the current line in a under the key $2.
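
For readers who find the one-liner dense, here is the same program spread over several lines with comments. It is only a sketch for experimenting: the sample data is written to a temporary file, and the names tmpfile and dupes are assumptions introduced for this example.

```shell
# Recreate the sample data from the question in a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
banana|YELLOW|My turtle likes these.
cherry|RED|Sweet and tasty
grapefruit|YELLOW|Very juicy
grape|PURPLE|Yummy
lemon|YELLOW|Sour!
apple|RED|Makes great pie
orange|ORANGE|Oranges make me laugh.
EOF

# Same rules as the one-liner, one statement per line.
dupes=$(awk -F'|' '
$2 in a {              # this color has been seen before
    if (a[$2])         # the saved first line is still waiting to be printed
        print a[$2]
    a[$2] = ""         # mark the saved line as already printed
    print              # print the current line
    next               # skip the rule below
}
{ a[$2] = $0 }         # first time this color appears: remember the line
' "$tmpfile")

printf '%s\n' "$dupes"
rm -f "$tmpfile"
```

On the sample data this prints the same five lines as the one-liner, in the same order. The array keeps one entry per distinct color, so memory use grows with the number of colors rather than with the number of lines.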

John1024

Posted 2019-06-12T18:01:44.290

Reputation: 13 893

That's brilliant. I wasn't sure if it was possible to do it in one pass. Doing some reading now to understand the magic that makes it work! – Sagebrush Gardener – 2019-06-13T01:05:08.917