Suppress lines with awk

1

I have a multiline Bash variable: $WORDS containing one word on each line.
I have another multiline Bash variable: $LIST also containing one word on each line.

I want to remove from $LIST every word that is also present in $WORDS.

I currently do that with a while read loop and grep, but this is not sexy.

WORDS=$(echo -e 'cat\ntree\nearth\nred')
LIST=$(echo -e 'abcd\n1234\nred\nwater\npage\ncat')
while read -r LINE; do
    LIST=$(echo "$LIST" | grep -v "$LINE")
done <<< "$WORDS"
echo "$LIST"

I think I can do it with awk, but I did not manage to make it work.
Can someone explain to me how to do it with awk?

Gregory MOUSSAT

Posted 2017-06-08T23:18:54.360

Reputation: 1 031

Answers

3

This should accomplish what you're trying to do.

WORDS=$(echo -e 'cat\ntree\nearth\nred')
LIST=$(echo -e 'abcd\n1234\nred\nwater\npage\ncat')

echo "$LIST" | awk -v WORDS="$WORDS" '
BEGIN {
  split(WORDS,w1,"\n")
  for (w in w1) { w2[w1[w]] = 1 }
}
{
  if (w2[$0] != 1) { print $0 }
}'

Here's how it works. First I'm using the -v option on the awk command line to pass the list of words as a variable. This variable will be visible inside the awk program with the name WORDS.

The BEGIN block is executed before any input is processed. It contains two statements:

split(WORDS,w1,"\n")

This split command takes the WORDS list and turns it into an array called w1.

for (w in w1) { w2[w1[w]] = 1 }

This for loop walks through the w1 array and builds a second array, w2, whose keys are the words themselves. Keying the array by word lets awk test whether a word is in the list with a single lookup, instead of scanning w1 for every line of input.
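To see what the two arrays hold, the BEGIN block can be run on its own. This is only a sketch: it assumes the same WORDS value as above and an awk that accepts a newline inside a -v value (gawk and mawk do). The arrays variable and the printf labels are there purely for illustration.

```shell
WORDS=$(printf 'cat\ntree\nearth\nred')

# Build the same two arrays as the BEGIN block, then dump them.
# No input is read because the program consists only of a BEGIN block.
arrays=$(awk -v WORDS="$WORDS" '
BEGIN {
  n = split(WORDS, w1, "\n")          # w1[1]="cat", w1[2]="tree", ...
  for (i = 1; i <= n; i++) {
    w2[w1[i]] = 1                     # w2["cat"]=1, w2["tree"]=1, ...
    printf "w1[%d]=%s w2[%s]=%d\n", i, w1[i], w1[i], w2[w1[i]]
  }
}')
echo "$arrays"
```

w1 is indexed by position (1, 2, 3, ...), while w2 is indexed by the word, which is what makes the later w2[$0] lookup possible.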

Next we have the main block, which awk runs once for each line of input (here, each line of LIST).

if (w2[$0] != 1) { print $0 }

This will check each line of input against our associative array and only print the line if the word was not found. Since we assigned each key to be 1 in our BEGIN block, we need only check to see if the value of that key equals 1 to know if it is defined.
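One detail worth knowing: in awk, merely referencing w2[$0] creates that element if it does not exist yet, so the array grows with every input line that is not in the word list. A possible variant tests membership with the in operator, which does not create elements. It is sketched here as a hypothetical filter_words function; the function name, the skip array name, and the extra word category are assumptions made for the example, not part of the answer.

```shell
# filter_words WORDS LIST: print the lines of LIST that are not in WORDS.
# The function name and argument order are illustrative assumptions.
filter_words() {
  printf '%s\n' "$2" | awk -v WORDS="$1" '
  BEGIN {
    n = split(WORDS, w1, "\n")
    for (i = 1; i <= n; i++) skip[w1[i]]   # referencing the element creates the key
  }
  !($0 in skip)                            # print lines whose text is not a key
  '
}

WORDS=$(printf 'cat\ntree\nearth\nred')
LIST=$(printf 'abcd\n1234\nred\nwater\npage\ncat\ncategory')
filter_words "$WORDS" "$LIST"
```

Because the whole line is used as the array key, category is kept even though it contains cat.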

virtex

Posted 2017-06-08T23:18:54.360

Reputation: 1 129

2

I suggest

echo "$LIST" | grep -vf <(echo "$WORDS")

Michael Vehrs

Posted 2017-06-08T23:18:54.360

Reputation: 255

Be careful with this as the grep will also match substrings. For example, if "cat" is in the $WORDS list, it will filter out not just cat, but also category, cattle, vacate, etc. If you can add ^ and $ to each word it should work. Try this: echo "$LIST" | grep -vf <(echo "$WORDS" | sed -re 's/(.*)/^\1$/') – virtex – 2017-06-09T13:14:25.143

Or grep -x, if that is a problem. – Michael Vehrs – 2017-06-09T13:27:40.367
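The difference described in these comments can be seen side by side. This is a minimal sketch, assuming bash (for process substitution) and a grep that supports the POSIX -F, -x, -v and -f options. The word category is added to the list only to show the substring effect, and the substring and whole variable names are illustrative.

```shell
WORDS=$(printf 'cat\ntree\nearth\nred')
LIST=$(printf 'abcd\n1234\nred\nwater\npage\ncat\ncategory')

# Substring match: "category" is dropped because it contains "cat".
substring=$(echo "$LIST" | grep -vf <(echo "$WORDS"))

# -x matches whole lines only; -F treats each word as a fixed string,
# so characters such as "." or "*" in a word are not read as regex syntax.
whole=$(echo "$LIST" | grep -Fxvf <(echo "$WORDS"))

echo "$substring"
echo "---"
echo "$whole"
```

With -x, the anchors no longer need to be added with sed, and -F is a reasonable extra precaution when the words are plain text rather than patterns.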

Very nice answer, thanks. The question was about awk, so I selected the corresponding answer, but I used your answer. – Gregory MOUSSAT – 2017-06-10T00:37:01.080