select data based on value of a field

0

I have a file with several entries providing IDs and another file whose entries are divided into fields separated by a TAB. I need to select the records of the second file where a field matches a value from the first file. I have read on the web that AWK is the right tool (although GREP is probably simpler), but I do not get any output.

For this example, I used arrays rather than files, but in order to use awk I had to create a temporary file. In essence, I need to match the 3rd field of the second file (var2) against the values provided by the first file (var1). The selection from var2 should be: "shameText\t someWhat\t beta\t thatIs", from which I print only the first field, so the output should simply be: "shameText". I might have missed the right way to assign arrays, but anyway this example is just a proxy for the real match on files.

The question is: how do I select a row (record) or a single field based on a match between a field and the value of a variable?

Example:

var1="alpha beta gamma delta epsilon"
var2="
'someText somethingElse zeta  someMore'
'sameText someElse  kappa andMore'
'shameText  someWhat  beta  thatIs'
'shortText  moreElse  theta andMore'"
echo $var2 > tempFile
for i in $var1
do
  printf "i is: %s\n" $i
  awk -F\t '$3 == "$i" {print $1}' tempFile
  echo "next item"
done
rm tempFile

Gigiux

Posted 2017-11-24T14:00:53.340

Reputation: 1

If I understood correctly, you want to find the lines in var2 whose third field is one of the words in the var1 list, then print only the first field of those lines: grep -f <(tr ' ' '\n' <file-with-IDs) file-with-fields | cut -f1 – Paulo – 2017-11-24T23:25:32.763

(I should have posted this comment first.) Your code works, but it needs some corrections. In echo $var2 > tempfile, $var2 must be double-quoted to preserve tabs and newlines: echo "$var2" > tempfile. In the awk line, the -F option doesn't need to be set, because awk's default field separators are blank and tab, and the variable $i must be exposed to the shell: awk '$3 == "'$i'" {print $1}' tempFile. Note that there are double quotes inside the awk command. – Paulo – 2017-11-24T23:48:33.850
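The corrections in the comment above can be sketched as one runnable script. This is only a sketch under some assumptions: it builds tempFile with printf so that the fields really are TAB-separated, and it passes the shell value to awk with -v instead of splicing quotes. The awk variable name id and the matches accumulator are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch of the corrected loop; tempFile and the sample rows mirror the question.
var1="alpha beta gamma delta epsilon"

# printf reuses the format for every group of four arguments,
# writing real TAB characters between fields and one record per line.
printf '%s\t%s\t%s\t%s\n' \
  someText  somethingElse zeta  someMore \
  sameText  someElse      kappa andMore \
  shameText someWhat      beta  thatIs \
  shortText moreElse      theta andMore > tempFile

matches=""
for i in $var1; do
  # -v copies the shell value into the awk variable "id" (a name chosen here),
  # so the awk program can stay entirely inside single quotes.
  hit=$(awk -F'\t' -v id="$i" '$3 == id {print $1}' tempFile)
  if [ -n "$hit" ]; then
    printf '%s -> %s\n' "$i" "$hit"
    matches="$matches$hit"
  fi
done
rm tempFile
```

With -v there is no need to close and reopen the single quotes around the awk program, which is where the original attempt went wrong: inside single quotes the shell never expands $i, so awk compared the third field with the literal text $i.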

Answers

0

A simpler awk solution is to compare $3 against a regex.

awk '$3 ~ /alpha|beta|gamma|delta|epsilon/ {print $1}' tempFile

Passing the list as $var1:

awk '$3 ~ /'"${var1// /|}"'/ {print $1}' tempFile

If the list is in a file rather than in $var1, you could pass its contents to awk with cat:

awk '$3 ~ /'"$(cat IDs|tr ' ' '|')"'/ {print $1}' tempFile
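One caveat with the regex forms above: the pattern is not anchored, so an ID such as beta would also select a row whose third field is betamax. A possible alternative, sketched here under the assumption that the IDs are separated by whitespace and the data file is TAB-separated, is to load the IDs into an awk array and test for exact membership. The file names ids.txt and data.tsv and the sample rows are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample files standing in for the real ID list and data.
printf 'alpha beta gamma delta epsilon\n' > ids.txt
printf '%s\t%s\t%s\t%s\n' \
  sameText  someElse kappa   andMore \
  shameText someWhat beta    thatIs \
  shortText moreElse betamax andMore > data.tsv

# First file (NR == FNR): store every whitespace-separated ID as an array key.
# Second file: FS is switched to TAB before it is read; print field 1
# when field 3 is exactly one of the stored keys.
result=$(awk 'NR == FNR { for (n = 1; n <= NF; n++) ids[$n]; next }
              $3 in ids { print $1 }' ids.txt FS='\t' data.tsv)
printf '%s\n' "$result"

rm ids.txt data.tsv
```

Because the test is an array lookup instead of a regex, the betamax row is skipped and IDs that contain regex metacharacters need no escaping.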

Paulo

Posted 2017-11-24T14:00:53.340

Reputation: 606

Thanks Paulo, I always get confused by Bash's quoting. The second solution you gave me works just fine. But after all, I think grep is easier. – Gigiux – 2017-11-26T15:25:22.343

For the record, the main problem with my real data was that I had exported them from Windows to Linux. I had to run tr -d '\r' < file_dos > file_linux to get the grep command working. – Gigiux – 2017-11-26T16:35:58.220
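To illustrate why the carriage returns matter, here is a small sketch using the file names from the comment above (the sample row is invented). On a Windows-style line the CR stays attached to the last field, so an exact comparison fails until the CR is removed.

```shell
#!/usr/bin/env bash
# Simulate a file exported from Windows: the line ends in CR LF.
printf 'shameText\tsomeWhat\tbeta\r\n' > file_dos

# With the CR still attached, field 3 is "beta" plus a CR, so the test fails.
before=$(awk -F'\t' '$3 == "beta" {print $1}' file_dos)

# Delete every carriage return, then run the same test again.
tr -d '\r' < file_dos > file_linux
after=$(awk -F'\t' '$3 == "beta" {print $1}' file_linux)

printf 'before: [%s]\nafter: [%s]\n' "$before" "$after"
rm file_dos file_linux
```

The same applies to a list of IDs read by grep -f: each pattern would carry an invisible trailing CR and match nothing.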