What if file 2 has characters after each of those symbols? I want to do the same but keep the trailing characters.
OK, make a copy of file2 that has only the field that you want to filter on. And, if the current file2 has the “non-unique symbol” immediately followed by the “trailing characters” (e.g., efr-42, rte-17, etc.), make another copy of file2 where they are space-separated.
Here are example commands based on the example data you provided:
sed 's/\(...\).*/\1/' file2.sorted > file2.symbol_only
sed 's/\(...\)\(.*\)/\1 \2/' file2.sorted > file2.separated
or
sed 's/\([^-]*\)-.*/\1/' file2.sorted > file2.symbol_only
sed 's/\([^-]*\)\(-.*\)/\1 \2/' file2.sorted > file2.separated
… based on the new data that you added to your question.
Then use comm as before:
comm -13 file1.sorted file2.symbol_only > file2.no_match
… and join the symbols up with the trailing characters:
join file2.no_match file2.separated
If necessary, use another sed to remove the spaces you added (for example, sed 's/ //' removes the first space on each line).
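Here is the whole pipeline run end to end on made-up data. This is a minimal sketch. It assumes that each file2 line is a symbol, then a hyphen, then the trailing characters (the second sed variant above), and that no line contains spaces. The contents of file1 and file2 are hypothetical, and file2.result is just a name picked for the final output. Setting LC_ALL=C keeps sort, comm and join agreeing on the sort order.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"        # work in a scratch directory
export LC_ALL=C          # byte-wise ordering for sort, comm and join

# Hypothetical inputs: file1 has bare symbols, file2 has symbols plus trailing characters.
printf '%s\n' abc rte > file1
printf '%s\n' rte-17 efr-42 abc-5 xyz-9 > file2

sort file1 > file1.sorted
sort file2 > file2.sorted

# Symbol-only copy ("efr") and space-separated copy ("efr -42") of file2.
sed 's/\([^-]*\)-.*/\1/' file2.sorted > file2.symbol_only
sed 's/\([^-]*\)\(-.*\)/\1 \2/' file2.sorted > file2.separated

# Symbols that occur only in file2 (suppress columns 1 and 3).
comm -13 file1.sorted file2.symbol_only > file2.no_match

# Reattach the trailing characters, then drop the space that was added.
join file2.no_match file2.separated | sed 's/ //' > file2.result
cat file2.result
```

With this sample data the script prints efr-42 and xyz-9, the two lines of file2 whose symbols do not appear in file1.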
It occurs to me that you could build on this trick to get the output file back into file2’s original order.
- Produce a copy of the original file2 with line numbers.
- Shuffle the line numbers to the right of the symbols.
- Run the steps above, starting with the sort commands.
- Sort the output on the original line number.
- Strip out the line numbers.
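As a script, those order-restoring steps might look like the following. This is a sketch under stated assumptions, not a tested recipe for your real files. It assumes each file2 line is a symbol, a hyphen, then the trailing characters, with no spaces anywhere. The sample contents of file1 and file2 are made up, awk is used for the line numbering, and file2.numbered and file2.result are hypothetical names.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"        # work in a scratch directory
export LC_ALL=C          # byte-wise ordering for sort, comm and join

# Hypothetical inputs; file2 is deliberately not in sorted order.
printf '%s\n' abc rte > file1
printf '%s\n' xyz-9 rte-17 efr-42 abc-5 > file2

sort file1 > file1.sorted

# 1. Number the lines of the original file2: "1 xyz-9".
awk '{print NR, $0}' file2 > file2.numbered

# 2. Move the number to the right of the symbol ("xyz 1 -9"),
#    then sort on the symbol so comm and join can use it.
sed 's/^\([0-9]*\) \([^-]*\)\(-.*\)/\2 \1 \3/' file2.numbered \
    | sort -k1,1 > file2.separated

# 3. The same steps as before: symbol-only copy, comm, join.
sed 's/ .*//' file2.separated > file2.symbol_only
comm -13 file1.sorted file2.symbol_only > file2.no_match

# 4. Sort numerically on the original line number (field 2),
# 5. then strip the number and the added spaces: "xyz 1 -9" -> "xyz-9".
join file2.no_match file2.separated \
    | sort -k2,2n \
    | sed 's/^\([^ ]*\) [0-9]* /\1/' > file2.result
cat file2.result
```

Here the output is xyz-9 followed by efr-42, the same order those lines had in file2, rather than the sorted order.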
Let me know if you need help with this.
thanks Scott. What if file 2 has characters after each of those symbols? I want to do the same but keep the trailing characters. – barrrista – 2012-11-20T19:59:25.597