I have an input file containing this line (user data and other columns stripped out) and several thousand more. The xCE is an unconverted hex value from the client's file.
412640 xCE
When I run it thru this awk command:
awk -F'\t' '{if ($1 == "412640" ) print $1 "\t" $2}' TEST.txt > test1.txt
the output in test1.txt has converted xCE to Î, which is what I want to happen.
When I run the entire file without the if, using this command:
awk -F'\t' '{print $1 "\t" $2}' TEST.txt > test2.txt
the output in test2.txt still has xCE in it, and when I tried:
awk -F'\t' '{if ($1 == $1 )print $1 "\t" $2}' TEST.txt > test2.txt
the output in test2 still has xCE in it.
Any advice on how to always get the converted output?
I'm using GNU Awk 3.1.7. My codepage is UTF-8, on Red Hat 6.7.
EDIT: After a lot more unit testing of both the 'good' and 'bad' awk commands, I can't always replicate the 'bad' output. The larger the total row count, the less likely it is to convert the hex values, but it doesn't fail 100% of the time. I'm now looking into controlling the size of awk's buffer, on the assumption that this has to do with writing straight from the buffer to the output versus writing to internal temp files when the buffer is needed for other things.
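Before tuning buffers, it may help to check whether the two awk commands produce different bytes at all, or whether the difference only appears in whatever is used to view the files. The following is a minimal sketch, not a confirmed diagnosis. It assumes the xCE in the sample is the single raw byte 0xCE (octal 316) and that the columns are tab-separated. The file names match the question; the sample line is rebuilt with printf purely for illustration.

```shell
#!/bin/sh
# Assumption: the "xCE" in the sample is one raw byte, 0xCE (octal 316),
# preceded by a TAB. Rebuild a one-line sample file on that basis.
printf '412640\t\316\n' > TEST.txt

# The two awk commands from the question, unchanged.
awk -F'\t' '{if ($1 == "412640" ) print $1 "\t" $2}' TEST.txt > test1.txt
awk -F'\t' '{print $1 "\t" $2}' TEST.txt > test2.txt

# Dump the raw bytes of each output. od -c prints bytes it cannot show
# as printable characters as three-digit octal numbers.
od -c test1.txt
od -c test2.txt

# cmp -s exits with status 0 only if the files are byte-for-byte identical.
if cmp -s test1.txt test2.txt; then
    echo "identical bytes"
else
    echo "bytes differ"
fi
```

If both outputs turn out to hold the same bytes, the "conversion" would be happening in the viewer or editor rather than in awk. In that case, and only if the client file really is ISO-8859-1 (an assumption; 0xCE is Î in that encoding), running the file through iconv -f ISO-8859-1 -t UTF-8 before awk is one way to make the result independent of the viewer.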
I ran your if ($1 == "412640" ) command for the line provided. It outputs nothing. Please add a link to a test file with some lines inside, the system on which you are running and the version of Awk. GNU Awk 4.0.1. – Hastur – 2015-10-13T17:39:10.373
Hastur, I'm guessing you have spaces instead of a tab between the two columns. Is there a way to upload files to Super User? – mike ray – 2015-10-13T17:41:42.810
How is print $1 "\t" $2 supposed to convert xCE to Î? – Steven – 2015-10-13T17:44:16.337
I've updated the question to include the awk/linux/codepage, and to explain that the xCE is an unconverted character from the client file. – mike ray – 2015-10-13T17:50:02.390
I tried uploading a sample thru Google Docs, but it kept on being 'helpful' and converting the bad character for me... – mike ray – 2015-10-13T20:05:58.520