Granted, this is not a great title, but to be honest I was unsure how to word the question without posting an essay as the question. This description should add some flesh.
Problem:
I have a Python script (not written by me) that I run in Windows using Python 2.7. This is relatively basic and extracts information from various sources and PRINTs the output to the terminal. Some of this output uses characters in a non-ASCII character set, and this is where the fun began.
Whilst the Python script ran OK in the terminal printing to the screen, as soon as I added a file redirection, I received an error, and the Python script crashed. After a LOT of research, this seemed to boil down to the way Python 2.7 handles Unicode, and I worked around this by setting a Windows Environment Variable for Python. This was:
$env:PYTHONIOENCODING="UTF-8"
in PowerShell, and
Set PYTHONIOENCODING="UTF-8"
in CMD.
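A minimal sketch of what probably went wrong before the variable was set, assuming the crash was a UnicodeEncodeError (the error text is not quoted above). When stdout is redirected, Python 2.7 sees no encoding on the stream and falls back to ASCII for unicode strings; PYTHONIOENCODING replaces that fallback. The helper name encode_for_output is made up for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: mimics the encoding step that print performs, without any redirection.
# Assumption: the original crash was a UnicodeEncodeError.


def encode_for_output(text, encoding):
    """Encode text as print would for a stream reporting this encoding.

    Python 2.7 uses ASCII when the stream reports no encoding (None),
    which is the case when stdout is redirected to a file.
    """
    return text.encode(encoding or "ascii")


sample = u"\u00f8"  # the character ø from the question

try:
    encode_for_output(sample, None)  # redirected stdout, no PYTHONIOENCODING
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With PYTHONIOENCODING set to UTF-8 the same character encodes cleanly.
utf8_bytes = encode_for_output(sample, "utf-8")

print(ascii_failed)
print([hex(b) for b in bytearray(utf8_bytes)])
```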
OK, so now the Python script output can be redirected to a file without crashing. Problem is, the two environments give different results. The basic format to run the Python script is:
python pythonscript.py parm1 > test.txt
Whilst this works in both CMD and PowerShell, I end up with a file with different encodings and characters. For example, a character causing issues is ø. If I run the above line in CMD, the resultant file is encoded as UTF-8 and correctly shows this character. In PowerShell, running the same command results in a file encoded as UCS-2 LE BOM (as shown in Notepad++), and the above character actually shows as two characters, ├©.
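The two characters match what you get when the UTF-8 bytes for ø are read as code page 850. The sketch below reproduces that in Python. It assumes PowerShell decodes the script's output with the console's code page (850 here) and that the > redirection then writes UTF-16 LE with a BOM, as Windows PowerShell does by default.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the ø -> ├© substitution outside PowerShell.
# Assumption: PowerShell reads the script's bytes using code page 850.
import codecs

utf8_bytes = u"\u00f8".encode("utf-8")  # what Python writes for ø: C3 B8

# Reading those two bytes as cp850 gives two separate characters.
misread = utf8_bytes.decode("cp850")

# What the > redirection would then store: UTF-16 LE with a byte order mark.
file_bytes = codecs.BOM_UTF16_LE + misread.encode("utf-16-le")

print([hex(ord(ch)) for ch in misread])
print([hex(b) for b in bytearray(file_bytes)])
```

U+251C is the box-drawing character ├ and U+00A9 is ©, which is the pair seen in the file.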
Even more bizarrely, if I don't redirect in either environment (so, just PRINT to the terminal), both show the incorrect characters.
I have also tried in PowerShell piping to the Out-file CmdLet, so:
python pythonscript.py parm1 | out-file -encoding UTF8 test.txt
This results in a file encoded as UTF-8-BOM, but the incorrect characters still appear. I have tried different encoding types here, and although I ended up with different file encodings and different characters, none of them were correct.
I have also looked at the code page of both environments by running chcp. In both cases this returns Active code page: 850. I have tried to set PowerShell to a code page of 65001 (which is UTF-8), and this has made no difference.
So, I'm thoroughly confused.
[Console]::OutputEncoding=[Text.Encoding]::UTF8 or start python 'pythonscript.py parm1' -RedirectStandardOutput test.txt -Wait. – user364455 – 2016-04-16T17:31:24.573
Annoyingly, I think I missed an encoding option in my testing of the out-file CmdLet, which was 'OEM'. So, if I use the command python pythonscript.py parm1 | out-file -encoding OEM test.txt, I actually get the correct UTF-8 encoding of the file and the correct characters. I find it slightly odd how this works yet setting the encoding to UTF8 doesn't, but there you go – Swinster – 2016-05-02T18:54:37.320
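One possible explanation for why OEM works and UTF8 does not, sketched in Python. It assumes the OEM code page is 850 (as chcp reported) and that PowerShell has already misread the script's UTF-8 bytes as code page 850 before Out-File sees them. Under those assumptions, writing with OEM undoes the misreading byte for byte, while writing with UTF8 encodes the already garbled characters.

```python
# -*- coding: utf-8 -*-
# Sketch only: compares re-encoding the garbled text as OEM (cp850) and as UTF-8.
# Assumptions: OEM code page is 850; input was UTF-8 misread as cp850.
import codecs

original = u"\u00f8"  # ø
garbled = original.encode("utf-8").decode("cp850")  # the string PowerShell holds

# -Encoding OEM: maps each garbled character back to its original byte.
as_oem = garbled.encode("cp850")

# -Encoding UTF8: writes a BOM plus the UTF-8 form of the garbled characters.
as_utf8 = codecs.BOM_UTF8 + garbled.encode("utf-8")

oem_is_valid_utf8 = as_oem.decode("utf-8") == original

print(oem_is_valid_utf8)
print([hex(b) for b in bytearray(as_oem)])
print([hex(b) for b in bytearray(as_utf8)])
```

The OEM output is the original two bytes C3 B8, so an editor that detects UTF-8 shows ø again. The UTF8 output is a valid UTF-8 file, but it faithfully stores ├© rather than ø.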