What is the difference in Windows CMD and PowerShell redirection and text Encoding?

Granted, this is not a great title, but to be honest I was unsure how to word the question without posting an essay as the question. This description should add some flesh.

Problem:

I have a Python script (not written by me) that I run in Windows using Python 2.7. This is relatively basic and extracts information from various sources and PRINTs the output to the terminal. Some of this output uses characters in a non-ASCII character set, and this is where the fun began.

Whilst the Python script ran OK in the terminal printing to the screen, as soon as I added a file redirection, I received an error, and the Python script crashed. After a LOT of research, this seemed to boil down to the way Python 2.7 handles Unicode, and I worked around this by setting a Windows Environment Variable for Python. This was:

$env:PYTHONIOENCODING="UTF-8"

in PowerShell, and

Set PYTHONIOENCODING="UTF-8"

in CMD.

OK, so now the Python script output can be redirected to a file without crashing. Problem is, the two environments give different results. The basic format to run the Python script is:

python pythonscript.py parm1 > test.txt

Whilst this works in both CMD and PowerShell, I end up with files that have different encodings and characters. For example, a character causing issues is ø. If I run the above line in CMD, the resultant file is encoded as UTF-8 and correctly shows this character. In PowerShell, running the same command results in a file encoded as UCS-2 LE BOM (as shown in Notepad++), and the above character actually shows as two characters: ├©.
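For what it's worth, the ├© pair is what you would expect if the UTF-8 bytes for ø were read back as code page 850. The snippet below is only a sketch of that byte-level relationship; it assumes code page 850 is the one involved and does not reproduce what PowerShell does internally.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import binascii

original = u"\u00f8"                    # the character ø
utf8_bytes = original.encode("utf-8")   # the bytes written when PYTHONIOENCODING is UTF-8
misread = utf8_bytes.decode("cp850")    # the same two bytes interpreted as code page 850

# ø occupies two bytes in UTF-8: 0xC3 0xB8
print(binascii.hexlify(utf8_bytes))

# Each byte becomes its own character in cp850: U+251C (├) and U+00A9 (©)
print([hex(ord(c)) for c in misread])
```

So the bytes coming out of Python look right; it is the decoding step afterwards that appears to go wrong.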

Even more bizarrely, if I don't redirect in either environment (so, just PRINT to the terminal), both show the incorrect characters.

I have also tried in PowerShell piping to the Out-file CmdLet, so:

python pythonscript.py parm1 | out-file -encoding UTF8 test.txt

This results in a file encoded as UTF-8-BOM, but the incorrect characters still appear. I have tried different encoding types here, and although I ended up with different file encodings and different characters, nothing seems to be correct.

I have also looked at the code page of both environments by running chcp. In both cases this returns Active code page: 850. I have tried to set PowerShell to a code page of 65001 (which is utf-8), and this has made no difference.

So, I'm thoroughly confused.

Swinster

Posted 2016-04-16T12:08:58.563

Reputation: 33

[Console]::OutputEncoding=[Text.Encoding]::UTF8 or start python 'pythonscript.py parm1' -RedirectStandardOutput test.txt -Wait. – user364455 – 2016-04-16T17:31:24.573

Annoyingly, I think I missed an encoding option in my testing of the out-file CmdLet, which was 'OEM'. So, if I use the command python pythonscript.py parm1 | out-file -encoding OEM test.txt, I actually get the correct UTF-8 encoding of the file and the correct characters. I find it slightly odd that this works yet setting the encoding to UTF8 doesn't, but there you go – Swinster – 2016-05-02T18:54:37.320
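A possible explanation for why OEM works where UTF8 doesn't, sketched in Python. This assumes PowerShell decodes the script's output using the console code page (850, per chcp) before Out-File re-encodes it. The name CONSOLE_CODE_PAGE is made up for illustration, and the BOM that Out-File adds for UTF8 is left out.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumption: the shell decodes piped output with the console code page.
CONSOLE_CODE_PAGE = "cp850"

raw = u"\u00f8".encode("utf-8")           # bytes emitted by the Python script
decoded = raw.decode(CONSOLE_CODE_PAGE)   # the (wrong) string the shell would then hold

as_oem = decoded.encode(CONSOLE_CODE_PAGE)  # roughly what -Encoding OEM would write
as_utf8 = decoded.encode("utf-8")           # roughly what -Encoding UTF8 would write

# Re-encoding with the same code page undoes the bad decode byte for byte,
# so the file ends up holding the original UTF-8 bytes.
print(as_oem == raw)

# Encoding the wrong string as UTF-8 bakes the two wrong characters into the file.
print(as_utf8 == raw)
```

If that assumption holds, OEM only works because it reverses the earlier mis-decoding, which would also fit with the [Console]::OutputEncoding suggestion in the comment above.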

No answers