Unicode and MS DOS sessions


Considering UTF-8 + Windows CMD nightmare...

After reading that question, I wonder whether its solutions are only partial. Is there a way to set the character set/encoding globally in a cmd environment? It seems that the CHCP command does not change the stdout/stderr encodings.

To check it: write a program that fills a file with Latin/Korean/Ukrainian strings.

On direct output, the file will be OK if you set the encoding properly in your source code (I checked this with Java, where encoding settings for files are easy). But if you redirect your output into a log file, you will simply get a series of ???????????????????? in it...
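A minimal Java sketch of such a test program, assuming the strings should end up in the file as UTF-8. The class name WriteSamples and the default output name samples.txt are made up for illustration. The charset is passed explicitly to Files.write, so the result depends neither on the platform default nor on CHCP; the samples use Unicode escapes so the source compiles the same whatever javac's -encoding is.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class WriteSamples {
    // Latin (French), Korean and Ukrainian samples, written with Unicode
    // escapes so the source compiles the same under any javac -encoding.
    static final List<String> SAMPLES = Arrays.asList(
            "D\u00E9j\u00E0 vu, fa\u00E7ade, na\u00EFve",   // Déjà vu, façade, naïve
            "\uC548\uB155\uD558\uC138\uC694",               // 안녕하세요
            "\u041F\u0440\u0438\u0432\u0456\u0442, \u0441\u0432\u0456\u0442\u0435"); // Привіт, світе

    // Writes one sample per line, encoded explicitly as UTF-8 rather than
    // with the platform default charset.
    static void writeSamples(Path target) throws IOException {
        Files.write(target, SAMPLES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 0 ? args[0] : "samples.txt");
        writeSamples(target);
        System.out.println("Wrote " + SAMPLES.size() + " lines to " + target);
    }
}
```

Open samples.txt in an editor that reads UTF-8 and all three lines should be intact; the interesting comparison is with what you get when the same strings go through System.out and a > redirection instead.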

Redirection could be useful too, like this:

PROMPT> myprog < inputdata.txt > outputdata.txt

Am I missing something? Is it cmd that converts stdout badly, or Java that adapts System.out depending on the cmd encoding? I have not found any way to redefine the System.out/System.err encoding.
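One way to see what Java itself has picked is to print its encoding-related settings. A small diagnostic sketch (the class name ShowEncodings is hypothetical). As far as I know, on the Java 8 releases of that time, System.out used the console code page when attached to a console but fell back to the default charset (often windows-1252 on Western Windows installs) when redirected, which would turn Korean and Ukrainian characters into ?. Treat that as an assumption to verify by running this once directly and once with > out.txt.

```java
import java.nio.charset.Charset;

public class ShowEncodings {
    public static void main(String[] args) {
        // Charset used whenever no charset is passed explicitly,
        // e.g. new OutputStreamWriter(out) or String.getBytes().
        System.out.println("default charset:     " + Charset.defaultCharset());
        // Property the default charset is normally derived from.
        System.out.println("file.encoding:       " + System.getProperty("file.encoding"));
        // JDK-internal property; may be null, e.g. when stdout is redirected.
        System.out.println("sun.stdout.encoding: " + System.getProperty("sun.stdout.encoding"));
        // Standard property on newer JDKs; null on older ones such as Java 8.
        System.out.println("stdout.encoding:     " + System.getProperty("stdout.encoding"));
    }
}
```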

Grubert

Posted 2015-06-17T14:27:24.560

Reputation: 11

Read http://ss64.com/nt/chcp.html and the detailed analysis in the great answer by @andrewdotn to another question on SO. FYI, I have the DejaVu Sans Mono font installed.

– JosefZ – 2015-06-17T15:58:01.480

To answer the question of whether it's cmd or the program: try pasting the character into cmd. If it shows up there, then cmd is fine, i.e. the font supports it. I find that type can display a file with unusual characters if it's Unicode LE (run xxd -p file and look for fffe at the start; saving a file in Notepad as 'Unicode' gives Unicode little endian), but more cannot display these characters. – barlop – 2015-06-17T23:49:26.407
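The xxd -p check for a leading fffe can also be done from Java. A sketch (class and method names are hypothetical) that reports whether each file named on the command line starts with the bytes FF FE, the little-endian byte order mark that Notepad's 'Unicode' option writes. A UTF-32LE BOM also begins with FF FE, so this is a quick heuristic, not an encoding detector.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomCheck {
    // True if the data starts with FF FE, the byte order mark Notepad writes
    // when a file is saved as "Unicode" (UTF-16 little endian).
    static boolean startsWithUtf16LeBom(byte[] head) {
        return head.length >= 2
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xFE;
    }

    // Reads up to the first two bytes of a file (fewer if the file is shorter).
    static byte[] readHead(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[2];
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) {
                    break; // end of file
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean bom = startsWithUtf16LeBom(readHead(Paths.get(name)));
            System.out.println(name + ": " + (bom ? "starts with FF FE" : "no FF FE mark"));
        }
    }
}
```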

I find that for redirection, UTF-8 works in C# though Unicode doesn't. – barlop – 2015-06-20T18:01:45.160

Many thanks for your answers, I finally got it: whatever the session settings are, you must redefine stdout and stderr. For Java, do something like myStdOut = new PrintWriter(new OutputStreamWriter(System.out, "UTF8")); (see this post: https://poeticcode.wordpress.com/2009/01/19/systemout-and-utf8/). Many thanks to its author. I'm not sure yet what to do about System.in.

– Grubert – 2015-06-22T08:57:52.063
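A fuller sketch of that idea, assuming UTF-8 is the encoding you want in the redirected log file (class and method names are hypothetical). Instead of a separate PrintWriter, it replaces System.out and System.err themselves with auto-flushing UTF-8 PrintStreams, so existing System.out.println calls keep working. It uses the PrintStream(OutputStream, boolean, String) constructor, which is available on Java 8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Streams {
    // Wraps a byte stream in an auto-flushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8PrintStream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Not expected: every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Replaces System.out and System.err; the original streams still receive
    // the bytes, but the text-to-bytes conversion now uses UTF-8.
    static void useUtf8ForStdStreams() {
        System.setOut(utf8PrintStream(System.out));
        System.setErr(utf8PrintStream(System.err));
    }

    public static void main(String[] args) {
        useUtf8ForStdStreams();
        // Déjà vu / 안녕하세요 / Привіт
        System.out.println("D\u00E9j\u00E0 vu / \uC548\uB155\uD558\uC138\uC694 / \u041F\u0440\u0438\u0432\u0456\u0442");
        System.err.println("stderr: \uC548\uB155\uD558\uC138\uC694");
    }
}
```

With java Utf8Streams > log.txt the file should contain UTF-8 bytes regardless of CHCP. On the console itself the same text may still look garbled unless the code page and font can show it.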

@Grubert To paste it more readably, you mean PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, "UTF8")); out.println("some-utf8-string"); I'm not in front of Java right now, but you could experiment with new InputStreamReader(System.in, "UTF8") and a readLine(). You should ask on Stack Overflow; it's a coding issue, as you know. – barlop – 2015-06-25T11:58:49.190
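For the myprog < inputdata.txt > outputdata.txt case from the question, here is a sketch of that readLine() experiment as a filter, assuming inputdata.txt is UTF-8 (the class name Utf8Filter is hypothetical). Both the reader on System.in and the writer on System.out get an explicit charset, so neither side depends on CHCP or the platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8Filter {
    // Copies text line by line from in to out, decoding and encoding as UTF-8.
    // Returns the number of lines copied.
    static int copyLines(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        PrintWriter writer =
                new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            writer.println(line); // uses the platform line separator
            count++;
        }
        writer.flush(); // PrintWriter buffers, so flush before returning
        return count;
    }

    public static void main(String[] args) throws IOException {
        copyLines(System.in, System.out);
    }
}
```

Run it as java Utf8Filter < inputdata.txt > outputdata.txt. One caveat: Java's UTF-8 decoder does not strip a byte order mark, so if the input was saved as UTF-8 with a BOM, the first line will start with U+FEFF.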

Answers


Considering UTF-8 + Windows CMD nightmare...

Works for C#.

It should work for Java too; maybe you are doing it wrong. You should post your problem code on Stack Overflow and ask where you are going wrong with the encoding statements.

To check it: write a program that fills a file with Latin/Korean/Ukrainian strings.

I have done something like that in C#.

On direct output,

You mean on display.

the file will be OK if you set the encoding properly in your source code (I checked this with Java, where encoding settings for files are easy). But if you redirect your output into a log file, you will simply get a series of ???????????????????? in it...

You have to get the encoding statement right in your code; then the > redirection will work.
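To see where the runs of ? come from, a small sketch (the class name QuestionMarks is hypothetical) that encodes strings with a charset that cannot represent them. String.getBytes(Charset) substitutes the charset's replacement bytes for unmappable characters, which for ISO-8859-1 is a single ?. I'd expect a single-byte Windows code page such as windows-1252 to behave the same way, but that is an assumption. Either way, once the wrong encoding is used at write time, the characters are already lost before cmd or > ever sees them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarks {
    // Encodes text with the given charset and decodes it again.
    // String.getBytes(Charset) replaces each character the charset cannot
    // represent with the charset's replacement bytes ("?" for ISO-8859-1).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String korean = "\uC548\uB155";                              // 안녕
        String ukrainian = "\u041F\u0440\u0438\u0432\u0456\u0442";   // Привіт
        System.out.println(roundTrip(korean, StandardCharsets.ISO_8859_1));    // prints ??
        System.out.println(roundTrip(ukrainian, StandardCharsets.ISO_8859_1)); // prints ??????
        System.out.println(roundTrip(korean, StandardCharsets.UTF_8).equals(korean)); // prints true
    }
}
```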

I haven't had to change CHCP just to redirect non-ASCII Unicode characters to a file.

Redirection could be useful too, like this:

PROMPT> myprog < inputdata.txt > outputdata.txt

Am I missing something? Is it cmd that converts stdout badly, or Java that adapts System.out depending on the cmd encoding? I have not found any way to redefine the System.out/System.err encoding.

It's all an issue with your Java code.

See it working here in C#:

https://stackoverflow.com/questions/30904504/font-is-right-why-cant-i-get-this-unicode-character-to-display-in-this-c-sharp

And look at my comment on Htin's answer, but that's for C#.

If you want it for Java, post a demonstrative piece of code with your question to Stack Overflow. It's a programming issue that you have.

barlop

Posted 2015-06-17T14:27:24.560

Reputation: 18 677