How to fix my locale to display Unicode correctly in irssi?



I've just arrived at my new lab in Japan, and the server I can use has only Japanese locales. Running locale -a returns:

C
POSIX
ja_JP
ja_JP.eucjp
ja_JP.ujis
ja_JP.utf8
japanese
japanese.euc

So I changed my environment variables, and my locale is now set to ja_JP.utf8 (previously eucjp), which should support Unicode just fine. Running locale now returns:

LANG=ja_JP.utf8
LANGUAGE=
LC_CTYPE="ja_JP.utf8"
LC_NUMERIC="ja_JP.utf8"
LC_TIME="ja_JP.utf8"
LC_COLLATE="ja_JP.utf8"
LC_MONETARY="ja_JP.utf8"
LC_MESSAGES="ja_JP.utf8"
LC_PAPER="ja_JP.utf8"
LC_NAME="ja_JP.utf8"
LC_ADDRESS="ja_JP.utf8"
LC_TELEPHONE="ja_JP.utf8"
LC_MEASUREMENT="ja_JP.utf8"
LC_IDENTIFICATION="ja_JP.utf8"
LC_ALL=

I can read files containing Japanese characters in Unicode just fine, whether I use less, emacs or vim, and whether I connect from PuTTY or from a remote xterm under Cygwin. Other Unicode characters also seem to display fine.

But here is the problem: typing in Japanese seems to go wrong. I like to use IRC, and for some reason, although I can read any Japanese characters perfectly well, anything I type arrives as garbage for other people. I'm using the configuration from this guide: http://xkr47.outerspace.dyndns.org/howtos/irssi-utf-8-guide.txt

I get the following output from /set charset:

term_charset = utf-8
recode_out_default_charset = ISO-8859-15

and from /set recode:

recode = ON
recode_autodetect_utf8 = ON
recode_fallback = ISO-8859-15
recode_out_default_charset = ISO-8859-15
recode_transliterate = ON

If possible, please suggest a fix that doesn't require root access, since it would take forever to get the administrator to change anything on the server. I've searched a lot online about locales, but I didn't find anything about this problem.

meneldal

Posted 2015-04-23T03:21:19.290

Reputation: 203

Which IRC client are you using? If you run cat > testfile.txt, does it store the typed text correctly? – user1686 – 2015-04-23T07:23:38.987

I'm running irssi. I'm using the same config file as on my other server, where everything works fine. I can't try cat right now, but I will tomorrow when I'm back in the lab. – meneldal – 2015-04-23T11:24:34.483

With cat I get a text file that I can read on both Linux and Windows. Notepad++ says it's encoded in UTF-8 without BOM, and the native Notepad also opens it fine. Documents I create with nano use the same encoding. – meneldal – 2015-04-24T01:29:29.360
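The "UTF-8 without BOM" check that Notepad++ reported can be approximated without Windows. A minimal sketch, assuming that "is UTF-8" means "decodes strictly as UTF-8" (so a plain-ASCII file also counts); describe_encoding is a hypothetical helper name, not part of any tool mentioned in this thread:

```python
# Approximation of the "UTF-8 without BOM" check Notepad++ reported.
# Assumption: data counts as UTF-8 if it decodes strictly, which is also
# true of plain-ASCII files.
UTF8_BOM = b"\xef\xbb\xbf"


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes as UTF-8 (with or without BOM) or not UTF-8."""
    try:
        data.decode("utf-8")  # strict decode: raises on any invalid byte sequence
    except UnicodeDecodeError:
        return "not UTF-8"
    if data.startswith(UTF8_BOM):
        return "UTF-8 with BOM"
    return "UTF-8 without BOM"


print(describe_encoding("日本語".encode("utf-8")))      # UTF-8 without BOM
print(describe_encoding("日本語".encode("utf-8-sig")))  # UTF-8 with BOM
print(describe_encoding("日本語".encode("euc_jp")))     # not UTF-8
```

To check the real file, pass it the raw bytes, e.g. open("testfile.txt", "rb").read(); opening it in binary mode matters, since text mode would already decode it with the locale's charset.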

So your terminal is working fine, but Irssi is interpreting things weirdly. Could you check what /exec locale outputs, as well as /set charset and /set recode? – user1686 – 2015-04-24T05:42:57.477

I'm getting LANG=ja_JP.UTF-8 (the other lines are the same, LC_ALL is not set). /set charset gives term_charset = utf-8 and recode_out_default_charset = ISO-8859-15, and /set recode gives recode = ON, recode_autodetect_utf8 = ON, recode_fallback = ISO-8859-15, recode_out_default_charset = ISO-8859-15, recode_transliterate = ON. – meneldal – 2015-04-24T05:48:13.517

By the way, since I noticed I had LANG=ja_JP.UTF-8 instead of LANG=ja_JP.utf8, I quit that console and logged out completely, so now it also shows LANG=ja_JP.utf8, but it's still not working. I see my own messages correctly, but other people don't receive them correctly, while messages other people send me display fine. (I'm actually testing by sending messages from my other server, including characters not present in Shift-JIS or EUC-JP.) – meneldal – 2015-04-24T05:56:59.320

It's most likely caused by recode_out_default_charset telling Irssi to convert everything to ISO-8859-15. Fix that setting. – user1686 – 2015-04-24T06:10:12.590

What should I put there instead? I'm using the recommended settings from the irssi FAQ, so I assumed they would work. – meneldal – 2015-04-24T06:12:14.830

Uh, UTF-8, what else. – user1686 – 2015-04-24T06:12:43.487

Thank you, that does fix the irssi problem. I'm pretty sure I pasted the settings from the same place, though. – meneldal – 2015-04-24T06:18:00.937

Answers


As determined in the comments, Irssi was configured to convert outgoing messages to ISO-8859-15 instead of UTF-8.
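Why this garbles Japanese: ISO-8859-15 (Latin-9) is a single-byte Western European charset with no code points for Japanese, so the text cannot survive the conversion. A rough illustration, using Python's codecs as a stand-in for the iconv conversion Irssi performs; the exact bytes Irssi sends may differ (for example with recode_transliterate), and encode_outgoing is a hypothetical helper:

```python
# Illustration only: Python's codecs stand in for Irssi's iconv-based
# conversion; Irssi's actual output bytes may differ.
def encode_outgoing(text: str, charset: str) -> bytes:
    """Encode an outgoing message, replacing characters the charset lacks with '?'."""
    return text.encode(charset, errors="replace")


print(encode_outgoing("café", "iso-8859-15"))   # b'caf\xe9'  (é exists in Latin-9)
print(encode_outgoing("日本語", "iso-8859-15"))  # b'???'      (no Japanese code points)
print(encode_outgoing("日本語", "utf-8"))        # b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'
```

This also explains why the problem only appeared on the sending side: the local display is UTF-8 throughout, and only the outgoing conversion targets ISO-8859-15.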

Change the output charset using:

/set recode_out_default_charset UTF-8

Also, if you're in mixed-charset channels, /set recode_fallback Shift-JIS might be useful (it changes how received messages are decoded). Irssi always tries UTF-8 first, and if that decoding fails, it uses the recode_fallback charset next.
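The decoding order described above (strict UTF-8 first, then the fallback charset) can be sketched as follows. This is not Irssi's implementation; Python's "shift_jis" codec stands in for Irssi's Shift-JIS setting, and decode_incoming is a hypothetical name:

```python
# Sketch of the decoding order: strict UTF-8 first, then the fallback
# charset (the role recode_fallback plays). Not Irssi's actual code.
def decode_incoming(raw: bytes, fallback: str = "shift_jis") -> str:
    """Decode a received line as UTF-8, falling back to another charset."""
    try:
        return raw.decode("utf-8")  # strict: fails on invalid UTF-8
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the sender used the fallback charset.
        return raw.decode(fallback, errors="replace")


print(decode_incoming("こんにちは".encode("utf-8")))      # こんにちは
print(decode_incoming("こんにちは".encode("shift_jis")))  # こんにちは
```

Lines that happen to be valid UTF-8, such as plain ASCII, never reach the fallback. Shift-JIS Japanese text usually fails the strict UTF-8 check early (here the first byte, 0x82, cannot start a UTF-8 sequence), which is what lets the fallback take over.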

user1686


Reputation: 283 655

Thanks for fixing irssi, but I still don't understand why file names don't work between Linux and Windows. – meneldal – 2015-04-24T06:19:25.757

It's a separate program and should probably be in a separate question... It also depends on what protocol both Windows and Linux use to access that shared drive. (NFS? SMB/CIFS? AFS?) – user1686 – 2015-04-24T06:21:05.607

I'll edit my question accordingly so it asks only about IRC, and I'll open a new one for the file-name issue. Since I'm using Japanese Windows, it will take me some time to find out this information. – meneldal – 2015-04-24T06:28:54.047

It also turns out that I didn't read the linked guide carefully: it does say that ISO-8859-15 is used by default for all windows. I just didn't suspect that setting, since it took me a while to get Japanese characters displaying in the console at all. – meneldal – 2015-04-24T06:36:54.423