Differences between en_US.utf8 and en_US.UTF-8?

11

3

I've had a terrible time getting zsh to play nicely with Debian Jessie, and I've come to the conclusion that my issues all stem from my system's locale. Running locale, I see

LANG=en_US.utf8
LANGUAGE=
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

However, /etc/default/locale contains LANG=en_US.UTF-8, as does /etc/environment, and my /etc/locale.gen file only has en_US.UTF-8 UTF-8 uncommented.

  1. Why does locale report something (subtly) different than seemingly every other option on my system, and
  2. How do I configure (fix) things to give "en_US.UTF-8" for every LC option when I run locale?

Connor Glosser

Posted 2015-11-11T16:05:41.287

Reputation: 315

Answers

13

The 'proper' name is UTF-8. However, glibc on Linux normalizes the encoding name internally by converting it to lowercase and removing most special characters, so both variants will work (as long as they don't escape to BSD systems).
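To illustrate the normalization described above, here is a rough shell sketch. It is not glibc's actual code: the function names normalize_codeset and normalize_locale are made up for this example, and it ignores details such as @modifiers and glibc's special handling of purely numeric codeset names.

```shell
# Approximate the normalization: keep only letters and digits, then lowercase.
# (Hypothetical helper, not part of any real tool.)
normalize_codeset() {
    printf '%s' "$1" | LC_ALL=C tr -cd '[:alnum:]' | LC_ALL=C tr '[:upper:]' '[:lower:]'
}

# Normalize only the part after the dot in a full locale name.
# Names with an @modifier are not handled in this sketch.
normalize_locale() {
    case "$1" in
        *.*) printf '%s.%s' "${1%%.*}" "$(normalize_codeset "${1#*.}")" ;;
        *)   printf '%s' "$1" ;;
    esac
}

normalize_locale en_US.UTF-8; echo
normalize_locale en_US.utf8; echo
```

Both calls print the same name, which is why locale can report en_US.utf8 even though the config files say en_US.UTF-8.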

Most of the time the .utf8 suffix in $LANG comes from GNOME; afaik, this has been fixed in 3.18.

But as said above, both utf8 and UTF-8 will work the same way on Linux glibc – the problem comes from elsewhere. Since you didn't write what the problem is, here's a general checklist:

  • Does locale -a (available locales) show either variant at all? I.e. have you generated (with locale-gen) the locales after editing locale.gen?

  • Does the terminal emulator's environment have the same locale settings? Use cat /proc/$(pidof xterm)/environ | tr \\0 \\n to check the environment of another process.

    (Frequently people try to set locale envvars from their ~/.bashrc or similar files, but environment variables do not propagate "upwards", so the end result is that the terminal emulator itself still runs with the old locale.)

  • What does printf '\xe2\x99\xa5' output? If it shows one box or question mark, it means the font doesn't have the necessary character. If it shows three garbage characters, it means your terminal doesn't have the right $LANG (or just doesn't support UTF-8).
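The first two checks in the list above can be wrapped in small helpers, roughly like this. The function names locale_vars and has_utf8_locale are invented for this sketch, and both read from stdin so they can be tried without touching a real system. The PID 1234 in the comments is a placeholder.

```shell
# Print only the locale-related variables from a NUL-separated environment
# block on stdin (the format of /proc/PID/environ on Linux).
locale_vars() {
    tr '\0' '\n' | grep -E '^(LANG|LANGUAGE|LC_[A-Z]+)=' || true
}

# Succeed if the list on stdin (one locale per line, as printed by
# `locale -a`) contains the given language in either UTF-8 spelling.
has_utf8_locale() {
    grep -Eiq "^$1\.utf-?8\$"
}

# Example usage on a Linux system (1234 is a placeholder PID):
#   locale_vars < /proc/1234/environ
#   locale -a | has_utf8_locale en_US && echo "en_US UTF-8 locale is generated"
```

Comparing the output of locale_vars for the terminal emulator's process with the output for the shell running inside it shows whether the two actually agree.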

user1686

Posted 2015-11-11T16:05:41.287

Reputation: 283 655

Ahh, ok! Thanks! That illuminates things a little more for me. The problem that started this was an issue with zsh not properly rendering box-drawing characters; I just see a massive string of � replacement characters instead. Running locale -a only shows the lowercase variants, even after explicitly re-generating everything. The terminal emulator has the settings I would expect (i.e., en_US.UTF-8), and the output of printf shows me one little heart. – Connor Glosser – 2015-11-11T17:11:38.657

That rather sounds like the source of these box-drawing characters is not actually UTF-8-encoded (perhaps your ~/.zshrc was saved in cp437?) Test the terminal emulator directly using printf '┌┘' and printf '\xe2\x94\x8c\xe2\x94\x98\n', or perhaps cat a demo file.

– user1686 – 2015-11-11T17:17:32.633
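One way to check the "not actually UTF-8" suspicion above is to look at the raw bytes rather than at what the terminal renders. The sketch below assumes od and tr are available; hex_bytes is a name made up for this example.

```shell
# Print stdin as one continuous lowercase hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# The UTF-8 encoding of ┌ (U+250C), written with octal escapes so it
# works with any POSIX printf:
printf '\342\224\214' | hex_bytes; echo    # prints e2948c
```

In a UTF-8 file, ┌ appears as the three bytes e2 94 8c. In a cp437 file the same character is the single byte da, so running something like hex_bytes on the suspect text would show which encoding it really uses.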

I tried removing my ~/.zshrc and setting the appropriate options (prompt adam2 8bit) from the prompt directly so as to avoid any issues with the encoding of a settings file, but I still have the same issue. Which is particularly odd, because catting the demo file rendered perfectly---even the box-drawing characters at the end of the file! – Connor Glosser – 2015-11-11T17:30:59.840