One of the things affected by locale settings is the text encoding (also called the "charset" or the "codepage"), which is taken from the LC_CTYPE parameter. Although in many situations the text encoding is fixed by a specification (e.g. D-Bus protocol strings are always UTF-8), there are also many places where the encoding is unspecified and has to be taken from the current system locale.
In particular, filenames are frequently shown according to the current locale text encoding. Programs written in Python 3, for example, fall back to the current locale encoding if the programmer forgets to specify one.
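To illustrate the Python 3 point, here is a minimal sketch. The helper name roundtrip is made up for this example, and it assumes a Python version where the default for open() still follows the locale (newer versions and "UTF-8 mode" can override that). Passing encoding= explicitly makes the result independent of LC_CTYPE:

```python
import locale
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text to a temporary file and read it back.

    The encoding is named explicitly both times, so the result does not
    depend on the locale the program happens to run under.
    (Hypothetical helper, for illustration only.)
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# What open() would typically fall back to if encoding= were omitted.
print("default text encoding:", locale.getpreferredencoding(False))

# Compare instead of printing the text, so stdout's own encoding cannot
# get in the way of the demonstration.
sample = "na\u00efve caf\u00e9"
print("round trip intact:", roundtrip(sample) == sample)
```

If you drop the encoding= arguments and run this under different LANG/LC_CTYPE values, the write may start failing on the non-ASCII characters, which is the kind of breakage described here.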
The 'C' locale implies 7-bit ASCII text encoding (ANSI_X3.4-1968), and part of your problem may be that while many programs (those written in C, generally) interpret this to allow arbitrary 8-bit values, there are also many programs which have a much stricter interpretation and reject any values above 127 (i.e. non-ASCII) as invalid. It might be that a decoding error is caused by some file name, or some configuration parameter, or some other text file.
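The difference between the strict and the lenient interpretation can be sketched in Python. This is only an illustration: the function names are made up, and the "lenient" variant uses Python's surrogateescape error handler as a stand-in for the "pass the bytes through untouched" behaviour of typical C programs.

```python
def decode_strict(raw):
    """Strict reading of ASCII: any byte above 127 is an error."""
    return raw.decode("ascii")


def decode_lenient(raw):
    """Lenient reading: non-ASCII bytes are smuggled through as lone
    surrogates so that the original bytes can be recovered later."""
    return raw.decode("ascii", errors="surrogateescape")


# A UTF-8 encoded file name as it might come back from the file system.
name = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

try:
    decode_strict(name)
except UnicodeDecodeError as exc:
    print("strict ASCII rejects it:", exc.reason)

# The lenient variant never fails, and the bytes survive a round trip.
text = decode_lenient(name)
print("bytes preserved:", text.encode("ascii", errors="surrogateescape") == name)
```

A program that behaves like decode_strict will choke on the first non-ASCII file name or config value it meets under LANG=C, while one that behaves like decode_lenient will carry on.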
In fact, at this point you'll even find programs which outright refuse to work with a locale that specifies the ASCII text encoding – some of them requiring UTF-8 specifically (such as gnome-terminal), and some others requiring any 8-bit encoding.
If your distribution applies the "C.UTF-8" patch to libc, use it:
LANG=C.UTF-8
If not, then use one of the following. Either:
LANG=en_US.UTF-8
LC_TIME=C
LC_COLLATE=C
LC_MESSAGES=C
or:
LANG=C
LC_CTYPE=en_US.UTF-8
(You can run "locale charmap" to see what codepage is in effect according to the current environment variables; with any of these settings it should say UTF-8. If you choose the 3rd option, LANG=C plus LC_CTYPE, beware of buggy programs which directly look at $LANG instead of calling nl_langinfo(CODESET) as they ought to.)
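For reference, here is a rough Python sketch of why looking only at $LANG is wrong. The function names are made up for this example. The lookup order LC_ALL, then LC_CTYPE, then LANG is the standard one; falling back to "C" when nothing is set is an assumption that matches glibc. locale.nl_langinfo is only available on Unix-like systems.

```python
import locale


def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE for the given
    environment mapping: LC_ALL wins, then LC_CTYPE, then LANG."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset and empty both count as "not specified"
            return value
    return "C"  # assumed default when nothing is set


def current_codeset():
    """Ask libc for the codeset of the current environment, roughly what
    "locale charmap" reports. Note that this changes the process-wide
    LC_CTYPE setting. Returns None if the locale is not installed."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return None
    return locale.nl_langinfo(locale.CODESET)


# The 3rd option: a program that only reads $LANG would see "C" here.
print(effective_ctype({"LANG": "C", "LC_CTYPE": "en_US.UTF-8"}))  # en_US.UTF-8
print(current_codeset())
```

Note that effective_ctype only resolves the variable precedence; it does not check whether that locale actually exists on the system, which is what current_codeset (or "locale -a") is for.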
Well, C.utf-8 doesn't show up on CentOS 7, but it does on Ubuntu 17.04 using locale -a. I'll take this to mean that CentOS doesn't have the patch in question. Either that, or the locale isn't installed. In any case, I'm liking option 2 (of 3). I'm not hell-bent on heading back to ASCII; I just want the sorting of characters, times, etc. to work the way I'm used to. Also, it seems like if a change to the locale is going to break an installation, it should fail a little more gracefully. But that's another topic. Thanks 10^6! – Erik Bennett – 2019-03-25T06:56:47.357