Cygwin locale for Russian/Russia


On my Windows box, the locale command outputs the following:

LANG=ru_RU
LC_CTYPE="ru_RU"
LC_NUMERIC="ru_RU"
LC_TIME="ru_RU"
LC_COLLATE="ru_RU"
LC_MONETARY="ru_RU"
LC_MESSAGES="ru_RU"
LC_ALL=

This is perfectly fine, except that a locale name with no explicit charset implies the "ISO charset". For Russian/Russia that is ISO-8859-5, which has never been used (historically, DOS used CP866, Windows used the CP1251 ANSI codepage, and various Unices stuck to KOI8-R before the rise of the Unicode era).

This is consistent with the output of locale charmap, which is again ISO-8859-5.

A short C example also confirms that ISO-8859-5 is used:

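#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void) {
    /* Initialize all categories from the environment (LANG, LC_*). */
    const char *locale = setlocale(LC_ALL, "");
    /* Character set of the current LC_CTYPE locale. */
    const char *codeset = nl_langinfo(CODESET);

    /* setlocale returns NULL if the requested locale is not supported. */
    printf("locale: %s\n", locale ? locale : "(setlocale failed)");
    printf("codeset: %s\n", codeset);

    return 0;
}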

outputs

locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
codeset: ISO-8859-5

The Cygwin documentation states that

Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.

which is plainly wrong here (the Windows ANSI codepage for Russian is CP1251!). Surprisingly, for the Belarusian be_BY locale (Belarusian is an East Slavic language very close to Russian), the default charset is indeed CP1251, in accordance with both the documentation and common sense.

Is this a bug in Cygwin, or am I missing something here?

Bass

Posted 2015-11-03T10:51:41.237

Reputation: 542


Bugs should be reported to "cygwin (at) cygwin (dot) com"; see https://cygwin.com/cygwin/lists.html. In the interim, you can put the line "export LANG=$(locale -uU)" in your ".profile", or set LANG to your preference. I suggest "export LANG=ru_RU.UTF-8".

– matzeri – 2015-11-29T08:31:51.693

@matzeri: $(locale -u) returns en_GB on my English Windows 8.1 box, even though all regional settings are set to Russian/Russia. Requesting a Unicode locale (-U) makes no sense, since this affects text file handling in a Cygwin shell launched from cmd.exe (TERM=cygwin). While mintty seems to ignore any locale settings (cat'ing a text file with ANSI Cyrillic always displays the text correctly), plain bash launched from cmd.exe with raster fonts needs exactly ru_RU.CP1251. – Bass – 2015-11-30T10:19:00.800

Have you tried "export LANG=ru_RU.UTF-8" or "export LANG=ru_RU.CP1251"? – matzeri – 2015-12-02T06:52:59.083

@matzeri: Yes, both LANG and LC_ALL are set to ru_RU.CP1251, and still $(locale -u) returns en_GB. – Bass – 2015-12-02T09:22:04.193

@matzeri: Additionally, in the output of strace locale -u, I see multiple "__get_lcid_from_locale: LCID=0x0419" lines; 0x0419 corresponds to Russian/Russia.

– Bass – 2015-12-02T09:30:47.570

No answers