
Is there a way for a user to manually look up the current code page and locale of their Windows OS? Is there a registry setting that stores that information?

It would also be useful if the technique worked all the way back to Windows 2000.

epotter

4 Answers


chcp will get you the active code page.

systeminfo will display system locale and input locale, among other things.

"Note: This command (systeminfo) is not available in Windows 2000 but you can still query Windows 2000 computer by running this command on Windows XP or Windows 2003 computer and set remote computer to Windows 2000 computer. If the current user logon that execute this command already has privilege on remote machine (for instance, Domain Administrators), you don’t have to use /u and /p."
From here.
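If you want to pick the two locale lines out of `systeminfo` output programmatically, here is a rough Python sketch. It assumes English-language labels (`System Locale:` and `Input Locale:`) followed by a `tag;name` pair. Other Windows editions localize these labels, and the sample text below is made up for illustration:

```python
import re

# Assumed line shape (English editions): "System Locale:   en-us;English (United States)"
LOCALE_LINE = re.compile(r"^(System Locale|Input Locale):\s+([^;\s]+);(.*)$")


def parse_systeminfo_locales(text):
    """Return {label: (tag, display_name)} for the locale lines in systeminfo output."""
    locales = {}
    for line in text.splitlines():
        match = LOCALE_LINE.match(line.strip())
        if match:
            label, tag, name = match.groups()
            locales[label] = (tag, name.strip())
    return locales


# Made-up sample in the assumed format; on Windows you would pass the
# captured output of running systeminfo instead.
sample = (
    "OS Name:                   Microsoft Windows 10 Pro\n"
    "System Locale:             en-us;English (United States)\n"
    "Input Locale:              en-us;English (United States)\n"
)
print(parse_systeminfo_locales(sample))
```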

    Be aware that `chcp` will get you the active **OEM** code page. As mklement states in his answer, there is always another active code page in use by Windows, the ANSI code page. For more information see [mklement's answer](https://serverfault.com/a/836221/508381). – kangalioo Feb 04 '19 at 12:47

Note that a given system has two active code pages of interest, as determined by the legacy setting named language for non-Unicode programs, formerly known as system locale (see the bottom section for background information):

  • the OEM code page for use by legacy console applications,
  • the ANSI code page for use by legacy GUI applications.
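The following Python illustration makes the distinction concrete. It uses Python's built-in `cp437` and `cp1252` codecs, which correspond to the OEM and ANSI code pages of a US-English system. It shows that the same text is stored differently under each code page, and that it gets garbled when bytes written under one code page are read under the other:

```python
text = "café"

ansi_bytes = text.encode("cp1252")  # as a legacy GUI program would store it
oem_bytes = text.encode("cp437")    # as a legacy console program would store it
print(ansi_bytes, oem_bytes)        # b'caf\xe9' b'caf\x82'

# Reading ANSI-encoded bytes as if they were OEM-encoded garbles the "é".
print(ansi_bytes.decode("cp437"))   # cafΘ
```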

Note: There are two more code pages, but they are rarely used anymore, and therefore not discussed here: the EBCDIC code page and the (pre-OS X) Mac code page - see the WinAPI docs.

The active OEM code page is most easily obtained via chcp, as shown in Forgotten Semicolon's helpful answer - assuming the console window wasn't configured with a custom code page via the registry and that the code page wasn't explicitly changed in the session with chcp <codePageNum>.
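If a script needs the number that `chcp` prints, a minimal Python sketch follows. `parse_chcp` and `active_console_code_page` are hypothetical helpers, and the parser assumes the code page is the last run of digits in the output. That assumption covers `Active code page: 437` and should tolerate localized wording and trailing punctuation:

```python
import re
import subprocess
import sys


def parse_chcp(output):
    """Extract the code page number from chcp's one-line output.

    Assumes the number is the last run of digits in the output.
    """
    numbers = re.findall(r"\d+", output)
    if not numbers:
        raise ValueError(f"no code page number in {output!r}")
    return int(numbers[-1])


def active_console_code_page():
    """Run chcp through cmd.exe (Windows only); return None elsewhere.

    The child process inherits the caller's console, so this should report
    that console's code page; without a console the result is unreliable.
    """
    if sys.platform != "win32":
        return None
    result = subprocess.run(["cmd", "/c", "chcp"], capture_output=True, text=True)
    return parse_chcp(result.stdout)


print(parse_chcp("Active code page: 437"))  # 437
```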

Determining the active ANSI code page is not as simple, but PowerShell can help with that, as well as with determining the name and language of the system locale:

In Windows 8+ / Windows Server 2012+: Use the Get-WinSystemLocale cmdlet:

Get-WinSystemLocale | Select-Object Name, DisplayName, 
                        @{ n='OEMCP'; e={ $_.TextInfo.OemCodePage } }, 
                        @{ n='ACP';   e={ $_.TextInfo.AnsiCodePage } }

Caveat: The information returned does not reflect a potential UTF-8 override that may be in place via a new Windows 10 feature (see this SO answer); instead, the information always reflects the code pages originally associated with the active system locale. If you do need to know whether the UTF-8 override is in effect, see the registry-based method below.

On a US-English system, the above yields:

Name  DisplayName             OEMCP  ACP
----  -----------             -----  ---
en-US English (United States)   437 1252

OEMCP is the OEM code page, ACP the ANSI code page.
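A program can get the same two numbers from the Win32 functions `GetOEMCP()` and `GetACP()`. Here is a minimal Python sketch using `ctypes`. It is Windows-only, so it simply returns `None` elsewhere:

```python
import sys


def system_code_pages():
    """Return (OEMCP, ACP) from the Win32 API, or None when not on Windows.

    GetOEMCP() and GetACP() are argument-less kernel32 functions that return
    the OEM and ANSI code page identifiers for the operating system.
    """
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.windll.kernel32
    return kernel32.GetOEMCP(), kernel32.GetACP()


print(system_code_pages())
```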

A registry-based method that also works on older systems down to Windows XP:

# Get the code pages:
Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage | 
     Select-Object OEMCP, ACP

On a US-English system, the above yields:

OEMCP ACP 
----- --- 
437   1252
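Here is a Python equivalent of the registry lookup, sketched with the standard `winreg` module. It is Windows-only and returns `None` elsewhere. The hypothetical `utf8_override_active` helper assumes, as the caveat above implies, that the UTF-8 override is visible in this key as an ACP value of 65001:

```python
import sys

CODEPAGE_KEY = r"SYSTEM\CurrentControlSet\Control\Nls\CodePage"
UTF8_CODE_PAGE = 65001  # pseudo code page for UTF-8


def registry_code_pages():
    """Return (OEMCP, ACP) from the registry, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CODEPAGE_KEY) as key:
        oemcp, _ = winreg.QueryValueEx(key, "OEMCP")
        acp, _ = winreg.QueryValueEx(key, "ACP")
    # The values are stored as strings such as "437" and "1252".
    return int(oemcp), int(acp)


def utf8_override_active(acp):
    """Assumption: the UTF-8 override shows up as ACP 65001 in this key."""
    return acp == UTF8_CODE_PAGE


print(registry_code_pages())
```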

If you also want to get the system locale's [friendly] name and LCID (though note that LCIDs are deprecated):

[Globalization.CultureInfo]::GetCultureInfo([int] ('0x' + (
        Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\Language' Default
      ).Default)
)

On a US-English system, the above yields:

LCID             Name             DisplayName
----             ----             -----------
1033             en-US            English (United States)
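The PowerShell command above converts a hex string to an integer LCID and then looks up the culture. The same conversion can be sketched in Python. `lcid_from_registry_default` is a hypothetical helper, and the name lookup relies on `locale.windows_locale`, an undocumented mapping that ships with CPython's `locale` module:

```python
import locale


def lcid_from_registry_default(value):
    # The "Default" value under ...\Nls\Language holds the LCID as a hex
    # string, e.g. "0409" for en-US.
    return int(value, 16)


lcid = lcid_from_registry_default("0409")
# locale.windows_locale maps integer LCIDs to POSIX-style locale names.
print(lcid, locale.windows_locale.get(lcid))  # 1033 en_US
```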

Background information:

System locale is the legacy name for what is now more descriptively called language for non-Unicode programs (see NLS terminology), and, as the names suggest:

  • The setting applies only to legacy programs (programs that don't support Unicode).

  • It applies system-wide, irrespective of a given user's locale settings, and administrative privileges are required to change it.

It is important to note that this is a legacy setting, because code pages no longer apply to programs that use Unicode internally and call the Unicode versions of the Windows API.

Notably, it determines the active code pages, i.e., the character encoding used by default:

  • the ANSI code page, used when non-Unicode programs call the non-Unicode (ANSI) versions of the Windows API, such as the ANSI version of the TextOut function, to translate strings to and from Unicode; this notably determines how the program's strings render in the GUI.

  • the OEM code page to make active by default in console windows, as reflected by chcp.

    • A console window's active code page determines how keyboard input and output from console applications are interpreted and displayed.
      • Note that this means that even output from Unicode console applications is translated to the active code page, which can result in loss of information. Using pseudo code page 65001, which represents the UTF-8 encoding of Unicode, is a solution, but it can cause legacy command-line programs to misinterpret data and even to fail - see this StackOverflow answer for details.
    • Unlike the ANSI code page, you can change the active [OEM] code page on demand for a given console window; e.g., to switch to OEM code page 850, run chcp 850 in cmd.exe, and $OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = [text.encoding]::GetEncoding(850) in PowerShell.
  • additionally, the EBCDIC and Mac code pages, which are rarely used anymore.
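The information loss described above can be illustrated in Python. This is only an analogy: Python's `replace` error handler stands in for whatever substitution Windows performs when converting to the console's code page, and Windows may behave differently (e.g., with best-fit mappings):

```python
text = "naïve 日本"

# Single-byte OEM code page 437 cannot represent the Japanese characters;
# the "replace" error handler substitutes "?" for each of them.
oem_bytes = text.encode("cp437", errors="replace")
print(oem_bytes.decode("cp437"))  # naïve ??

# UTF-8 (pseudo code page 65001) round-trips every character.
assert text.encode("utf-8").decode("utf-8") == text
```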

Despite the word locale used in the legacy term and the word language in the current term:

  • The only aspects controlled by the setting are the set of active code pages and the default bitmap fonts, not also other elements of a locale (which are controlled by the user-level locale settings).

  • A given code page is typically shared by many locales and covers multiple languages; e.g., the widely used 1252 code page is used by many Western European languages, including English.
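Here is a quick Python check of that claim. `fits_code_page` is a hypothetical helper that tests whether text can be encoded in a given code page using Python's `cpNNNN` codecs. Those codecs raise `LookupError` for code pages Python doesn't know:

```python
def fits_code_page(text, code_page):
    """Return True if every character of text can be encoded in the code page."""
    try:
        text.encode(f"cp{code_page}")
    except UnicodeEncodeError:
        return False
    return True


samples = {
    "English": "naive",
    "French": "déjà vu",
    "German": "Straße",
    "Spanish": "año",
    "Russian": "привет",
}
for language, sample in samples.items():
    # Only Russian needs a different code page (e.g., 1251) to fit.
    print(f"{language}: {fits_code_page(sample, 1252)}")
```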

However, when you do change the setting via the Control Panel, you do pick the setting by way of a specific locale.

For a list of all Windows code pages, see https://docs.microsoft.com/en-us/windows/desktop/Intl/code-page-identifiers

mklement
  • `GetACP()` function - https://technet.microsoft.com/en-us/dd318070 - that is an interesting link: the Remarks section outright says this function's return value does NOT represent the user's selected default input language and GUI language, but something entirely different... – Arioch 'The Apr 05 '19 at 17:31
  • Indeed, @Arioch'The - that is what I tried to clarify in the background-information section: the system locale (a) determines the code pages (but no other locale settings) _system-wide_, (b) _irrespective_ of a given user's locale. Note how the linked page states (emphasis added): "Returns the current Windows ANSI code page (ACP) identifier _for the operating system_". As for the potential AppLocale 3rd-party replacement: I've added a link to the answer. – mklement Apr 05 '19 at 17:36
  • Actually, there are per-process (implicitly per-user) functions: `GetConsoleCP` and `GetConsoleOutputCP` https://msdn.microsoft.com/en-us/windows/desktop/ms683162 - but they come with their own can of worms: first, they are probably bound to OEM CPs, not ANSI CPs, yet half of the Windows CLI utils forget this (the other half remember, though :-/ guess which is which...); and then there's the fact that there are two of them :-D // Working on the command line in localized Windows editions really is a mess... – Arioch 'The Apr 05 '19 at 17:42
    That GetACP remark/link is I think important as a "word of god" confirmation that MBCS-to-Unicode default conversion is *intended* to be user-independent and OS-global, not just implementation detail in some of Windows versions. – Arioch 'The Apr 05 '19 at 17:44
  • @Arioch'The. Good point about the ability to change the (OEM) code page for a given console window (shell process) - in the simplest case you can call `chcp ` from `cmd.exe` - I've updated the answer. You're not technically restricted to using an OEM code page, but it makes the most sense, because legacy command-line utilities used OEM code pages. Yes, the whole code-page business is an unholy mess that hopefully will go away soon, once all CLI tools are Unicode-aware and communicate via UTF-8 (I've also added some info on that to the answer), as has been in place on Unix for a long time. – mklement Apr 05 '19 at 19:16
  • That was because TCP-related utilities of Windows were... inconsistent. I tried chcp into OEM, into ANSI, into UTF-8 - at every setting some utilities started giving meaningful output, but others ceased. I even tried to force them into English by chcp; again, some reacted and some did not. There was no common behavior... – Arioch 'The Apr 06 '19 at 11:16
  • Small upd. For a few days I was looking for a `GetMACCP()` function to accompany `GetACP()` and `GetOEMCP()` - and could find no trace of it. It seems it was a false memory and such a function never existed. However, there are special constants, "virtual" code pages, like `CP_ACP=0; CP_OEMCP=1; CP_MACCP=2;` - and no similar constant for EBCDIC. There is also `LOCALE_IDEFAULTMACCODEPAGE` together with `LOCALE_IDEFAULTANSICODEPAGE` and `LOCALE_IDEFAULTCODEPAGE` (this last one is for the OEM code page, despite its way too vague and generic name), but again no EBCDIC peer there. Probably just a historic artifact. – Arioch 'The Apr 08 '19 at 10:37
    Probably today both pre-UNIX Mac and EBCDIC equally belong in the "only of some historic importance" niche. I am, however, somewhat attached to that Mac CP, because they managed to come up with yet another way of marking new lines in plain text files, different from both the UNIX and the DOS-Win-OS/2 trees. It was an exotic corner case I memorized. – Arioch 'The Apr 08 '19 at 10:40
  • Thanks, @Arioch'The. Re EBCDIC: There _is_ a `LOCALE_IDEFAULTEBCDICCODEPAGE` locale-info lookup constant - see https://docs.microsoft.com/en-us/windows/desktop/Intl/locale-information-constants. – mklement Apr 08 '19 at 10:45
    Thanks. More topical link - https://docs.microsoft.com/en-us/windows/desktop/Intl/locale-idefault-constants - and EBCDIC is marked "Windows 2000", so before W2K it probably did not exist, and in all the years since then no one bothered to update the header-conversion sources that I used :-D – Arioch 'The Apr 08 '19 at 11:11

The locale can also be seen in msinfo32.

epotter

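The Windows API function that returns the active output code page of the console attached to the calling process is GetConsoleOutputCP(); the system-wide ANSI and OEM code pages are returned by GetACP() and GetOEMCP().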
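Here is a minimal Python/`ctypes` sketch that queries both console code pages: `GetConsoleCP()` for input and `GetConsoleOutputCP()` for output. It is Windows-only and returns `None` elsewhere:

```python
import sys


def console_code_pages():
    """Return (input CP, output CP) of the calling process's console.

    Windows-only; returns None elsewhere. Both kernel32 functions return 0
    on failure, e.g. when the process has no attached console.
    """
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.windll.kernel32
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


print(console_code_pages())
```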

slowhand