Can the UTF-8 code page identifier (65001) be different on other computers?

Basically, Windows cmd (and its batch script interpreter as well) relies on the active code page matching the encoding of the batch script. For instance, if you save a script from Notepad in the so-called ANSI encoding (which depends strongly on the Windows system locale), then you should run it under the corresponding code page; see the National Language Support (NLS) API Reference:

English (US)  : ANSI code page (ACP) 1252, OEM code page 437
English (UK)  : ANSI code page (ACP) 1252, OEM code page 850
Turkish       : ANSI code page (ACP) 1254, OEM code page 857
Central Europe: ANSI code page (ACP) 1250, OEM code page 852, etc.
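To illustrate why the code page pairs above matter, here is a minimal Python sketch (Python is used only for illustration, it is not part of the batch solution). It encodes one character under an ANSI code page and decodes the resulting byte under the OEM code page listed for the same locale. The helper name misread is hypothetical.

```python
def misread(text, ansi, oem):
    """Encode text as the given ANSI code page, then decode the same
    bytes as the given OEM code page (hypothetical helper, illustration only)."""
    raw = text.encode(ansi)   # bytes as stored in an "ANSI" encoded script
    return raw.decode(oem)    # what a console using the OEM code page would show


# ANSI/OEM pairs taken from the list above
pairs = [("cp1252", "cp437"), ("cp1252", "cp850"),
         ("cp1254", "cp857"), ("cp1250", "cp852")]

for ansi, oem in pairs:
    shown = misread("ü", ansi, oem)
    # print code points only, so the output does not depend on the terminal encoding
    print(f"{ansi} -> {oem}: U+00FC is shown as U+{ord(shown):04X}")
```

In every pair the character comes back as something other than ü, which is why the script has to be run under the code page it was saved in.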

Your presumption is right:

The simple solution to this would be to add chcp 65001 at the top of the file to change the active codepage to the UTF-8 one. … But this didn't work.

Unfortunately, neither Windows cmd nor the batch interpreter recognizes the Byte Order Mark (BOM); both treat it as ordinary characters, regardless of the currently active code page.
Hence, the first line (the CHCP 65001 command in your case) of a UTF-8 encoded file is corrupted if the BOM is present; an attempt to run such a mangled command leads to the error message ' CHCP' is not recognized as an internal or external command, operable program or batch file (errorlevel 9009).
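The effect can be reproduced outside cmd. The following Python sketch (illustration only; the helper name first_token is hypothetical) prepends the three UTF-8 BOM bytes EF BB BF to a CHCP 65001 line and decodes it under a few OEM code pages, roughly the way the interpreter sees the first line of the file.

```python
import codecs


def first_token(raw, page):
    """Decode raw bytes under a code page and return the first word,
    i.e. what would be looked up as the command name (hypothetical helper)."""
    return raw.decode(page).split()[0]


# first line of a UTF-8 script saved with a BOM
line = codecs.BOM_UTF8 + "CHCP 65001".encode("utf-8")

for page in ("cp437", "cp850", "cp852"):
    # ascii() keeps the output printable on any terminal
    print(page, ascii(first_token(line, page)))

# a BOM-aware decoder removes the mark and leaves the plain command name
print(ascii(first_token(line, "utf-8-sig")))
```

Under code page 852 the first word becomes ´╗┐CHCP, the same three stray characters that show up in front of @rem in the output further down.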

Solution: save your script UTF-8 encoded without a BOM.
Workaround if you can't do that (for example because your version of Notepad always writes a BOM): use a dummy command as the first line of your script, e.g. as follows:

@rem if this line is visibly executed then BOM is present >NUL 2>&1
@echo OFF
    rem save current code page to the `_chcp` variable
for /F "tokens=2 delims=:" %%G in ('chcp') do set "_chcp=%%G"
    rem change active code page to UTF-8 (silently)
CHCP 65001 >NUL
    rem echo this is UTF-8 encoded batch file %~nx0
echo(
subst t: "D:\bat\Unusual Names\Türkçe (Türkiye)\çğüşöıĞÜİŞÇÖ"
subst
dir /B /S t:\*.txt
subst t: /D
echo(
echo(  works as well for characters from Unicode Basic Multilingual Plane
subst t: "D:\bat\Unusual Names\CJK\中文(繁體)"
subst
dir /B /S t:\*.txt
subst t: /D
echo(
echo(  works even for characters from Unicode Supplementary Multilingual Plane
subst t: "D:\bat\Unusual Names\"
subst
dir /B /S t:\*.txt
subst t: /D
    rem set active code page back to previously saved value (verbose)
echo(
CHCP %_chcp%

Output:

==> utf8.bat

==> ´╗┐@rem if this line is visibly executed then BOM is present  1>NUL 2>&1

T:\: => D:\bat\Unusual Names\Türkçe (Türkiye)\çğüşöıĞÜİŞÇÖ
t:\ĞÜİŞÇÖçğüşöı.txt

  works as well for characters from Unicode Basic Multilingual Plane
T:\: => D:\bat\Unusual Names\CJK\中文(繁體)
t:\chinese traditional.txt

  works even for characters from Unicode Supplementary Multilingual Plane
T:\: => D:\bat\Unusual Names\
t:\Mathematical Bold Script.txt

Active code page: 852

Finally, you could remove the first line (the one containing the BOM) from your script using the more command as follows (note the chcp 65001 before running more +1 …):

==> chcp 65001
Active code page: 65001

==> more +1 utf8.bat > utf8noBOM.bat

==> utf8noBOM.bat

T:\: => D:\bat\Unusual Names\Türkçe (Türkiye)\çğüşöıĞÜİŞÇÖ
t:\ĞÜİŞÇÖçğüşöı.txt

  works as well for characters from Unicode Basic Multilingual Plane
T:\: => D:\bat\Unusual Names\CJK\中文(繁體)
t:\chinese traditional.txt

  works even for characters from Unicode Supplementary Multilingual Plane
T:\: => D:\bat\Unusual Names\
t:\Mathematical Bold Script.txt

Active code page: 65001

==>
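If you would rather keep the first line and drop only the BOM bytes, something along these lines should work, assuming Python is available on the machine. This is a sketch and not part of the batch solution above; the function name strip_bom is hypothetical.

```python
import codecs
from pathlib import Path


def strip_bom(src, dst):
    """Copy src to dst without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    Hypothetical helper, shown only as an alternative to `more +1`.
    """
    data = Path(src).read_bytes()
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        # drop exactly the three bytes EF BB BF, nothing else
        data = data[len(codecs.BOM_UTF8):]
    Path(dst).write_bytes(data)
    return had_bom


# example call, using the file names from the transcript above:
# strip_bom("utf8.bat", "utf8noBOM.bat")
```

Working on raw bytes leaves the rest of the file, including its line endings, untouched, so the dummy @rem line is no longer needed afterwards.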

JosefZ

Posted 2016-08-26T21:30:58.700

