Find all UTF-16 encoded files on Windows



Is there a tool available for Windows (command line, GUI, script, etc.) that can recurse through a directory and identify all files encoded as UTF-16?

Mark Richman

Posted 2011-05-04T15:34:33.893

Reputation: 252

Generally speaking, there is no way to detect the encoding of a text file automatically and without error. Having said that: if the content is actually just characters from the ASCII range (or mostly from that range), then checking for files where every second byte is 0 is a good start. – Joachim Sauer – 2011-05-04T15:37:08.300
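The "every second byte is 0" check can be sketched as follows. This is a minimal illustration, assuming the caller passes in the first few kilobytes of a file; the function name `utf16AsciiHeuristic` and the `threshold` parameter are hypothetical, and the 0.9 default is an arbitrary choice rather than an established value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero-byte heuristic for mostly-ASCII UTF-16 text without a BOM.
// Returns  1 if (almost) every code unit has a zero high byte in
//            little-endian order (zeros at odd offsets),
//         -1 for the big-endian pattern (zeros at even offsets),
//          0 if neither pattern dominates (e.g. UTF-8, binary, non-Latin UTF-16).
// `threshold` is a hypothetical tuning parameter, not a standard value.
int utf16AsciiHeuristic(const std::vector<std::uint8_t>& bytes,
                        double threshold = 0.9) {
    const std::size_t units = bytes.size() / 2;  // complete 2-byte code units
    if (units == 0) return 0;
    std::size_t zeroEven = 0, zeroOdd = 0;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        if (bytes[i] == 0) ++zeroEven;
        if (bytes[i + 1] == 0) ++zeroOdd;
    }
    const double even = static_cast<double>(zeroEven) / units;
    const double odd = static_cast<double>(zeroOdd) / units;
    // Require one position to be almost always zero and the other almost never,
    // so that long runs of 0x00 padding in binary files are not taken for text.
    if (odd >= threshold && even <= 1.0 - threshold) return 1;
    if (even >= threshold && odd <= 1.0 - threshold) return -1;
    return 0;
}
```

As the comment notes, this only works for text that is mostly in the ASCII range; UTF-16 Chinese or Arabic text has few zero bytes and will return 0.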

@Joachim: I think for large enough files the detection errors should be negligible. A misdetection like "Bush hid the facts" becomes exponentially unlikely once the character count is large enough. – Philipp – 2011-05-04T16:10:38.340

@Philipp: I didn't even know of this particular case. Thanks! But the number of errors to expect depends a lot on the actual content of your files: if it's all basically English ASCII text, then the detection rate will be pretty good (perfect or near perfect, I'd guess). But if you have UTF-16 encoded Chinese, Arabic, Swahili and Hindi texts in addition to lots of binary data, then it will be much worse. – Joachim Sauer – 2011-05-04T16:16:51.743

@Joachim: Agreed. In my experience, UTF-16 files without a BOM are often generated by Windows system tools (installer scripts, maybe the registry editor) because such a file is essentially a memory dump of a UTF-16 string. Such files often contain lots of ASCII markup, which makes them easy to identify. The other way round (deciding whether a file that is valid UTF-16 is in fact UTF-16) is much harder, of course. Maybe one could test whether large portions of the file belong to a single script, whether UTF-16 CRLF sequences occur, etc. – Philipp – 2011-05-04T17:11:39.583
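The "do UTF-16 CRLF sequences occur" test could look roughly like this. It is a sketch under the assumption that line breaks are Windows-style CR LF; the names `countUtf16Crlf` and `CrlfCounts` are hypothetical. A CR LF pair is encoded as 0D 00 0A 00 in UTF-16LE and 00 0D 00 0A in UTF-16BE, and only matches starting at even offsets are counted, since a genuine UTF-16 code unit always starts there.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts of CR LF line breaks found in each UTF-16 byte order.
struct CrlfCounts {
    std::size_t littleEndian = 0;  // 0D 00 0A 00
    std::size_t bigEndian = 0;     // 00 0D 00 0A
};

// Scans the buffer at code-unit-aligned (even) offsets for UTF-16 CR LF pairs.
CrlfCounts countUtf16Crlf(const std::vector<std::uint8_t>& b) {
    CrlfCounts c;
    for (std::size_t i = 0; i + 3 < b.size(); i += 2) {
        if (b[i] == 0x0D && b[i + 1] == 0x00 && b[i + 2] == 0x0A && b[i + 3] == 0x00)
            ++c.littleEndian;
        if (b[i] == 0x00 && b[i + 1] == 0x0D && b[i + 2] == 0x00 && b[i + 3] == 0x0A)
            ++c.bigEndian;
    }
    return c;
}
```

A non-zero count in one byte order and zero in the other is extra evidence for that order; it says nothing about files that happen to use bare LF or contain no line breaks at all.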



This tool allows you to detect the file encoding type given standard information such as search pattern and file path:

File Encoding Checker

File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.


I have not used it myself, so you may want to check it out.



Reputation: 148


For UTF-16 files with BOM - PowerShell command

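# Windows PowerShell 5.1; in PowerShell 6+ replace "-Encoding Byte" with "-AsByteStream"
gci . -Include *.txt -Recurse -File |
  % {
    # Read only the first two bytes of each file
    $c = gc $_.FullName -TotalCount 2 -Encoding Byte
    # FF FE = UTF-16 little-endian BOM, FE FF = UTF-16 big-endian BOM
    if ($c.Length -ge 2 -and
        (($c[0] -eq 255 -and $c[1] -eq 254) -or ($c[0] -eq 254 -and $c[1] -eq 255))) {
      $_.FullName
    }
  }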
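For anyone who would rather build a small standalone tool, here is a rough C++17 counterpart of the PowerShell command above, using only `std::filesystem` and `std::ifstream`. It is a sketch, not a drop-in replacement: the function name `findUtf16BomFiles` is hypothetical, unreadable directories are skipped, and the extension filter is a plain case-sensitive comparison, unlike PowerShell's case-insensitive `-Include`.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Walks `root` recursively and collects regular files whose first two bytes
// are a UTF-16 BOM (FF FE or FE FF). An empty `extension` disables the filter.
std::vector<fs::path> findUtf16BomFiles(const fs::path& root,
                                        const std::string& extension = ".txt") {
    std::vector<fs::path> hits;
    std::error_code ec;  // iteration error: stops the walk
    const auto opts = fs::directory_options::skip_permission_denied;
    for (auto it = fs::recursive_directory_iterator(root, opts, ec);
         !ec && it != fs::recursive_directory_iterator(); it.increment(ec)) {
        std::error_code fileEc;  // per-entry error: just skip the entry
        if (!it->is_regular_file(fileEc)) continue;
        // Case-sensitive comparison, e.g. ".TXT" does not match ".txt".
        if (!extension.empty() && it->path().extension() != fs::path(extension)) continue;
        std::ifstream in(it->path(), std::ios::binary);
        unsigned char bom[2] = {0, 0};
        if (!in.read(reinterpret_cast<char*>(bom), 2)) continue;  // under 2 bytes
        if ((bom[0] == 0xFF && bom[1] == 0xFE) || (bom[0] == 0xFE && bom[1] == 0xFF))
            hits.push_back(it->path());
    }
    return hits;
}
```

Like the PowerShell version, this only finds files that start with a BOM; BOM-less UTF-16 needs one of the content heuristics discussed in the comments.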

Dmitry Sokolov


Reputation: 268


A slow way would be to take any conversion utility and run it against all files in a directory. Those files converted successfully from UTF-16 to another format are most likely the ones you need. For that task you can pick an available tool like Character Set Converter.

Or you can write such a tool using the C++ code snippet from the article "Conversion between Unicode UTF-16 and UTF-8 in C++/Win32". A custom tool can be optimized to give up at the first conversion error and not save the converted buffer to a file.
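The "give up on the first conversion error" idea can also be sketched portably, without the Win32 conversion functions: decode the buffer as UTF-16LE and stop at the first malformed sequence (odd length, or an unpaired surrogate). The name `isValidUtf16LE` is hypothetical. One caveat worth knowing: any even-length pure-ASCII file also passes, because two ASCII bytes always form a code unit below 0x8000, which can never be a surrogate. So a successful "conversion" is weak evidence on its own and works best combined with the BOM or zero-byte checks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns false at the first malformed UTF-16LE sequence; nothing is converted
// or written, only validity is checked.
bool isValidUtf16LE(const std::vector<std::uint8_t>& b) {
    if (b.size() % 2 != 0) return false;  // UTF-16 consists of 2-byte code units
    for (std::size_t i = 0; i < b.size(); i += 2) {
        const auto u = static_cast<std::uint16_t>(b[i] | (b[i + 1] << 8));
        if (u >= 0xD800 && u <= 0xDBFF) {        // high surrogate
            if (i + 3 >= b.size()) return false;   // no room for the low half
            const auto next = static_cast<std::uint16_t>(b[i + 2] | (b[i + 3] << 8));
            if (next < 0xDC00 || next > 0xDFFF) return false;
            i += 2;                                // skip the low surrogate
        } else if (u >= 0xDC00 && u <= 0xDFFF) { // low surrogate without a high one
            return false;
        }
    }
    return true;
}
```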





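It would not be hard to write one: read the first two bytes of every file and check whether they are FF FE (the little-endian byte order mark, which is what Windows normally writes) or FE FF (big-endian). This only catches files that start with a BOM.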
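One pitfall of the two-byte check: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so UTF-32LE files would be reported as UTF-16. A sketch of a BOM classifier that tests the longer signatures first is below; the name `bomEncoding` and its string return values are hypothetical. The trade-off of this ordering is that a UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE, which is rare in practice.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Classifies a file prefix by its byte order mark; returns "" if there is none.
std::string bomEncoding(const std::vector<std::uint8_t>& p) {
    auto startsWith = [&p](std::initializer_list<std::uint8_t> sig) {
        if (p.size() < sig.size()) return false;
        std::size_t i = 0;
        for (auto byte : sig)
            if (p[i++] != byte) return false;
        return true;
    };
    // Four-byte UTF-32 signatures must be tested before the two-byte UTF-16 ones.
    if (startsWith({0xFF, 0xFE, 0x00, 0x00})) return "UTF-32LE";
    if (startsWith({0x00, 0x00, 0xFE, 0xFF})) return "UTF-32BE";
    if (startsWith({0xFF, 0xFE})) return "UTF-16LE";
    if (startsWith({0xFE, 0xFF})) return "UTF-16BE";
    if (startsWith({0xEF, 0xBB, 0xBF})) return "UTF-8";
    return "";  // no BOM: encoding unknown
}
```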



Reputation: 1 126