How to find files with a given character encoding?

9

2

I am using Windows XP. I am looking for a tool that, for a given directory, will find all files that have a particular character encoding (such as UTF-8). Do you know of such a tool?

Dawid Ohia

Posted 2010-11-09T14:19:56.137

Reputation: 494

3 There's no completely reliable way to detect any file's encoding in the first place. – Ignacio Vazquez-Abrams – 2010-11-09T15:01:49.217

Answers

6

This tool works great. It lists all the files in a folder along with their encodings.

http://encodingchecker.codeplex.com/releases/view/59420

There is also this tool, for bulk-converting files to UTF-8.

http://www.rotatingscrew.com/utfcast.aspx

mike nelson

Posted 2010-11-09T14:19:56.137

Reputation: 171

3

In general this is not possible, apart from the special case of UTF-8 text files with a Byte Order Mark. Since the name of the encoding is not stored in the text file, the only way to tell, for example, CP437 from CP850 would be to make a guess based on a statistical analysis of the whole file, looking at the frequency of certain character pairs and so on.

Solaris users have auto_ef but, so far as I know, there isn't a Windows port.

Perl users have Encode::Guess.

According to Wikipedia "The newer versions of the unix File command attempt to do a basic detection of character encoding. (also available on cygwin and mac)"

None of the above will be 100% reliable. If your files are definitely all in one of a handful of known encodings, you may be able to do better.
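As a rough illustration of the "handful of known encodings" case, here is a minimal Python sketch that tries each candidate with a strict decode and returns the first one that succeeds. The function name `guess_encoding` and the default candidate list are my own assumptions, not part of any tool mentioned above. It only works when the candidates can actually reject bytes: single-byte code pages such as CP437 and CP850 accept every byte value, so this approach cannot tell them apart.

```python
from typing import Optional, Sequence


def guess_encoding(
    data: bytes,
    candidates: Sequence[str] = ("ascii", "utf-8", "cp1252"),
) -> Optional[str]:
    """Return the first candidate that decodes `data` without error.

    Order matters: list the strictest encodings first, because a
    permissive encoding would otherwise match almost anything.
    """
    for name in candidates:
        try:
            data.decode(name)  # decoding is strict by default
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes
        return name
    return None  # none of the candidates fit
```

A successful decode only means the bytes are valid in that encoding, not that the file was really written in it, so treat the result as a guess.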

RedGrittyBrick

Posted 2010-11-09T14:19:56.137

Reputation: 70 632

1

Under Windows this is possible by searching for the right Byte Order Mark (BOM), on the condition that the files were created with a BOM.

You would need a search program for that.
One possibility may be Grep for Windows, searching with the beginning-of-file operator (^^).
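If a scripting language is available, the same BOM search can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names `bom_encoding` and `find_bom_files` are made up for this example, and files written without a BOM are simply not reported.

```python
import codecs
import os
from typing import Iterator, Optional, Tuple

# Longer BOMs are checked first, because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def bom_encoding(head: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def find_bom_files(root: str) -> Iterator[Tuple[str, str]]:
    """Walk `root` and yield (path, encoding) for every file that starts with a BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(4)  # the longest BOM is 4 bytes
            except OSError:
                continue  # skip files that cannot be opened
            encoding = bom_encoding(head)
            if encoding is not None:
                yield path, encoding
```

To list only UTF-8 files under a folder, one could filter the results, for example `[p for p, enc in find_bom_files(folder) if enc == "utf-8"]`, where `folder` is whatever directory you want to scan. A UTF-16 LE file whose first character is U+0000 would be misreported as UTF-32 LE, which should be rare in ordinary text files.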

harrymc

Posted 2010-11-09T14:19:56.137

Reputation: 306 093