Decode weird characters in text file

Someone sent me a text file. Although I can read most of the document, there are occasional unusual characters. When I open it in Vim, I see <92> in its place. When I use gedit, I see a character that looks like a square containing two zeros, a 9 and a 4.

Is there a way to decode these funny characters back to their human readable equivalent?

I also ran the following in shell:

johncomputer> file --mime-encoding file.txt
johncomputer> file.txt: : utf-8

So I think it's UTF-8 encoded.

Also, this is a text document where most characters are readable. Just some (not all) of the accented characters are showing up weird.

character-encoding

John

Posted 2013-05-10T16:24:59.557

Reputation: 673

Do you know what encoding was used to save the text file? – xxbbcc – 2013-05-10T16:29:55.017

I think it is utf8 – John – 2013-05-10T16:34:48.890

You might want to look at the first and the last words in your txt file. There might be some hints as to what file type it is. For instance, png files will have something like ‰PNG at the beginning, a jpeg file I opened has ÿØÿà JFIF at the beginning, etc. – Jerry – 2013-05-10T16:35:24.887

If you think so, try using a different editor: Notepad++ or Programmer's Notepad on Windows (I don't know Vim/Linux). If you're sure this is a text file (not some other file format) and the encoding is UTF-8, one of those should be able to show the content correctly. Be aware that even then there may be certain characters that cannot be shown, and the font used by the editor may also limit which characters can be rendered on the screen. This is typically a limitation of console windows. – xxbbcc – 2013-05-10T16:36:33.327

If you see <92>, it's most certainly not UTF-8. – user1686 – 2013-05-10T20:44:56.477

Answers

The odds are that what you see as <92> and <94> are the windows-1252 encoded “smart” (curly) apostrophe and “smart” right double quotation mark. They could be just about anything, of course, but in UTF-8 such bytes cannot appear “standalone”, only as the second or later byte of a multi-byte representation of a character.
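To illustrate the point above, here is a minimal Python sketch. The byte values 0x92 and 0x94 come from the question; the sample sentence and the helper name is_valid_utf8 are made up for the example, and the cp1252 guess is only the likely case, not a certainty.

```python
# Sample bytes: 0x92 and 0x94 are the values from the question,
# the surrounding text is invented for illustration.
raw = b"It\x92s a \x93test\x94"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8, 0x92 is a continuation byte, so it is invalid on its own.
valid = is_valid_utf8(raw)

# Interpreted as windows-1252 (codec name "cp1252" in Python),
# 0x92 is U+2019 and 0x94 is U+201D.
decoded = raw.decode("cp1252")
print(valid, decoded)
```

If the guess is right, the decoded text shows curly quotes where the editor showed <92> and <94>.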

Jukka K. Korpela

Posted 2013-05-10T16:24:59.557

Reputation: 4 475

Do you know the codepage used by the person that sent you the file? What is their primary language?

In Vim you can reload the file using another encoding with the command

:e ++enc=cpXXX

Link to relevant vim tip
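Outside Vim, the same idea (read the file with an explicitly chosen codepage) can be sketched in Python. The function names read_with_encoding and convert_to_utf8, the file names, and the default of cp1252 are assumptions for illustration; substitute whichever codepage the sender actually used.

```python
def read_with_encoding(path, encoding="cp1252"):
    """Read a text file using an explicitly chosen source encoding."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Decode src with source_encoding and write the text to dst as UTF-8."""
    text = read_with_encoding(src, source_encoding)
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text


# Hypothetical usage:
# convert_to_utf8("file.txt", "file.utf8.txt", source_encoding="cp1252")
```

Note that Python's cp1252 codec raises UnicodeDecodeError on the few byte values that windows-1252 leaves undefined (such as 0x81), which is a useful signal that the codepage guess is wrong.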

Jimbo

Posted 2013-05-10T16:24:59.557

Reputation: 21

I don't know how they created this text document. They just emailed it to me. I tried the Vim command, but that didn't seem to affect the document. I still see <92> – John – 2013-05-10T16:41:54.443

If the file truly is UTF-8, this command will display it: :e ++enc=utf8. A couple of other ones to try would be utf16 and ucs2. – Jimbo – 2013-05-10T17:23:12.210
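Trying several encodings in turn, as suggested above, can also be sketched in Python. The helper name decodable_as and the candidate list are assumptions; Python has no codec named ucs2, so utf-16 stands in for it here. A decode that succeeds is not proof of the right encoding (utf-16 in particular can "succeed" with garbage), so the results still need a look by eye.

```python
def decodable_as(data: bytes, candidates=("utf-8", "utf-16", "cp1252")):
    """Return {encoding: decoded text} for each candidate that decodes without error."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This candidate cannot represent the byte sequence; skip it.
            continue
    return results


# Invented sample: "café" plus a curly apostrophe, saved as windows-1252.
sample = "caf\u00e9 \u2019".encode("cp1252")
results = decodable_as(sample)
for name, text in results.items():
    print(name, repr(text))
```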