Someone sent me a text file. I can read most of the document, but some characters are unusual. When I open it in Vim, I see <92> in their place. When I use gedit, I see a character that looks like a square with two zeros and 9 and 4 in the square.
Is there a way to decode these funny characters back to their human readable equivalent?
I also ran the following in shell:
johncomputer> file --mime-encoding file.txt
johncomputer> file.txt: : utf-8
So I think it's UTF-8 encoded.
Also, this is a text document where most characters are readable. Just some (not all) of the accented characters are showing up weird.
Do you know what encoding was used to save the text file? – xxbbcc – 2013-05-10T16:29:55.017
I think it is utf8 – John – 2013-05-10T16:34:48.890
You might want to look at the first and the last words in your txt file. There might be some hints as to what file type it is. For instance, PNG files will have something like ‰PNG at the beginning, a JPEG file I opened has ÿØÿà JFIF at the beginning, etc. – Jerry – 2013-05-10T16:35:24.887
If you think so, try using a different editor - Notepad++ or Programmer's Notepad on Windows (I don't know VIM/Linux). If you're sure this is a text file (not some other file format) and the encoding is UTF-8, one of those should be able to show the content correctly. Be aware that even then, there may be certain characters that cannot be shown, and the font used by the editor may also limit what characters can be rendered on the screen. This is typically a limitation of console windows. – xxbbcc – 2013-05-10T16:36:33.327
If you see <92>, it's most certainly not UTF-8. – user1686 – 2013-05-10T20:44:56.477
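One way to test the idea in the comments is a small Python sketch. It assumes the stray characters came from Windows-1252, where byte 0x92 is the right single quotation mark (’); that is only a guess about this particular file, and `decode_text` is a hypothetical helper name, not part of any tool mentioned above. It covers two cases: a raw 0x92 byte (which is not valid UTF-8), and a file that is valid UTF-8 but contains the control character U+0092, which can happen when Windows-1252 text was converted as if it were Latin-1.

```python
def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, repairing Windows-1252 leftovers (an assumption)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # A bare 0x92 byte is invalid UTF-8; assume the file is Windows-1252.
        return raw.decode("cp1252", errors="replace")

    def fix(ch: str) -> str:
        # C1 controls (U+0080..U+009F) are almost never intended in text;
        # reinterpret each one as the Windows-1252 byte with the same value.
        if 0x80 <= ord(ch) <= 0x9F:
            return bytes([ord(ch)]).decode("cp1252", errors="replace")
        return ch

    return "".join(fix(ch) for ch in text)


if __name__ == "__main__":
    # "file.txt" is the name used in the question; adjust as needed.
    with open("file.txt", "rb") as handle:
        print(decode_text(handle.read()))
```

If the guess about Windows-1252 is wrong, the repaired characters will still look off, and a different source encoding would have to be tried in place of `cp1252`.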