I've just realized I have a file on my system; it lists normally:
$ ls -la TΕSТER.txt
-rw-r--r-- 1 user user 8 2013-04-11 18:07 TΕSТER.txt
$ cat TΕSТER.txt
testing
... yet it crashes a piece of software with a UTF-8/Unicode-related error. I was really puzzled, since I couldn't tell why such a file would be a problem; finally I remembered to check the output of ls with hexdump:
$ ls TΕSТER.txt
TΕSТER.txt
$ ls TΕSТER.txt | hexdump -C
00000000 54 ce 95 53 d0 a2 45 52 2e 74 78 74 0a |T..S..ER.txt.|
0000000d
... Well, obviously there are multi-byte sequences where I expected single letters (ce 95 in place of the first E, d0 a2 in place of the second T), so I guess it is a Unicode encoding problem. I can also echo the bytes back to see what is printed:
$ echo -e "\x54\xCE\x95\x53\xD0\xA2\x45\x52\x2E\x74\x78\x74"
TΕSТER.txt
... but I still cannot tell which - if any - Unicode characters these are.
So is there a command-line tool which I can use to inspect a string on the terminal and get Unicode information about its characters?
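To illustrate the kind of output I am after, here is a minimal sketch, assuming Python 3 is available and using its standard unicodedata module. The helper name describe is made up for this example; it takes the bytes from the hexdump output above and lists each character with its code point, UTF-8 bytes and Unicode name:

```python
import unicodedata

def describe(text):
    """Return one (char, code point, UTF-8 bytes, name) tuple per character."""
    rows = []
    for ch in text:
        utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
        # Some characters (e.g. control characters) have no name, hence the default.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, f"U+{ord(ch):04X}", utf8_hex, name))
    return rows

# The bytes reported by hexdump, minus the trailing newline (0a).
raw = bytes.fromhex("54 ce 95 53 d0 a2 45 52 2e 74 78 74")

for ch, codepoint, utf8_hex, name in describe(raw.decode("utf-8")):
    print(f"{ch}\t{codepoint}\t{utf8_hex}\t{name}")
```

For these bytes, the sequence ce 95 should come out as U+0395 GREEK CAPITAL LETTER EPSILON and d0 a2 as U+0422 CYRILLIC CAPITAL LETTER TE, which are lookalikes of the Latin E and T rather than the ASCII letters themselves.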
http://sdaaubckp.svn.sourceforge.net/viewvc/sdaaubckp/single-scripts/utfinfo.pl is a dead link – Winny – 2019-03-25T02:50:03.863
Nice tool, but the downloaded version is missing the ! of the shebang... – mpy – 2013-04-12T16:26:43.877
Cheers @mpy - fixed now... – sdaau – 2013-04-16T01:19:05.847