
I have a Unicode file that contains Chinese characters. I have a local and a remote copy of it.

When I use less on the local file the characters are shown properly:

奥尔德林

However, when I ssh to the remote machine and look at the remote version of the same file the characters are just shown like this instead:

<E5><A5><A5><E5><B0><94><E5><BE><B7><E6><9E><97>

How can I properly view the remote Unicode file (when connected via ssh)?

(I'm using the standard terminal application on Mac OS.)

user9474

1 Answer


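Does the file display correctly if you cat it (or use head to shorten the output)? If it does, the terminal is handling UTF-8 correctly and the problem lies in how less interprets the bytes.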

I think the key to making less display the file properly is setting LANG or LC_ALL correctly. Compare the values of those variables on your local system and on the remote system. If they differ, change the remote one to match and see whether that fixes the display.
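As a rough sketch of that comparison, the snippet below prints the three variables less consults. The helper name `show_locale_vars`, the host `user@remote.example.com`, and the locale `en_US.UTF-8` are placeholders I made up for illustration; check `locale -a` on the remote machine for the locales actually installed there.

```shell
# Print the locale-related variables that less consults.
# (show_locale_vars is a hypothetical helper name, not a standard command.)
show_locale_vars() {
    printf 'LANG=%s\nLC_ALL=%s\nLC_CTYPE=%s\n' "${LANG-}" "${LC_ALL-}" "${LC_CTYPE-}"
}

# Run this on the local machine first.
show_locale_vars

# Then look at the same variables on the remote machine
# (the host name is a placeholder):
#   ssh user@remote.example.com 'echo "LANG=$LANG LC_ALL=$LC_ALL LC_CTYPE=$LC_CTYPE"'

# If the remote values do not name a UTF-8 locale, try setting one for the
# current remote session (en_US.UTF-8 is an assumption, see `locale -a`):
#   export LANG=en_US.UTF-8
```

If the remote output shows something like `C` or `POSIX`, or nothing at all, where the local output names a UTF-8 locale, that difference is the likely cause.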

From the less man page (note the last sentence):

       If  neither  LESSCHARSET nor LESSCHARDEF is set, but any of the strings
       "UTF-8", "UTF8", "utf-8" or "utf8" is found in the LC_ALL,  LC_TYPE  or
       LANG environment variables, then the default character set is utf-8.

       If  that  string  is  not found, but your system supports the setlocale
       interface, less will use setlocale  to  determine  the  character  set.
       setlocale  is  controlled  by  setting the LANG or LC_CTYPE environment
       variables.

       Finally, if the setlocale interface is also not available, the  default
       character set is latin1.

       Control  and  binary  characters  are  displayed  in  standout (reverse
       video).  Each such character is displayed in caret notation if possible
       (e.g.  ^A for control-A).  Caret notation is used only if inverting the
       0100 bit results in a normal printable character.  Otherwise, the char‐
       acter  is displayed as a hex number in angle brackets.
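The first rule in that excerpt can be approximated with a simple string check. This is only a sketch of the substring test the man page describes, not how less is actually implemented, and `mentions_utf8` is a hypothetical helper name.

```shell
# Succeeds (exit status 0) if the argument contains one of the strings
# the man page lists: "UTF-8", "UTF8", "utf-8" or "utf8".
# (mentions_utf8 is a made-up name for illustration.)
mentions_utf8() {
    case "$1" in
        *UTF-8*|*UTF8*|*utf-8*|*utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the variables in the current session.
for value in "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}"; do
    if mentions_utf8 "$value"; then
        echo "a locale variable mentions UTF-8: $value"
    fi
done

# LESSCHARSET takes precedence over the locale variables, so it can be
# used to force UTF-8 for a single invocation (file name is a placeholder):
#   LESSCHARSET=utf-8 less file.txt
```

If none of the variables matches and LESSCHARSET is unset, less falls through to the later rules in the excerpt, which is consistent with the `<E5><A5><A5>` style output in the question.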
Dennis Williamson