Unicode (non-ASCII) filename interoperability between Linux and Windows

I have this problem, and I'm not really sure whether the issue is on the client, on the server, or both. I'd appreciate any help diagnosing and resolving it.

I have a remote Linux box running Debian that I regularly download files and folders from to a Windows 8 box. Most of the time it just works. I'm using a download manager that allows multi-threaded downloads to speed up the process.

However, in a small number of cases the file names on Linux have non-ASCII characters in them. My download manager (GetRight, a rather ancient one) downloads them with mangled names. I thought it was GetRight's problem, since the file names look correct both in PuTTY when I ssh to the server and in WinSCP, and WinSCP downloads them flawlessly (although, alas, not in a multi-threaded manner).

But then I tried connecting to the server with ftp.exe under Windows, and the file names came up mangled as well.

So I decided to just tar-gzip the files on the server and download them that way. But this did not work either. For example, on Linux I have:

[jade ~/tmp] ls
тестовый
[jade ~/tmp] tar -czf ../data.tar.gz .
[jade ~/tmp]

Now I download data.tar.gz on Windows and try to unpack it:

E:\!2>7z x data.tar.gz

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18

Processing archive: data.tar.gz

Extracting  data.tar

Everything is Ok

Size:       10240
Compressed: 168

E:\!2>7z l data.tar

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18

Listing archive: data.tar

--
Path = data.tar
Type = tar
Physical Size = 10240
Headers Size = 9728

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2013-01-21 17:00:10 D....            0            0  .
2013-01-21 17:00:10 .....            2          512  .\╤В╨╡╤Б╤В╨╛╨▓╤Л╨╣
------------------- ----- ------------ ------------  ------------------------
                                     2          512  1 files, 1 folders

As you can see, even if I exclude the transfer agent (such as an FTP client/server) from the equation, the problem still persists.

I'd like to concentrate on the last scenario, tar-gzipping on the server and unpacking on the client, and make it work.

Can someone explain why I'm seeing what I'm seeing? Is the server or the client to blame, or both? How do I resolve it?

I'd like to mention that on Windows I can have a file with exactly the required file name if I create it myself:

E:\!2>echo a > тестовый


E:\!2>dir т*
 Volume in drive E is Storage
 Volume Serial Number is F41B-FF77

 Directory of E:\!2

21-Jan-13  17:20                 4 тестовый
               1 File(s)              4 bytes
               0 Dir(s)  63,511,015,424 bytes free

E:\!2>

Andrew Savinykh

Posted 2013-01-21T04:26:21.540

Reputation: 1 521

http://superuser.com/a/487289/43300 - use --format=posix – rsk82 – 2015-03-29T13:05:20.840

Just a guess, but can you make sure all the parts involved are using the same character encoding? The Linux box, the FTP server, the Windows box, and the client used to download and extract the files? Maybe the Linux box is saving those files using UTF and Windows is using something else (I'm not familiar with Cyrillic encodings). – Martín Canaval – 2013-01-21T04:36:20.937

Yep, that was my thinking too. As far as I can judge, it's all supposed to be UTF... – Andrew Savinykh – 2013-01-21T04:43:54.730

Answers

See this answer for an explanation of what is going on.
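
As a rough illustration of the mismatch, here is a minimal Python sketch. It assumes (this is a guess from the shape of the output, not something confirmed by 7-Zip itself) that the UTF-8 bytes of the file name were decoded on Windows with the OEM Cyrillic code page 866. The variable names are only for illustration.

```python
# The file name as it exists on the Linux box (taken from the question).
original = "тестовый"

# On a UTF-8 Linux system the name is stored as these bytes, and a tar
# archive in the classic formats carries them without any encoding label.
raw_bytes = original.encode("utf-8")

# Assumption: the Windows side interprets the bytes as code page 866.
mangled = raw_bytes.decode("cp866")
print(mangled)

# Undoing the wrong decoding recovers the name, so the bytes themselves
# were never damaged; only their interpretation differs.
recovered = mangled.encode("cp866").decode("utf-8")
print(recovered)
```

The first printed line reproduces the mangled name from the 7z listing in the question, and the second is the original name again, which suggests the archive is intact and only the decoding step goes wrong.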

I would suggest you use 7zip instead of tar, since 7zip seems to "remember" which encoding was used for the file names and decompress them nicely. I have tested this on Swedish non-ASCII characters, and hopefully it will work for you too.

Christian Davén

Posted 2013-01-21T04:26:21.540

Reputation: 418

Thank you very much. It looks like you might be right; I'll have a closer look to confirm. My problem, though, is that I'm not root on the *nix box, so I can't really install 7zip. I'll ask the provider, but a positive answer is unlikely. – Andrew Savinykh – 2013-05-07T22:47:20.477

Thank you so much. I finally got it working (the provider installed 7zip) and also understood why it works that way. Thank you again. – Andrew Savinykh – 2013-05-08T04:31:58.227
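
For readers who cannot install 7zip on the server, the --format=posix suggestion in the comments under the question points at the pax tar format, which records non-ASCII names as UTF-8 in an extended header. The sketch below uses Python's standard tarfile module as a stand-in for both ends; the helper names build_archive and list_names are hypothetical, and tarfile.GNU_FORMAT is only assumed to resemble what the tar command in the question produced. Whether a particular Windows unpacker honors the pax records is not tested here.

```python
import io
import tarfile

NAME = "тестовый"  # file name from the question


def build_archive(fmt):
    """Return an in-memory .tar.gz holding one 2-byte file called NAME."""
    buf = io.BytesIO()
    # encoding="utf-8" stands in for a Linux box with a UTF-8 locale.
    with tarfile.open(fileobj=buf, mode="w:gz", format=fmt, encoding="utf-8") as tar:
        data = b"a\n"
        info = tarfile.TarInfo(NAME)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_names(archive, encoding):
    """List member names, decoding raw header bytes with `encoding`."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz", encoding=encoding) as tar:
        return tar.getnames()


gnu = build_archive(tarfile.GNU_FORMAT)
pax = build_archive(tarfile.PAX_FORMAT)

# GNU format: the result depends on what the reader assumes.
gnu_wrong = list_names(gnu, "cp866")
gnu_right = list_names(gnu, "utf-8")

# pax format: the name comes from the UTF-8 extended header instead.
pax_any = list_names(pax, "cp866")

print(gnu_wrong, gnu_right, pax_any)
```

With the GNU-style archive the name is only correct when the reader happens to assume UTF-8, while the pax archive yields the correct name in this sketch even when the reader's fallback encoding is code page 866.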