Unicode-based POSIX tools (other than Cygwin) for Windows?

GnuWin32 and MSYS are awesome, but it seems like all of their tools use the ANSI versions of the Windows API rather than the Unicode versions.

Obviously, the ANSI APIs are meant for Windows 95 and 98, not Windows XP and Windows 7, and they cause lots of problems with atypical file names, strings, etc. (anything containing characters outside the ANSI code page).

Is there any similar POSIX toolset (other than Cygwin) that uses the Unicode versions of the Windows API (and which thus supports Unicode)?

user541686

Posted 2012-01-03T03:22:47.053

Reputation: 21 330

Why exclude Cygwin? Also, note that MSYS is a fork of Cygwin. – ak2 – 2012-01-03T06:02:21.797

For clarity, could you give an example of an atypical file name and an atypical string, and the problem each causes? – RedGrittyBrick – 2012-01-03T10:24:53.953

@ak2: I'm excluding Cygwin because I'm not looking for Cygwin. The fact that MSYS is a fork of Cygwin doesn't really bother me. – user541686 – 2012-01-03T16:11:54.653

@RedGrittyBrick: If you name a file ╧.txt and then do ls in MSYS, it says ls: -.txt: No such file or directory – user541686 – 2012-01-03T16:18:52.487
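The ╧.txt failure is consistent with how the ANSI APIs work: a file name has to pass through the system's ANSI code page, and ╧ (U+2567) exists in code page 437 (the typical US OEM/console code page) but not in code page 1252 (the typical Western ANSI code page). The Python sketch below round-trips the name through each encoding. The helper names are made up for illustration, it assumes a US/Western-European system, and it uses Python's '?' substitution, whereas Windows may apply a "best-fit" mapping instead, which may be where the '-' in the error message comes from.

```python
# Hypothetical helpers illustrating ANSI vs. Unicode ("W") file-name handling.
# Assumptions: ANSI code page 1252 and OEM code page 437 (a typical US setup);
# unrepresentable characters become '?', whereas Windows itself may apply a
# "best-fit" mapping, so the exact replacement character can differ.


def ansi_view(name: str, codepage: str) -> str:
    """Round-trip a name through a legacy code page, roughly as an ANSI API would."""
    return name.encode(codepage, errors="replace").decode(codepage)


def unicode_view(name: str) -> str:
    """Round-trip a name through UTF-16LE, the encoding the ...W APIs use."""
    return name.encode("utf-16-le").decode("utf-16-le")


if __name__ == "__main__":
    name = "\u2567.txt"  # the "╧.txt" file name from the comment
    print(ansi_view(name, "cp1252") == name)  # False: no ╧ in code page 1252
    print(ansi_view(name, "cp437") == name)   # True: code page 437 has ╧
    print(unicode_view(name) == name)         # True: UTF-16 is lossless
```

Only the UTF-16 round trip is lossless for every name, which is why tools built on the W APIs don't have this problem.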

I just ran touch ╧.txt in a Cygwin bash shell, and it shows up with the correct name in Cygwin ls under xterm, in Windows command prompt, and in Windows Explorer. (It doesn't look right under the default Cygwin bash shell, which uses the same terminal emulator as the Windows command prompt.) It might help to know just why you're "not looking for Cygwin". – Keith Thompson – 2012-01-03T21:46:44.440

@KeithThompson: One reason (of many) is that it's ridiculously slow. But I'd rather not go in that direction... I'm just avoiding Cygwin because it doesn't suit my needs. No need for more info. – user541686 – 2012-01-03T22:13:30.423

@Mehrdad: My point is that knowing why you want to avoid Cygwin could tell us something about what your needs are. As far as I can tell, Cygwin does suit your needs as you've stated them so far. (I haven't noticed that it's particularly slow, but I haven't done any performance-sensitive work in Cygwin.) – Keith Thompson – 2012-01-03T22:21:57.003

@KeithThompson: As far as the question goes, it's quite simple: my needs are a Unicode-supporting POSIX toolset (like MinGW, UWIN, GnuWin32, or whatever) that is not Cygwin. I'm pretty darn sure that this is crystal clear. I'm not going to continue discussing the intricacies of why I don't/can't use Cygwin, so if you don't continue asking about this, I'd really appreciate it. Thanks! – user541686 – 2012-01-03T22:42:36.533

@Mehrdad: If you don't want to provide information, I'll be glad not to try to help you. – Keith Thompson – 2012-01-03T22:49:51.403

@KeithThompson: Well, you can treat it, for all intents and purposes, as though my employer has banned Cygwin at work and forbidden my discussing why I can't use it. I don't want to fight him about it. If you still can't help, then that's totally fine; thanks for trying anyhow. – user541686 – 2012-01-03T22:52:40.670

@Mehrdad: If you can't provide information, that's a different story. That wasn't clear (to me) from the original question, or from your previous comments. I suggest that something like "I can't use Cygwin for reasons I can't disclose" in the question would have been useful. To clarify, MSYS would be OK if it worked? – Keith Thompson – 2012-01-03T22:59:04.993

@KeithThompson: Yes, MinGW/MSYS and GnuWin32 (and probably others) would be fine if they worked. – user541686 – 2012-01-03T23:05:13.327

Answers

Microsoft's very own Subsystem for UNIX-based Applications (SUA). It's only available with the Ultimate and Enterprise editions of Windows, though.

Correction: SUA does not have Unicode support. According to its locale -a command, it only supports ISO-8859-1, EUC-JP, and SJIS.

On a related note, MKS Toolkit, which is another Unix-like environment for Windows, also does not support Unicode in filenames, according to its unicode.5 manpage.

MKS Toolkit utilities cannot handle non-OEM characters in file names unless the locale supports double byte character (such as the Japanese locale). Consequently, even though the utilities support UTF-8 and Unicode characters in files on all platforms, to achieve maximum portability across all Windows platforms, all file names used in scripts for utilities like awk, sh, csh and others should contain only ASCII characters from the OEM code page.

ak2

Posted 2012-01-03T03:22:47.053

Reputation: 3 387

Yeah, I already knew it existed but didn't even check whether it supports Unicode, because I don't have Ultimate/Enterprise. +1 anyway though. – user541686 – 2012-01-03T16:15:49.693

Actually, are you *sure* that it supports Unicode? I just tried ls on the Windows XP version on a file with the name and got back ls: -: No such file or directory. – user541686 – 2012-01-03T16:29:34.643

Nope, looks like you're wrong. I just tried the Windows 8 version, and ls gave back - as the file name (but no error this time). I piped the output to a text file and then viewed the contents, and verified it was not a console displaying issue. – user541686 – 2012-01-03T16:35:25.690

Sorry, I misremembered. SUA does use the Unicode private use area to represent characters such as ':' that aren't allowed in Windows filenames, in the same way as Cygwin, so I guess I wrongly implied that it has general Unicode support. – ak2 – 2012-01-03T21:17:17.187
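To make the private-use-area trick concrete, here is a minimal Python sketch of a Cygwin-style mapping: each character that Windows forbids in file names is stored as U+F000 plus its ASCII code (so ':' becomes U+F03A), and mapped back when read. The 0xF000 offset follows Cygwin's documented scheme as far as I know; the exact forbidden set and the function names are assumptions for illustration, and SUA's scheme may differ in detail.

```python
# Sketch of a Cygwin-style mapping of Windows-forbidden file-name characters
# into the Unicode Private Use Area (U+F000 + ASCII code).
# Assumptions: the FORBIDDEN set below and these hypothetical function names;
# SUA's exact scheme may differ.

FORBIDDEN = set('"*:<>?|') | {chr(c) for c in range(1, 32)}
PUA_OFFSET = 0xF000


def to_windows_name(posix_name: str) -> str:
    """Replace each forbidden character with its Private Use Area counterpart."""
    return "".join(
        chr(PUA_OFFSET + ord(ch)) if ch in FORBIDDEN else ch for ch in posix_name
    )


def to_posix_name(windows_name: str) -> str:
    """Map PUA characters that encode a forbidden character back to it."""
    out = []
    for ch in windows_name:
        code = ord(ch)
        original = chr(code - PUA_OFFSET) if code >= PUA_OFFSET else None
        out.append(original if original in FORBIDDEN else ch)
    return "".join(out)


if __name__ == "__main__":
    stored = to_windows_name("a:b")
    print(stored == "a\uf03ab")          # True: ':' (0x3A) became U+F03A
    print(to_posix_name(stored) == "a:b")  # True: the mapping round-trips
```

Note that this only works because the stored name is Unicode; it says nothing about whether the rest of the environment can handle arbitrary Unicode names, which is the distinction made in the comment.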

Remember that the Windows console (the equivalent of a terminal window) has some trouble supporting Unicode: it starts in an OEM code page by default, and switching to UTF-7 or UTF-8 (chcp 65000 or 65001) breaks stuff. It's not always the fault of the program; even if it uses the Unicode APIs, it can't always display Unicode characters in the console. Use an executable inspector to verify which kind of APIs the program uses. – user1686 – 2012-01-03T23:33:30.890
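On checking which APIs a program uses: Win32 functions that take strings generally come in an ANSI flavor ending in A (e.g. CreateFileA) and a wide/Unicode flavor ending in W (e.g. CreateFileW). Given a list of import names, for instance copied from the output of dumpbin /imports or Dependency Walker, a rough Python heuristic could look like the sketch below. classify_imports is a hypothetical helper, the suffix rule can misfire on the odd function name, and the sample names are illustrative rather than taken from a real MSYS binary.

```python
# Heuristic sketch: guess whether a program calls ANSI ("...A") or
# wide/Unicode ("...W") Win32 variants, from a list of imported names.
# Assumption: a name ending in a lowercase letter or digit followed by
# "A" or "W" is an A/W variant; this is a rule of thumb, not a guarantee.
import re
from collections import Counter

_VARIANT = re.compile(r"[a-z0-9]([AW])$")


def classify_imports(names):
    """Count imports that look like ANSI ("A") vs. wide ("W") variants."""
    counts = Counter()
    for name in names:
        match = _VARIANT.search(name)
        if match:
            counts[match.group(1)] += 1
    return {"ansi": counts["A"], "wide": counts["W"]}


if __name__ == "__main__":
    # Illustrative names only, not dumped from an actual executable.
    sample = ["CreateFileA", "FindFirstFileA", "GetCommandLineA", "ExitProcess"]
    print(classify_imports(sample))  # {'ansi': 3, 'wide': 0}
```

A program whose string-handling imports are all A variants will be limited to the ANSI code page for file names, no matter how the console is configured.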

@grawity: Like I explained, that's *exactly* why I also piped the output to another file and then inspected the result over there... – user541686 – 2012-01-04T03:29:25.123

I would suggest trying UTF-8 Cygwin (http://www.oki-osk.jp/esc/utf8-cygwin/) for UTF-8 compatibility.

I was using DeltaCopy, which is based on Cygwin (it installs its own cygwin1.dll). It had problems dealing with Chinese file names (multibyte characters in file and folder names).

After I replaced cygwin1.dll with the "UTF-8 Cygwin" version, all the Unicode files I synced were uploaded to the server correctly.

This is a very simple solution, since "UTF-8 Cygwin" states that it handles Unicode characters while keeping binary compatibility with the current Cygwin.

Jason Chiang

Posted 2012-01-03T03:22:47.053

Reputation: 41