iconv on Cygwin keeps the accents

2

0

Does anyone know why iconv keeps the accents on Cygwin? And if so, how can I tell it not to?

[Nifle@cygwin ~]$ echo "ÅÄÖÕŨÉÁ" | iconv -f utf-8 -t ascii//TRANSLIT
A"A"O~O~U'E'A

I want it to behave as it does on my Linux servers:

[NIfle@linux ~]$ echo "ÅÄÖÕŨÉÁ" | iconv -f utf-8 -t ascii//TRANSLIT
AAOOUEA

Nifle

Posted 2016-09-27T14:42:47.267

Reputation: 31 337

As a workaround, you may want to pipe it through some Perl magic: echo "ÅÄÖÕŨÉÁ" | iconv -f utf-8 -t ascii//TRANSLIT | perl -ne 'foreach (split //) { print "$_" if /\w/; } print "\n";' which deletes every character that is not a word character (\w). – mpy – 2017-01-04T18:25:52.743

That wouldn't work, because the OP might be converting text that contains the string "Hello" (with the quotes), which would mistakenly be converted to Hello. – Pat – 2017-01-05T16:57:05.847

Answers

4

This is an old issue.

Read Kobylkin's comments on this thread:

https://sourceware.org/bugzilla/show_bug.cgi?id=2872#c2

See also:

http://www.yqcomputer.com/422_10096_1.htm

Remember that Cygwin versions before 1.7 do not support locales.

But in your case the locale seems to be interpreted correctly; it is the iconv transliteration process on Cygwin that decides to convert a single character into two (or even more) components instead of one. In your example you get the transliterated character itself plus the transliterated modifier (accent, dieresis, tilde, dot, etc.), if present:

Å   Ä   Ö   Õ   Ũ   É   Á
A  "A  "O  ~O  ~U  'E  'A

Transliteration:
When a character cannot be represented in the target character set, 
it can be approximated through one or several similarly looking characters
(https://www.gnu.org/software/libiconv/)

You can check translit.def in libiconv (the library behind Cygwin's libiconv DLL): it is the source of the transliteration table (translit.h) generated during the "make" process, and that table shows that what Cygwin does is in fact correct.

Pat

Posted 2016-09-27T14:42:47.267

Reputation: 2 593

0

The conversion tables which Cygwin's implementation of iconv uses are different from those on your Linux server.

If you change the locale for Cygwin to the locale of the linux server, you should get the same conversion result.
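To try this without changing your whole environment, you can override the locale for a single command. A minimal sketch: translit_with_locale is a hypothetical helper, and en_US.UTF-8 is only an assumed example. Run `locale` on the Linux server to see which locale it uses, and `locale -a` on Cygwin to see which ones are available.

```shell
# translit_with_locale: hypothetical helper. Runs the iconv command from the
# question, but with LC_ALL set to $1 for this one command only.
translit_with_locale() {
    LC_ALL="$1" iconv -f utf-8 -t ascii//TRANSLIT
}

# Usage (en_US.UTF-8 is an example; substitute the server's locale):
#   echo "ÅÄÖÕŨÉÁ" | translit_with_locale en_US.UTF-8
```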

emk2203

Posted 2016-09-27T14:42:47.267

Reputation: 594

-1

Cygwin might not be using UTF-8, so it is attempting to display one two-byte UTF-8 character as two one-byte ASCII characters.

Do the following:

  1. Open the menu (if you don't see one, right-click in the terminal window).
  2. Click Options...
  3. Click Text.
  4. Set Locale and Character set.

As I don't have Cygwin installed, I cannot tell what the right values are.

See also the Cygwin article Internationalization - Setup locale.
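If the terminal is mintty (Cygwin's default terminal), the same two settings can be written to its configuration file instead of using the dialog: Options > Text > Locale and Character set correspond to the Locale= and Charset= entries. A minimal sketch; set_mintty_utf8 is a hypothetical helper, and en_US is only an assumed example locale.

```shell
# set_mintty_utf8: hypothetical helper. Appends mintty's Locale and Charset
# options to a config file ($1, default ~/.minttyrc).
# en_US is an example; use your own language code.
set_mintty_utf8() {
    cfg="${1:-$HOME/.minttyrc}"
    printf '%s\n' 'Locale=en_US' 'Charset=UTF-8' >> "$cfg"
}
```

Run set_mintty_utf8 and then restart mintty so it rereads the file. The helper only appends, so if the file already contains Locale= or Charset= lines, edit those lines instead.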

harrymc

Posted 2016-09-27T14:42:47.267

Reputation: 306 093

Could the downvoter explain why? – harrymc – 2017-01-05T06:22:48.870