Old issue:
Read Kobylkin's comments on this thread:
https://sourceware.org/bugzilla/show_bug.cgi?id=2872#c2
Also:
http://www.yqcomputer.com/422_10096_1.htm
Remember that Cygwin versions before 1.7 do not support locales.
In your case, however, the locale seems to be interpreted correctly. It is the iconv transliteration process on Cygwin that converts a single character into two (or even more) components instead of one. In your example you get the transliterated base character itself, plus the transliterated modifier (accent, diaeresis, tilde, dot, etc.) if one is present:
Å Ä Ö Õ Ũ É Á
A "A "O ~O ~U 'E 'A
Transliteration:
When a character cannot be represented in the target character set,
it can be approximated through one or several similarly looking characters
(https://www.gnu.org/software/libiconv/)
In the libiconv (libiconv.dll) sources, the file translit.def is the source of the transliteration table (translit.h) that is generated during the "make" process. That table shows that what Cygwin does is in fact correct.
As a workaround you may want to pipe it through some perl magic:
echo "ÅÄÖÕŨÉÁ" | iconv -f utf-8 -t ascii//TRANSLIT | perl -ne 'foreach (split //) { print "$_" if /\w/; } print "\n";'
which suppresses all non-word characters (anything not matched by \w). – mpy – 2017-01-04T18:25:52.743
That wouldn't work, because the OP might try to parse something containing the string "Hello", which would be mistakenly converted to Hello. – Pat – 2017-01-05T16:57:05.847