convert file type to utf-8 on unix - iconv is failing

6

Possible Duplicates:
Batch-convert files for encoding or line ending under Windows
How can I convert multiple files to UTF-8 encoding using *nix command line tools?

I've got a PHP file on my Windows machine that, after being moved over to *nix with WinSCP, is not showing its characters correctly.

I've dragged the file back from the Linux machine down to Windows and checked the encoding with Notepad++, and it says it is ANSI.

So I tried iconv -f ANSI -t utf-8 filename.php>filename.php, but got an error that ANSI conversion is not supported. I've also tried MS_ANSI, and I get no error, but I also don't get the file showing the proper encoding.

I open the file with WinSCP to see how it looks, and many special characters show up as '?'. Seeing as the purpose of the script is to remove these special characters from my data, this is causing a real problem.

Is there another tool for changing the encoding? I tried yum iconv, but got a "no package available" response.

How would you convert this file to the proper encoding?

pedalpete

Posted 2009-08-24T05:15:58.980

Reputation: 293

Question was closed 2010-02-25T18:32:54.230

Answers

5

I have similar troubles with MD5 hashes created on Windows XP (under Cygwin), saved to a file, then copied to a Linux system where the hashes are computed for copy verification. If the name of a file being hashed contains non-ASCII characters, md5sum reports the file missing, because it's not decoding the filename correctly. However, if I open the text file containing the hashes in Notepad and change the encoding from ANSI to UTF-8, the Linux md5sum will get the encoding correct.

ANSI isn't really a proper encoding (to anyone but Microsoft), so that's why iconv isn't picking up on it. You might get away with windows-1252 instead, but there's no guarantee it will always work:

iconv -f windows-1252 -t utf-8 filename.from > filename.to

For the record, file gives me this on one of those MD5 text files:

$ file tequila.ansi.txt
tequila.ansi.txt: ISO-8859 text
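
If the converted text has to end up under the original filename, note that redirecting iconv's output straight onto its own input file truncates that file before iconv reads it, which is why the command above writes to a second file. A minimal sketch of doing it more safely, assuming windows-1252 really is the source encoding; convert_in_place is a hypothetical helper name, not something iconv provides:

```shell
#!/bin/sh
# Sketch: convert a file to UTF-8 under its original name via a temporary file.
# convert_in_place is a hypothetical helper; the source encoding is an assumption
# supplied by the caller.
convert_in_place() {
    src_encoding=$1
    file=$2
    tmp=$(mktemp) || return 1
    # Overwrite the original only if iconv succeeded on the whole file.
    if iconv -f "$src_encoding" -t utf-8 "$file" > "$tmp"; then
        # cat (rather than mv) keeps the original file's permissions and owner.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example call (assumes filename.php exists):
#   convert_in_place windows-1252 filename.php
```

If iconv rejects the input, the original file is left untouched.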

quack quixote

Posted 2009-08-24T05:15:58.980

Reputation: 37 382

1

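Are you sure "ANSI" is the correct character encoding/input name for iconv? You could try running "file filename.php"; file will often tell you what it thinks the encoding is. You could also try not specifying the source encoding when doing the conversion, or you could just try all of them. (On some systems, glibc for example, iconv -l appends "//" to each name when its output is not a terminal, and a few names contain "/", so both are cleaned up before the name is used in the output filename.)

for i in $(iconv -l | tr -d ',' | sed 's|//||g'); do iconv -f "$i" -t utf-8 filename.php > "filename.php.$(printf '%s' "$i" | tr '/' '_')" 2>/dev/null; done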
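
Before brute-forcing every encoding, it may be worth checking whether the file is already valid UTF-8, since a UTF-8 file opened as "ANSI" also shows garbage. A rough sketch, assuming your iconv exits with a non-zero status on invalid input (glibc's does); is_utf8 is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: succeed if the file decodes cleanly as UTF-8, fail otherwise.
# is_utf8 is a hypothetical helper; it relies on iconv returning a non-zero
# exit status when it meets a byte sequence that is not valid UTF-8.
is_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

# Example call (assumes filename.php exists):
#   if is_utf8 filename.php; then echo "already valid UTF-8"; fi
```

A pure-ASCII file also passes this check, so a pass only means no conversion is needed, not that the file contains non-ASCII UTF-8 text.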

hlovdal

Posted 2009-08-24T05:15:58.980

Reputation: 2 760

I can't say I'm 'sure' that ANSI is the correct character encoding, but Notepad++ tells me that it is ANSI when I drag it down from Linux to PC. Going from PC, it says it is UTF-8. I have now noticed that if I open the file with WinSCP, I get '"Ð¥", "Ц", "Ч", "Ш", "Щ", "Ъ", "Ъ"' - when I should have '"у", "ф", "х", "ц", "ч", "ш", "щ", "ъ", "ы",'. If I close the file and open it again, then I only get '?' instead of any special characters. – pedalpete – 2009-08-24T14:33:31.263

1

You could just convert it to UTF-8 with Notepad++.

Matthew Talbert

Posted 2009-08-24T05:15:58.980

Reputation: 1 131

1

There are several encodings which are called "ANSI" in Windows. In fact, ANSI is a misnomer. iconv has no way of guessing which you want.

The ANSI encoding is the encoding used by the "A" functions in the Windows API (the "W" functions use UTF-16). Which encoding it corresponds to usually depends on your Windows system language. The most common is CP 1252 (also known as Windows-1252). So, when your editor says ANSI, it means "whatever the API functions use as the default ANSI encoding", which is the default non-Unicode encoding used in your system (and thus usually the one which is used for text files).

So, to convert the file correctly, you first should find out which is the "ANSI" encoding for your Windows system (or simply ask your text editor there to save using a specific encoding).
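
To see why the choice of "ANSI" code page matters, here is a small sketch that decodes the same single byte under two different code pages. windows-1252 and windows-1251 are only example candidates, and byte_as_utf8_hex is a hypothetical helper; the result is printed as hex so it does not depend on the terminal's own encoding:

```shell
#!/bin/sh
# Sketch: decode one byte under a given code page and show the resulting
# UTF-8 bytes as hex. byte_as_utf8_hex is a hypothetical helper.
#   $1 = source encoding name as understood by iconv
#   $2 = the byte, written as a printf octal escape
byte_as_utf8_hex() {
    printf "$2" | iconv -f "$1" -t utf-8 | od -An -tx1 | tr -d ' \n'
    echo
}

# Byte 0xF3 (octal 363) means different characters in different code pages:
byte_as_utf8_hex windows-1252 '\363'   # prints c3b3, the UTF-8 bytes of "ó"
byte_as_utf8_hex windows-1251 '\363'   # prints d183, the UTF-8 bytes of "у"
```

Since the text the asker expects is Cyrillic, windows-1251 may be a more plausible candidate than windows-1252 if the file really is in a Windows code page, but that is a guess until the system's actual code page is known.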

CesarB

Posted 2009-08-24T05:15:58.980

Reputation: 4 480

"Which encoding it corresponds to usually depends on your Windows system language. " The default system locale, actually. – Yuhong Bao – 2011-02-15T01:30:26.677