Excel 2007 doesn't preserve asian characters when converting to csv

5

2

I am trying to save a spreadsheet as a CSV file. The file consists of korean characters, and the resulting csv has those characters literally converted to question marks (0x3F).

I am running on an English version of Windows but I (should) have the appropriate charsets installed. The default encoding for non-unicode programs is English. I have no problems saving the files in other formats (xls or txt, for example).

This issue occurs for Japanese and Chinese characters as well.

The sample strings I used (in separate files) are

안녕하세요
你好
おはよう

None of files are exported correctly.
Does Excel support asian characters when saving as CSV?

UPDATE

Decided to do some testing. Instead of trying to export from xls to csv, I instead created a csv manually that contains the following chinese characters: 你好. I am using Notepad++ to save the files. The purpose of this is to test whether excel can actually save CSV files with certain characters properly.

First, I encoded the file in UTF8 without BOM with .csv extension. I opened the file in Excel, and it rendered the characters as ASCII (therefore, incorrectly. Looks something like 你好). However, when I saved the file, the characters were preserved when I opened it in notepad++ in UTF8.

Second, I created a new copy of the test file, but this time encoding it in UTF8 (with the BOM). I opened the file in Excel, and it read the file correctly (as 你好). I then re-saved the file, but this time it converted the characters to two question marks.

I found it interesting that while Excel correctly reads the file, it cannot re-save it correctly, but when it reads it as ASCII, it doesn't try to convert them but instead just outputs them as is. Seems like there is an issue when Excel tries to save unicode characters?

Workaround Solution

I saved the the document as Unicode Text which produces a tab-separated text file and it preserved the asian characters. There should be tools available to convert tab-separated files to comma-separated files.

Because the file is unicode encoded, all of the sample strings I've presented at the top (from three different charsets) appear properly.

MxLDevs

Posted 2013-08-29T18:58:17.653

Reputation: 1 019

Question was closed 2013-08-30T17:20:35.227

1That question involves csv files saved in DOS, which has limited support for unicode already I believe. In my case I am creating the excel files in windows which should have proper support for unicode. – MxLDevs – 2013-08-29T19:07:35.637

But the solution could have worked for your case: from xlsx save as xls and then to csv. – Doktoro Reichard – 2013-08-29T19:36:58.610

I've done some digging. First came this post that lead me to the RFC 4180, that states Common usage of CSV is US-ASCII. Awkwardly I tried opening Excel in Applocale and the results were slightly different, so maybe there is an (undocumented) CSV encoding option hidden inside Excel. On the other hand, saving as a Unicode Text doesn't change the contents, so to convert that to CSV would be to convert the Tabs to ,.

– Doktoro Reichard – 2013-08-29T21:06:03.330

Also there's this tidbit from Microsoft detailing the function Excel uses behind the curtain to save files. Ultimately it seems there exists an optional (and ignored) argument called TextCodepage, that gains its value from the Regional Settings. However, since it defaults to the region codepage, it doesn't default to Unicode.

– Doktoro Reichard – 2013-08-29T22:24:44.317

Thanks for looking into the issue. From the sounds of it, there does not seem to be a way to address this issue with microsoft excel. I'll have to look into alternatives to achieve what I want (and then figure out how to get others to do the same...) – MxLDevs – 2013-08-30T02:27:21.667

Or perhaps I could find a way to have excel output the csv "as-is" without trying to convert the characters since it can't do it correctly anyways. As long as the bytes are the same I could encode the file myself using other tools. – MxLDevs – 2013-08-30T02:39:09.513

I am running Excel 2010 (Japanese version, Japanese OS) and am not encountering the same problem. Mine works fine even if I just save it as a CSV for some reason (it opens and displays fine in notepad). Unfortunately, this means I can't be much help. – jmac – 2013-08-30T03:47:44.593

@jmac Like I said, it's localization's fault. Since you have a Japanese Excel and a Japanese OS Excel saves the CSV on the Japanese codepage, which I assume is JIS or Shift-JIS. Keikoku, the last link I gave does show ways to do that, most notably with Google Spreadsheet and Openoffice (and a Excel VBA macro, tuckered in there) – Doktoro Reichard – 2013-08-30T07:19:13.273

Not sure why this was closed as a duplicate when the linked question doesn't even have a solution that actually works for this problem. I've found a workaround solution for Excel (using the save function) but I'd rather post it as a proper answer. – MxLDevs – 2013-08-30T19:43:58.470

@Keikoku Would you please post your answer to the linked questions? None of the suggested answers on the other question work for me. It would be nice to try your answer. – Jay Askren – 2013-09-04T17:48:07.060

@JayAskren I have posted an answer in the linked question at the top – MxLDevs – 2013-09-04T20:47:47.203

No answers