Special characters not recognized by Google Sheets & Microsoft Excel

0

I have developed a web crawler to extract some information and write it to an Excel spreadsheet.

However, neither Excel nor Google Sheets recognizes some special characters, see below:

Special characters

The text should be the following: ("Woodland"​ or the "Company"​). This is just one case; several more special characters are not recognized.

Do you know how I can convert them? Do I have to turn on some feature in Excel or Google Sheets?

I have been stuck on it for days - any help is welcomed.

Thank you!!

AndrewTG

Posted 2019-08-02T21:14:28.803

Reputation: 1

1 It would help if you explained how you get this text and how you store it in Excel. – Blackwood – 2019-08-03T02:25:43.797

Answers

0

What is the crawler written in? The easiest option would probably be to have the parser take out special characters before output.
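As a rough illustration of stripping characters before output, here is a minimal Python sketch. It assumes the crawler can run Python and that the troublesome characters are invisible ones such as a zero-width space (U+200B); the function name `strip_invisible` and the sample string are hypothetical, not taken from the question.

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, keeping tab and newlines."""
    keep = {"\t", "\n", "\r"}
    return "".join(
        ch
        for ch in text
        if ch in keep or unicodedata.category(ch) not in ("Cc", "Cf")
    )


# Hypothetical sample: a zero-width space (U+200B) follows each closing quote.
sample = '("Woodland"\u200b or the "Company"\u200b)'
cleaned = strip_invisible(sample)
print(cleaned)
```

Note that this discards data; if the characters are legitimate text, fixing the encoding on export is usually the better route.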

That said, how are you outputting data to the spreadsheets themselves? I think we need more info to help you here.

Alternatively, you could use a formula-based cleanup like this one: https://exceljet.net/formula/remove-unwanted-characters

ColonelMeow

Posted 2019-08-02T21:14:28.803

Reputation: 26

0

Excel does support Unicode. Your (unspecified) method and encoding are to blame.

This looks like you are retrieving data from the Web in UTF-8 format, then importing it into Excel without specifying the UTF-8 encoding, so Excel thinks it is reading ANSI text. The result is that each special character that takes two or more bytes in UTF-8 is displayed as that many weird characters.
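The effect can be reproduced in a few lines of Python. This is only a sketch: it assumes that "ANSI" here means Windows-1252 (`cp1252`), which is common on Western-language Windows installs but is not confirmed by the question, and the sample strings are made up.

```python
# "é" (U+00E9) takes two bytes in UTF-8: 0xC3 0xA9.
two_byte = "caf\u00e9"
raw_two = two_byte.encode("utf-8")

# Decoding those bytes as Windows-1252 (assumed "ANSI") yields two characters.
wrong_two = raw_two.decode("cp1252")
print(wrong_two)

# A zero-width space (U+200B) takes three bytes: 0xE2 0x80 0x8B.
three_byte = "\u200b"
wrong_three = three_byte.encode("utf-8").decode("cp1252")
print(wrong_three)

# Decoding with the correct encoding restores the original text.
right_two = raw_two.decode("utf-8")
```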

If you are creating an input file for Excel, you could preface it with a Byte order mark (BOM). The UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF.
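If the crawler writes a CSV file from Python, one way to get that BOM is the standard `utf-8-sig` codec, which writes the bytes 0xEF,0xBB,0xBF at the start of the file. The sketch below assumes CSV output; the function name `write_csv_with_bom` and the sample rows are hypothetical.

```python
import csv

# Hypothetical sample data containing non-ASCII characters.
rows = [
    ["name", "note"],
    ["Woodland", "caf\u00e9 \u201cCompany\u201d"],
]


def write_csv_with_bom(path, rows):
    """Write rows as UTF-8 CSV, prefixed with a byte order mark."""
    # "utf-8-sig" emits the BOM once, before the first character written.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)
```

Calling `write_csv_with_bom("output.csv", rows)` and opening the file in Excel should then show the characters correctly, since the BOM signals that the file is UTF-8.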

harrymc

Posted 2019-08-02T21:14:28.803

Reputation: 306 093