True difference between Excel CSV and Standard CSV

16

4

What is the true difference between an Excel CSV and a standard CSV?

For example, when handling columns with line breaks inside one cell, how do they encode it differently?

user157195

Posted 2011-06-26T09:05:23.823

Reputation: 377

Answers

19

It absolutely depends on what you define as "standard" CSV. As far as I'm concerned, Excel follows the rules outlined in RFC 4180, the "Common Format and MIME Type for CSV Files".

Consider a table where the first cell in the first row has two line breaks. In Excel, it would look like the following:

+---+---------------+------------+------------+
|   | A             | B          | C          |
+---+---------------+------------+------------+
| 1 | col1, line1a  |            |            |
|   | col1, line1b  |            |            |
|   | col1, line1c  | col2, row1 | col3, row1 |
| 2 | col1, row2    | col2, row2 | col3, row2 |
+---+---------------+------------+------------+

Now, how would Excel export this? Let's see – a text editor would display this:

"col1, line1a
col1, line1b
col1, line1c","col2, row1","col3, row1"
"col1, row2","col2, row2","col3, row2"

Not very sophisticated. It inserts a carriage return (hex 0D) where the line break was in our cell. Every cell is surrounded by double quotes. Also, the actual rows are separated with a carriage return.

In order to parse this correctly, a CSV parser would need to

  • ignore that carriage return when it appears within double quotes (i.e. a cell)
  • not ignore that carriage return when it appears outside of double quotes

If it didn't do that, you'd end up with something garbled like this – note that there are now four lines instead of two, because it failed to ignore the line breaks.

+---+---------------+------------+------------+
|   | A             | B          | C          |
+---+---------------+------------+------------+
| 1 | "col1, line1a |            |            |
| 2 | col1, line1b  |            |            |
| 3 | col1, line1c" | col2, row1 | col3, row1 |
| 4 | col1, row2    | col2, row2 | col3, row2 |
+---+---------------+------------+------------+

But, let's see what the RFC says, maybe Excel did it right?

Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.

Neat, that's exactly what Excel did. So summarizing, Excel seems to follow the recommendations of a "standard" CSV file. Given a proper CSV parser, it should be able to read Excel CSV files as well.

slhck

Posted 2011-06-26T09:05:23.823

Reputation: 182 472

@shhck: why do I see csv parser implementations separating "excel" csv with "csv"? I also found excel to be a pretty robust implementation. – user157195 – 2011-06-26T09:47:10.917

@user157195 Depends on the parser, you should read up in its documentation why there is a difference between Excel and non-Excel input. I wouldn't see why there should be one. I only know this from R, which treats Excel CSV just like every other. It can however read XLS too.

– slhck – 2011-06-26T09:50:27.780

2As someone who has been programatically reading excel-generated CSV files for years in all shapes and sizes, I can attest that slhck is 100% correct - Excel produces 100% "standard" CSV files. Every single time. – Mark Henderson – 2011-06-26T10:09:03.570

@Farseeker Thanks for the confirmation! (and you have 1337 rep now!)

– slhck – 2011-06-26T10:10:07.730

haha yeah I've had 1337 rep for quite a while. I don't come past here very often. – Mark Henderson – 2011-06-26T10:13:16.247

5Caution! In countries that use ',' as a decimal point Excel will be "very very clever" and use a ';' separator character. This saves Zee Germanz from seeing too many "quotes";"in";"their";"files". Unfortunately if they then send it to the UK office it will get AllSmooshedUpIntoASingleCellOnEveryLine. Brilliant. – Luke Usherwood – 2013-10-30T17:36:53.517

-1

My understanding from the original csv implementation was that all text fields were enclosed in quotes and numbers didn't need to be. Excel doesn't do this and if you try to upload your Excel generated csv files on non-microsoft platforms they fail. This is a standard microsoft approach, ignore interoperability and focus on blindly following the rules. A similar thing happened with IE, they followed the html/css rules correctly for the padding and margin attributes and ignored the fact that every existing web page and web browser treated them differently. The result, almost every web page now has special rules for IE. I find the csv files produced by excel useless and use a column of "concatenate" functions to manually create them myself properly.

Edwin

Posted 2011-06-26T09:05:23.823

Reputation: 1