How can I stop Excel from eating my delicious CSV files and excreting useless data?

129

25

I have a database which tracks sales of widgets by serial number. Users enter purchaser data and quantity, and scan each widget into a custom client program. They then finalize the order. This all works flawlessly.

Some customers want an Excel-compatible spreadsheet of the widgets they have purchased. We generate this with a PHP script which queries the database and outputs the result as a CSV with the store name and associated data. This works perfectly well too.

When opened in a text editor such as Notepad or vi, the file looks like this:

"Account Number","Store Name","S1","S2","S3","Widget Type","Date"
"4173","SpeedyCorp","268435459705526269","","268435459705526269","848 Model Widget","2011-01-17"

As you can see, the serial numbers are present (in this case twice, not all secondary serials are the same) and are long strings of numbers. When this file is opened in Excel, the result becomes:

Account Number  Store Name  S1  S2  S3  Widget Type Date 
4173    SpeedyCorp  2.68435E+17     2.68435E+17 848 Model Widget    2011-01-17

As you may have observed, the serial numbers are enclosed by double quotes. Excel does not seem to respect text qualifiers in .csv files. When importing these files into Access, we have zero difficulty. When opening them as text, no trouble at all. But Excel, without fail, converts these files into useless garbage. Trying to instruct end users in the art of opening a CSV file with a non-default application is becoming, shall we say, tiresome. Is there hope? Is there a setting I've been unable to find? This seems to be the case with Excel 2003, 2007, and 2010.

atroon

Posted 2011-01-19T01:08:37.963

Reputation: 2 496

1

Beware that if Excel automatically converts your data into numbers, it will drop any data after 15 digits of precision. I was pulling my hair out over this exact issue, trying to figure out why there were trailing zeroes on every one of my IDs even after setting the correct display and data type. More info: http://superuser.com/a/437794/236877

– Matt Vukas – 2016-08-02T22:13:54.430

46can I give a +1 just for the name? – tombull89 – 2011-01-31T18:57:25.187

11Excel does not seem to respect text qualifiers in .csv files - the double quotes are not text qualifiers, they simply allow commas in your data, if you don't use commas in your data then they are meaningless. All data in a CSV file is untyped, so Excel can only guess that your large serial number is a number, and that's when you run in to Excel's maximum precision of 15 digits, which is what is truncating your numbers. – DMA57361 – 2011-02-01T10:27:13.960

1Excel doesn't seem to respect all commas within double quotes. Consider "August 12, 2012" Excel turns that into garbage too. – zundarz – 2012-06-29T16:02:58.303

5

I want to mention this SU question. It explains which options you have when dealing with CSV in Excel.

– nixda – 2013-01-07T08:50:15.127

1@nixda Thanks! Those are useful suggestions, especially for the more experienced users. My problem is almost more of a human problem, in that Excel associates itself with .csv files, and people see the icon, and double click (because that's how you open things), and then usually hit Save (because we're always telling them to Save!), and all is lost. But I will definitely use your methods when possible. – atroon – 2013-01-07T14:19:17.190

Answers

58

But Excel, without fail, converts these files into useless garbage.

Excel is useless garbage.

Solution

I would be a little surprised if any client wanting your data in an Excel format was unable to change the visible formatting on those three columns to "Number" with zero decimal places or to "text." But let's assume that a short how-to document is out of the question.

Your options are:

  1. Toss a non numeric, not whitespace character into your serial numbers.
  2. Write out an xls file or xlsx file with some default formatting.
  3. Cheat and output those numbers as formulas ="268435459705526269","",="268435459705526269" (you can also do ="268435459705526269",,="268435459705526269" saving yourself 2 characters). This has the advantage of displaying correctly, and probably being generally useful, but subtly broken (as they are formulas).

Be careful with option 3, because some programs (including Excel & Open Office Calc), will no longer treat commas inside ="" fields as escaped. That means ="abc,xyz" will span two columns and break the import.

Using the format of "=""abc,xy""" solves this problem, but this method still limits you to 255 characters because of Excel's formula length limit.

Tyler

Posted 2011-01-19T01:08:37.963

Reputation: 4 203

Note that Apple Numbers does exactly the same thing. Either they're both useless garbage or it's expected behavior. – Jamon Holmgren – 2014-10-01T20:57:40.950

I could not make much sense out of all these solutions provided here. I have a csv file with this sample date - User id,mobile,email 5868127854175771341,,test@gmail.com and I want to convert it to xlsx. I tried several ways - like #1 - opening a new formatting column to text before pasting the data. #2 converting csv to xlsx. In case #1, data shows ok when viewing xlsx file but when I feed it my application, it reads as 5868127854175770000. In case #2, it shows in xlsx also like that 5868127854175770000. – Sandeepan Nath – 2018-06-25T12:07:51.307

1Actually, that's not being harsh. Copy and paste one of the numbers above into Excel, then change the number format as suggested. Excel changes the value, resulting in garbage. – Joe Internet – 2011-01-19T02:03:58.897

1@Joe, I was too cursory on my initial overview. Excel is indeed producing garbage, and is itself garbage. I've updated my answer to reflect that. An option might be having an "Excel csv" and also having a "usable, worthwhile csv" – Tyler – 2011-01-19T02:39:09.457

2@Tyler - I don't think Excel is garbage, just saying that OP was correct that it was producing garbage in this case. It's actually a very good question, with no seemingly elegant solution. – Joe Internet – 2011-01-19T03:50:03.250

1The Format Cells... option has been suggested, and I have tried to use it. In this case, the moment you open the file, Excel seems to convert the serials to Scientific notation (agreed, not unexpected) and tosses the precision. When you change them to a number or to text, the string does not come back. That really is the essence of the problem. Outputting as formulas may do it though...I didn't think of that. – atroon – 2011-01-19T15:17:18.340

Do we need the negativity? Excel has a well documented (See Wikipedia and "Specifications and Limits" for 2003, 2007, 2010) precision of 15 digits, the serials are 18 digits long and CSV data is untyped. This behaviour is expected.

– DMA57361 – 2011-02-01T10:34:16.807

9@DMA57361 The behavior is not expected, it is determinable. The numeric precision is well documented, how excel reads CSVs is not. The lack of warning and silently discarding data is absurd. The fact that you cant even tell Excel how to import the data is equally absurd. Is the negativity needed? No, but honesty is the best policy and that is how I feel. – Tyler – 2011-02-03T05:27:37.640

1This also worked for an id number that had a dash. If the value was a valid date, then Excel would convert it to a date field. For example, 5-1 would convert to May-1. Adding double quotes and a space, i.e. " 5-1", showed up in Excel as a text field of 5-1. – Sam Tyson – 2012-06-25T17:42:34.107

Dates are a different matter. Excel turns both "August 12, 2012" and ="August 12, 2012" into 12-Aug-12 or some other predetermined format even though I'm using double quotes to delimit a text value with a comma. – zundarz – 2012-06-29T15:57:36.530

42

We had a similar problem where we had CSV files with columns containing ranges such as 3-5 and Excel would always convert them to dates e.g. 3-5 would be 3 Mar, after which switching back to numeric gave us a useless date integer. We got around it by

  1. Renaming the CSV to TXT extension
  2. Then when we opened it in Excel, this would kick in the text import wizard
  3. In Step 3 of 3 in the wizard we told it the columns in question were text and they imported properly.

You could do the same here I would think.

text import wizard

Cheers

user65525

Posted 2011-01-19T01:08:37.963

Reputation: 421

1Text Import Wizard is THE answer. All other solutions are needless hackery resulting from not understanding how to use Excel to view and edit CSVs. – Excellll – 2014-10-28T22:27:58.763

1@Excellll, doing it one file at a time sure. When automating this process, 'needless hackery' often saves the day. – Parrish Husband – 2015-03-24T18:07:46.030

1this is completely useless when excel is used by standard users to display CSV files. before i try to explain how to use the text-import-wizard to ~15 beginner-skill office users, i'd rather type the excel document source code by myself. – northkildonan – 2015-11-30T08:21:04.507

2+1 for being the correct way to do it. (edit: sorry had to edit a little to clarify solution) – jay – 2012-02-07T22:41:35.010

2You don't have to rename your file. Just use the import wizard Shift-select all columns and choose as text. – nixda – 2013-01-07T08:58:07.920

8

Better solution is to generate XML Workbook. Like this:

<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  </OfficeDocumentSettings>

  <ss:Worksheet ss:Name="Sheet 1">
    <Table>
    <Column ss:Width="100"/>
    <Column ss:Width="100"/>
    <Column ss:Width="150"/>
    <Column ss:Width="150"/>
    <Column ss:Width="150"/>
    <Column ss:Width="150"/>
    <Column ss:Width="80"/>
    <Column/>

    <Row>
      <Cell><Data ss:Type="String">Account Number</Data></Cell>
      <Cell><Data ss:Type="String">Store Name</Data></Cell>
      <Cell><Data ss:Type="String">S1</Data></Cell>
      <Cell><Data ss:Type="String">S2</Data></Cell>
      <Cell><Data ss:Type="String">S3</Data></Cell>
      <Cell><Data ss:Type="String">Widget Type</Data></Cell>
      <Cell><Data ss:Type="String">Date</Data></Cell>
    </Row>

    <Row>
      <Cell><Data ss:Type="String">4173</Data></Cell>
      <Cell><Data ss:Type="String">SpeedyCorp</Data></Cell>
      <Cell><Data ss:Type="String">268435459705526269</Data></Cell>
      <Cell><Data ss:Type="String">x</Data></Cell>
      <Cell><Data ss:Type="String">268435459705526269</Data></Cell>
      <Cell><Data ss:Type="String">848 Model Widget</Data></Cell>
      <Cell><Data ss:Type="String">2011-01-17</Data></Cell>
    </Row>


    </Table>
    <x:WorksheetOptions/>
  </ss:Worksheet>
</Workbook>

The file must have .xml extension. Excel and OpenOffice open it correctly.

twentybeersreserved

Posted 2011-01-19T01:08:37.963

Reputation: 104

Do you mean that the OP should use PHP script to convert the database into XML format ? – Prasanna – 2015-09-30T07:23:29.517

2Very much cleaner than having users open .csv in Excel or messing up your CSV so that only Excel can understand your CSV. It’s not even that complex of a schema. – binki – 2016-05-10T15:06:54.113

Where is this standard documented? I would like to know more about the data types available. – John Doherty – 2017-10-07T11:49:16.030

1

My solution: I've got the same issue with importing serial numbers. They don't have to be treated as numbers, ie no mathematical functions are performed on it, but we need the entire number in there. The simplest thing I have is to insert a space in the serial number. eg "12345678 90123456 1234". When Excel imports it, it will be treated as text instead of a numeric.

Peterlip

Posted 2011-01-19T01:08:37.963

Reputation: 19

0

The Import Wizard is the best solution for casual users and one-off situations. If you need a programmatic solution, you can use the QueryTables.Add method (which is what the Import Wizard is using behind the scenes).

Workbooks.Add
With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & "C:\myfile.csv", Destination:=Range("$A$1"))
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .TextFilePromptOnRefresh = False
    .TextFilePlatform = 437
    .TextFileStartRow = 1
    .TextFileParseType = xlDelimited
    .TextFileTextQualifier = xlTextQualifierDoubleQuote
    .TextFileConsecutiveDelimiter = False
    .TextFileTabDelimiter = False
    .TextFileSemicolonDelimiter = False
    .TextFileCommaDelimiter = True
    .TextFileSpaceDelimiter = False
    .TextFileColumnDataTypes = Array(1, 2, 2) 'Edit this line. Add a number for each column, 1 is general, 2 is text. Search the internet for other formats.
    .TextFileTrailingMinusNumbers = True
    .Refresh BackgroundQuery:=False
End With

PBeezy

Posted 2011-01-19T01:08:37.963

Reputation: 208

0

I had long account numbers being garbled.

This is how I fixed it:

Open your file.csv in Libre Office/Open Office (you may have to specify delimiters etc.) then save the file as an Excel XML file.

Then open this file in Excel and then you will see columns are no longer changed into scientific format or whatever. To be safe right-click the column and explicitly set the format as Text, then save as Excel file format.

Open the Excel format file and the column should still be ok!

user127379

Posted 2011-01-19T01:08:37.963

Reputation: 179

1While this would work, trying to explain to someone who speaks only broken English why he would need to use a different office suite creates just as many problems as it solves. Alternative software to M$ Office is all to the good in my own opinion, but I realize that I can't convert everyone. – atroon – 2013-05-17T18:31:46.130