How can I open a subset of a large (35MB) .xlsx file?

1

I have Ubuntu 10.04 running on a Dell Optiplex with 4GB of memory and two 3.16GHz processors.

I received a 35MB spreadsheet. It opened in Gnumeric after 5 minutes, with errors, and it would not open in OpenOffice at all (I killed it after 20 minutes), even after I gave the "soffice" process top priority (niceness = -20).

What is the best way to deal with such a file? Is it possible to extract a subset of the first few hundred lines so that I can work out the script that I will need to use to parse the entire file?

update:

The command-line tool ssconvert (run as "ssconvert BigFile.xlsx BigFile.csv") produced the same errors as Gnumeric. This is unsurprising, because ssconvert is Gnumeric's own converter.

David LeBauer

Posted 2011-04-26T18:52:23.033

Reputation: 700

Answers

1

Probably, but you'll need to do some manual work.

.xlsx files are in fact ZIP archives with XML data inside, so just unpack the file and have a look. The format isn't something a sane mind will easily understand, but it should be possible to open the sheet files (under xl/worksheets/), look for the <row> elements and strip everything after the first few hundred.
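
The unpack-and-truncate idea can also be scripted so that the whole sheet is never loaded into memory. The following is only a minimal sketch in Python, under several assumptions that are not from the question: the data is in xl/worksheets/sheet1.xml, text cells are stored either inline or in xl/sharedStrings.xml, and empty cells can simply be skipped (so columns may shift where a row has gaps). The names first_rows, load_shared_strings and write_csv are made up for illustration.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the worksheet and shared-strings XML parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def load_shared_strings(zf):
    """Return the workbook's shared strings, or [] if it has none."""
    try:
        f = zf.open("xl/sharedStrings.xml")
    except KeyError:
        return []
    strings = []
    with f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "si":
                # A string item may be split into several <t> runs.
                strings.append("".join(t.text or "" for t in elem.iter(NS + "t")))
                elem.clear()
    return strings


def first_rows(path, limit=300, sheet="xl/worksheets/sheet1.xml"):
    """Stream the sheet XML and return at most `limit` rows as lists of text.

    Assumption: `sheet` is the right part name; list the archive with
    zipfile.ZipFile(path).namelist() if it is not.
    """
    rows = []
    with zipfile.ZipFile(path) as zf:
        shared = load_shared_strings(zf)
        with zf.open(sheet) as f:
            for _, elem in ET.iterparse(f):
                if elem.tag != NS + "row":
                    continue
                cells = []
                for c in elem.iter(NS + "c"):
                    if c.get("t") == "inlineStr":
                        text = "".join(t.text or "" for t in c.iter(NS + "t"))
                    else:
                        v = c.find(NS + "v")
                        text = v.text if v is not None and v.text else ""
                        if c.get("t") == "s" and text:
                            # The value is an index into the shared strings.
                            text = shared[int(text)]
                    cells.append(text)
                rows.append(cells)
                elem.clear()  # release the cells of the row just read
                if len(rows) >= limit:
                    break  # stop before reading the rest of the sheet
    return rows


def write_csv(rows, out_path):
    """Write the extracted rows to a CSV file."""
    with open(out_path, "w", newline="") as out:
        csv.writer(out).writerows(rows)
```

For example, write_csv(first_rows("BigFile.xlsx", limit=300), "head.csv") would produce a small CSV to develop the parsing script against. Dates and formatted numbers come out as the raw stored values, so some post-processing may still be needed.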

Alternatively, you can try opening the file with Apache POI; give the JVM about 1GB of heap (for example, java -Xmx1g) and it might work.

Aaron Digulla

Posted 2011-04-26T18:52:23.033

Reputation: 6 035