Spreadsheet application that can handle big data OS X

2

I've been working with Excel for quite a while for some statistical analysis that I do regularly. The size of the data that I'm working with has gotten much larger as of late, however. The layout of the databases in question is quite simple, usually just three rows which includes a UNIX timestamp, and EST value, a proprietary numeric value and finally an average of the rows that have a timestamp +/- 1000 that row's timestamp (little AVERAGEIFS() formula). That formula and the EST conversion are the only formulas in the sheet.

I'm beginning to work with files with 500,000+ rows. Running the average formula down the entire row takes forever. The end result is the production of print-worthy graphs. I'm looking for either a UNIX CL utility or separate spreadsheet/database application that can handle this amount of data without melting my CPU or making me wait an hour. Is there anything out there?

TL;DR: Simple excel sheet with over half a million rows is getting too slow to work with. OS X alternatives?

Peter Kazazes

Posted 2012-07-08T07:34:11.097

Reputation: 205

Your similar question answered in SO: http://stackoverflow.com/a/11387240/1248931

– andy holaday – 2012-07-09T00:28:38.710

Please don't waste our time with questions in two areas. Ask it in one. If that turns out to be the wrong place, flag it and ask a moderator to migrate it. – Julian Knight – 2012-07-09T13:08:18.743

@JulianKnight If you looked at the specifics of either question, they're inherently different. Thank you for the suggestion however. – Peter Kazazes – 2012-07-10T01:09:54.443

Hmm, I did look at the other question but in reality, the answers certainly seem to be related. I just think it would have been better to have just one single question like this answered first and then to follow up with another question if further clarification was required or a slightly different question needed. – Julian Knight – 2012-07-10T14:21:53.367

Answers

1

I would see if you can do this in a traditional database (SQL or similar) or in something like an R console. If the problem is in the plotting, and not the calculation, I would try to do some type of filtering (not sure what kind of data you have, or how it is spread out, but in any case you do not need 500k+ data points on a chart) to get rid of many unneeded data points for the plot. That will make the plotting go quicker (and you can use R or gnuplot for the actual plotting if Excel is too slow)

soandos

Posted 2012-07-08T07:34:11.097

Reputation: 22 744

I've been doing some of the work in R lately because of its efficiency. The only problem that I've had in my search for a replacement for excel is looking for a spreadsheet or database alternative that can handle array formulas. – Peter Kazazes – 2012-07-08T18:50:25.803

1I agree with the answer - you need to switch to a database. Something like MySQL or Postgres would do the job. Of course, this means that you will need to learn some SQL too. The formula you are using can certainly be replaced by some SQL. After you've got set up and had a go, you could always come back and ask a slightly different question about the SQL required! – Julian Knight – 2012-07-09T11:20:11.780

Less likely to be useful but worth mentioning is Google Refine. It handles large data sets easily but is more focussed on data improvement and segmentation. However, it does support Python processing

– Julian Knight – 2012-07-09T11:21:24.900

@JulianKnight, I think google refine is very cool, but this is a bad use case for it (in my opinion). It is meant more for parsing and modifying data than just calculation. – soandos – 2012-07-09T15:14:21.057

@soandos: I agree about Refine, I just wanted to mention in in case it turned out to be useful. I've not Looked at R before, looks interesting, especially the Red-R GUI. I'm also interested now in IPython since I already know Python, this may be a useful tool to me. – Julian Knight – 2012-07-10T14:23:41.167