Excel performance: LOOKUP vs GETPIVOTDATA


I built an Excel 2007 spreadsheet which contains a large table of source data (about 500,000 rows and 10 columns). I need to extract data from this large table for my analysis. To extract and aggregate data I usually use the SUMIF, VLOOKUP/HLOOKUP and INDEX+MATCH functions.
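For illustration, these are the kinds of formulas involved, assuming the keys sit in column A and the values in column J of a sheet named Data, with the lookup key in E2 (all names are illustrative):

    =SUMIF(Data!$A:$A,$E2,Data!$J:$J)
    =VLOOKUP($E2,Data!$A:$J,10,FALSE)
    =INDEX(Data!$J:$J,MATCH($E2,Data!$A:$A,0))

The SUMIF aggregates column J over every row whose key matches E2; the VLOOKUP and INDEX+MATCH variants both return the column-J value for an exact key match.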

I recently learned about the GETPIVOTDATA function, which makes it possible to extract data from a pivot table. To be able to use it, I first need to convert my large source table into a pivot table; after that I can extract data using GETPIVOTDATA.
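For example, assuming the pivot table starts in cell A3 of a sheet named Pivot, with a data field Amount and row fields Region and Product (again, all names are illustrative):

    =GETPIVOTDATA("Amount",Pivot!$A$3,"Region",$E2,"Product",$F2)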

Would you expect a performance improvement if I used GETPIVOTDATA to extract and aggregate data instead? I would expect that aggregated values are pre-calculated within the underlying pivot cache, and that performance would therefore be better.

If performance is better, are there any reasons not to follow this approach? To be clear, there is no need to refresh the pivot table, because it contains the source data (which sits at the beginning of the calculation chain).

Ruut


Answers


I did a few performance tests on a dual-core 2.33 GHz desktop PC with 2 GB RAM and Excel 2007.

The lookups were done on a table with 241,000 records. The results, from fastest to slowest:

  1. With INDEX-MATCH on a sorted list, the number of lookups per second was 180,000 (based on 1,440,000 lookups in 8 seconds). More info on how to implement sorted lookups in Excel can be found here; scroll down to the section INDEX-MATCH in One Formula, Sorted Data. A sketch of the technique follows this list.

  2. With the GETPIVOTDATA function, the number of lookups per second was 6,000 (based on 250,000 lookups in 40 seconds).

  3. With the GETPIVOTDATA function using the very flexible single-argument string syntax (see here), the number of lookups per second was 2,000 (based on 250,000 lookups in 145 seconds).

  4. With INDEX-MATCH on an unsorted list, the number of lookups per second was 500 (based on 20,000 lookups in 35 seconds).
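A minimal sketch of the sorted technique from item 1, assuming the keys are in Data!$A$2:$A$241001 (sorted ascending) and the values in Data!$B$2:$B$241001, with the lookup key in E2 (ranges are illustrative):

    =IF(INDEX(Data!$A$2:$A$241001,MATCH($E2,Data!$A$2:$A$241001,1))=$E2,
        INDEX(Data!$B$2:$B$241001,MATCH($E2,Data!$A$2:$A$241001,1)),
        NA())

MATCH with match_type 1 on sorted data performs a binary search (around log2(241,000) ≈ 18 comparisons per lookup) instead of a linear scan, which is where the speed-up comes from; the IF guard returns #N/A when the key is absent, instead of silently returning the nearest smaller key.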

The results do not change when the lookup functions refer to an Excel Table instead of a named range.

So, to answer the question: GETPIVOTDATA lookups are about 10 times as fast as a regular (unsorted) INDEX-MATCH lookup, but the best performance improvement comes from sorting your source data, which can make your lookups around 400 times as fast.

Ruut



Doing the lookups in VBA (using a dictionary) is by far the fastest way. See this: https://stackoverflow.com/questions/18656808/how-to-optimize-vlookup-for-high-search-count-alternatives-to-vlookup
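A minimal sketch of the dictionary approach, assuming keys in column A and values in column B of a sheet named Source, and lookup keys in column A of a sheet named Lookups (all sheet names and ranges are illustrative):

    Sub DictionaryLookup()
        Dim dict As Object, src As Variant, keys As Variant
        Dim res() As Variant, i As Long
        Set dict = CreateObject("Scripting.Dictionary")
        ' Read the whole source table into memory in one shot
        src = Worksheets("Source").Range("A2:B241001").Value
        For i = 1 To UBound(src, 1)
            dict(src(i, 1)) = src(i, 2)   ' on duplicate keys, the last row wins
        Next i
        ' Resolve all lookup keys in a single pass over an in-memory array
        keys = Worksheets("Lookups").Range("A2:A20001").Value
        ReDim res(1 To UBound(keys, 1), 1 To 1)
        For i = 1 To UBound(keys, 1)
            If dict.Exists(keys(i, 1)) Then
                res(i, 1) = dict(keys(i, 1))
            Else
                res(i, 1) = CVErr(xlErrNA)   ' mimic #N/A for missing keys
            End If
        Next i
        Worksheets("Lookups").Range("B2").Resize(UBound(res, 1), 1).Value = res
    End Sub

Reading the ranges into Variant arrays and writing the results back in one block keeps the VBA-to-worksheet round trips to a minimum, which matters as much as the dictionary itself.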

jj2j



Using GetPivotData only gives you access to whatever is visible in the PivotTable report. If you are the sole user of this spreadsheet, this may be a viable approach for you.

If you can design the pivot to do most of your aggregations for you, then using GetPivotData will be faster.

I have not tested GetPivotData performance, but I would expect it to be slower than a binary-search LOOKUP/MATCH on sorted data.

Charles Williams


The pivot would be designed in such a way that it looks identical to the original data: 500,000 rows (hiding all the subtotals), with all 9 dimension columns in the row section and the remaining data column in the data section of the pivot report. – None – 2012-10-17T10:22:06.057

If the pivot is not doing any of the aggregations for you, I would expect GetPivotData to be slower because it has to do more work; but that's only a guess: I have not actually tested it. – None – 2012-10-17T12:02:00.067

I assume the pivot has to do the aggregations only on refresh, and since my base data is fixed I will not hit refresh. I use the GetPivotData function for lookup purposes only. – Ruut – 2012-10-29T09:12:21.427


I face the same issue on a daily basis: a large number of rows in multiple data tables in Excel.

Currently the only solution that makes extremely large tables usable is to export them to a database server and write SQL queries to do the SUMIFs, VLOOKUPs and aggregation.

You can use Excel to create the SQL queries.
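For example, a sheet full of SUMIFs collapses into a single grouped query on the server (table and column names are illustrative):

    SELECT Region, Product, SUM(Amount) AS TotalAmount
    FROM SourceData
    GROUP BY Region, Product;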

Over the years I have exported sheets/tables to MySQL and MS SQL Server Express, then connected to them from Excel and written SQL queries.

The server does the processing faster than Excel, and if the database is on a different machine, performance increases further, since your PC's resources are not being used for the calculations.

There are other benefits to this solution as well, like ETL automation and sharing a connection string rather than a big spreadsheet.

BOB


In other words: things work better if you use the right tool (the DB) for the right situation (a large amount of data which needs to be queried). – Hennes – 2012-10-17T21:09:10.783

I am very familiar with ODBC links to external databases from Excel. In this case I would like a stand-alone Excel-based system, and I am looking for the most efficient solution. – Ruut – 2012-10-29T09:09:31.823