How to improve performance with large Excel files


I am working with Excel 2007 on Windows 7, on a Core i5 2.53 GHz (2 cores, 4 threads) with 4 GB DDR3 RAM @ 533 MHz.

I need to extract and organise data from a huge database and, to achieve it, I have a worksheet where I:

  1. a. paste part of the raw data into 6 columns x 36,600 rows;
    b. extract the data I’m interested in from (a) and work on it with 8 columns x 36,600 rows.

I then copy the results of the formulas of 6 of the columns in 1.b) and paste-value them into a second worksheet, where I am going to compile all the data I need and analyse it.

  2. a. for each group of 6 cols x 36,600 rows, I will need to add 5 columns of formulas (mostly small additions with no more than 2 values each, and some INDEX-MATCH);
    b. the total amount of data will end up at 130 of these groups, so: 780 columns of data plus 650 columns of formulas x 36,600 rows;
    c. once I have compiled and analysed all the data, I will turn it all into values – which I will later use with LOOKUP/INDEX-MATCH formulas;
    d. neither worksheet has automatic formatting, and both have minimal formatting (coloured cells to identify those which contain formulas); if I press CTRL + END it goes straight to the end of the range in use.
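As an aside, the paste-values / INDEX-MATCH step described above maps directly to a dictionary lookup outside Excel. A minimal Python sketch of the idea, with keys and values invented purely for illustration:

```python
# Sketch: an INDEX-MATCH over two columns is essentially a dictionary lookup.
# The identifiers and numbers below are made up for illustration.

raw = [
    ("star_001", 4.2),
    ("star_002", 3.7),
    ("star_003", 5.1),
]

# Build the lookup table once (the MATCH column -> the INDEX column).
lookup = {key: value for key, value in raw}

# "Paste-value" equivalent: resolve each query against the table.
queries = ["star_003", "star_001"]
results = [lookup.get(q) for q in queries]
print(results)  # [5.1, 4.2]
```

Building the table once and reusing it is also why Excel's own INDEX-MATCH tends to be cheaper than repeated full-column LOOKUPs.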

So, my problem is that, as of now, I have built the formula structure to which I will add the data bit by bit, but the file is 166MB and is using up all of my processor’s resources. I have set calculation to manual, but it still takes up to 5 minutes to open, close or save this file.

Questions:

  1. Is this slowness to be expected with a 166 MB file?
  2. Would it work better if I compiled the data as 65 columns of data plus 78 columns with formulas x 367,000 rows?
  3. As I am in the process of acquiring a new laptop, would a better processor overcome the difficulties? (I am thinking along the lines of a quad-core i7 with 16 GB RAM.)

NOTE concerning Question 3: I am not looking for hardware recommendations, as I am in the process of acquiring the machine described above. I want to know whether I may expect improved performance when working with this file.

SC for reinstatement of Monica

Posted 2016-10-07T15:21:50.110

Reputation: 193

1) The slowness is likely because of the number of records plus something to do with the formulas. Since you can't change either of those, you can't change that aspect of the performance. 2) It could; you should test it. 3) You're asking for hardware recommendations; those are not on topic. – Ramhound – 2016-10-07T17:02:03.093

@Ramhound: I've added a note to avoid misinterpretation of question 3. Also, I'll invest time in testing the structure change then, thanks. – SC for reinstatement of Monica – 2016-10-07T18:03:27.163

Are you on a 64-bit OS and do you have 64-bit Office? – bvaughn – 2016-10-07T18:16:15.057

1) Depends on what else is going on. Do you have email, web browsers, etc. in either the background or foreground? On a clean system, being careful to remove formulas once done with them, and working in segments, it should not slow down really badly. – bvaughn – 2016-10-07T18:18:55.510

2) After you get the source files onto the system, can you disconnect from the network and reboot? No internet, no local net, no WAN? The answer is that it does not matter, as it is the number of unique cells that impacts performance: the layout will only make a difference if the total number of cells with different formulas changes substantially. – bvaughn – 2016-10-07T18:20:27.523

3) Yes, a new machine has an excellent chance of starting out faster. – bvaughn – 2016-10-07T18:22:07.950

@bvaughn: Thanks for your comments. Yes, it's a 64-bit OS and Office. I always have Firefox open with several tabs, and my email, plus I need to have astronomy software open to get the data I need. I'm not sure what you mean by network - I am working on my laptop and I am not part of any intranet or the like. – SC for reinstatement of Monica – 2016-10-07T18:51:26.513

In addition to the previous comments: 166 MB isn't common in Excel; together with the relatively large number of formulas, yes, it can cause the delay. What is the format of your file? Saving as a binary Excel file (.xlsb) may help to decrease size (and improve performance). "Would it work better..." - you can only know if you test it; there is no exact formula where you can insert the size of a range and it calculates performance. – Máté Juhász – 2016-10-07T19:08:38.580

I need to extract and organise data from a huge database <- why are you not using database software to do most of the processing and then extract the data to Excel? Excel is spreadsheet software and does not handle large amounts of data as efficiently. – thilina R – 2016-10-07T19:26:41.993
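Following up on that database suggestion: heavy per-group arithmetic can be pushed into something like SQLite, exporting only the finished results to Excel. A minimal sketch of that approach in Python's standard library - the table and column names here are invented for illustration:

```python
import sqlite3

# Illustrative only: "readings", "grp" and "value" are made-up names.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (grp TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 1.0), ("A", 2.0), ("B", 3.0)],
)

# Let the database do the per-group arithmetic, instead of keeping
# hundreds of live formula columns in the workbook.
rows = con.execute(
    "SELECT grp, SUM(value) FROM readings GROUP BY grp ORDER BY grp"
).fetchall()
print(rows)  # [('A', 3.0), ('B', 3.0)]
```

The exported result set (a few summary rows per group) is then a fraction of the size of the raw data, which is what keeps the workbook small.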

Excel is using system resources. If you have a resource problem, it is exacerbated by all the other non-Excel items. There are also a lot of other potential issues where the problem is not actually with Excel or the file, but is in some other part of the system. (Analogy: a water heater is having issues, causing the smoke alarm to keep going off. The home owner blames the alarm and replaces it, which does not really help with the water heater problem.) – bvaughn – 2016-10-07T19:40:58.587

"I want to know whether I may expect improved performance when working with this file." - The more cores Excel has access to, the faster it will do the computations required for the formulas. If it's a resource-intensive task, more memory can also help. An SSD can also help in many cases, though much of the benefit of an SSD can be reproduced with a small RAM drive. – Ramhound – 2016-10-07T20:52:44.557

FWIW - if you're familiar with VBA in Excel, it can be quicker to process large amounts of data, especially if you turn off screen updating. – Chris – 2016-11-15T13:31:03.573

Answers


This is just to settle the questions I asked which were not answered in the comments.

2) Yes, fewer columns with more rows is more efficient.

3) The new laptop helped. It still takes some time to open and calculate the worksheet, but it's much faster than before (a fourth or less of the wait time).

SC for reinstatement of Monica



Excel is extremely memory intensive, and inefficient when working with large arrays. Using an SSD or RAM drive does not help, because most of the file will be resident in memory and not on the hard disk. Keep track of the memory usage in Task Manager, and try to keep Excel's RAM usage under 2 GB. Break up the data into smaller files, and use a collating or summary worksheet to call the data as needed instead of opening everything at once. Typically a 55 MB file may take up 1 GB of memory in Excel and take 2-3 minutes to open depending on the formula structure, so 5 minutes to open 166 MB is not unexpected.

Save to binary files (.xlsb), which will improve loading and saving of large files. Use the fastest CPU and memory possible. To test whether your Excel file scales with CPU cores, run a calculation with a timer in VBA using just 1 CPU core, and then rerun using 2 cores. If the speedup is close to 100%, then more cores will also help. Use a desktop instead of a laptop.
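The core-scaling test described above can be sketched generically. Below is a minimal Python timing harness illustrating the same measure-and-compare idea; the `workload` function is a made-up stand-in for one recalculation pass, not anything Excel-specific:

```python
import time

def timed(fn):
    # Minimal timing harness, mirroring the VBA-timer idea above.
    t0 = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - t0

def workload():
    # Made-up stand-in for one recalculation pass of the workbook.
    return sum(i * i for i in range(200_000))

_, t_one = timed(workload)
print(f"elapsed: {t_one:.4f}s")

# In Excel you would rerun after changing the calculation thread count
# (Excel Options > Advanced > Formulas, multi-threaded calculation) and
# compare: speedup = t_one_core / t_two_cores; close to 2.0 means it scales.
```

The key point is measuring the same calculation twice under different thread counts and comparing the ratio, rather than guessing from hardware specs.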

Philip
