4

There is one particular Excel file that is used by a number of employees at my company. It is edited from both Excel 2003 and 2007, with the "Sharing" feature turned on to allow multiple writers at once.

The file has a decent amount of data on several sheets with some basic formatting, and used to be about 6MB, which seems reasonable for its content. But after a few weeks of editing, the file grew to 10, then 20 MB, and eventually skyrocketed to more than 150 MB, even though it still has about the same amount of data as before. It now takes 5-10 minutes to open it, and that much time again to save it.

The first time this happened, I copied the content of each sheet into a new, blank workbook, and saved the new workbook; this brought it back down to about 6MB. Now, it has blown up again.

The workbook uses the "Data Validation" feature to limit the values in certain columns to the contents of a few named ranges. Copying all the data into a new workbook means re-setting up all the data validation, which is a pain and not something that we want to do every month.

As a troubleshooting step, I tried saving the file in "XML Spreadsheet 2003" format, hoping to get some insight into what was being stored. Sure enough, the file was almost a gig, and almost all of the 10 million lines look like this:

<NamedCell ss:Name="Z_21D5114F_E50C_46AC_AA4F_C3FF540C717F_.wvu.FilterData"/>
<NamedCell ss:Name="Z_1EE2BA5E_3011_4F9A_8ACD_E58835250FC4_.wvu.FilterData"/>
<NamedCell ss:Name="Z_1E3BDCEA_6A72_4ECC_BF4F_7B03CC66181E_.wvu.FilterData"/>

I've seen a few VBScripts online to manage and enumerate named cells that are hidden in Excel's built-in interface, though I wonder how they'd handle my 10 million named cells. What I really need, though, is an understanding of why this keeps happening. What actions in excel could be causing this?

UPDATE:

Here's an experiment I tried that provides some more detail:

  • I turned off sharing; the file remained huge.
  • I saved the file as an .xlsx file, and it shrank to 5MB.
  • Then I closed that file, opened it back up, and saved it as an .xls file, with sharing still turned off; it got huge again!
  • When an '03 user tries to open that nice, compact .xlsx file, it takes several minutes to open it, even though '07 opens it fine.

So, this seems to be an '03 specific issue, and saving the file in '03 format immediately recreated a bunch of junk that clearly had not been in the '07 file at all.

Josh
  • 743
  • 4
  • 9
  • 17

5 Answers5

12

Say it with me: Excel is not a database.

You're running up against the design limitations of the software: it's only got so much ability to store transactional data, so when multiple people are writing to it, it has to store a frickton of information in order to reconcile. You have so much data in it, that the transactional copies are HUGE.

Microsoft assumes (correctly) that if you have that much data, it's stored in a database, and you're just using Excel as a front end.

If you're going to work like that, you should at least knock together a little Access database. It will save you a world of headache, because it's meant to WORK like that and Excel just isn't.

@Josh: Yes, it absolutely IS leaking. When you share a document, it has to keep track of the modifications done by every user...I'll call this "transactional data" but you can just think of it as history. Since there is never an "official" version, it keeps keeping track of changes, and the document bloats up faster than Kirstie Alley in a doughnut shop.

It's by design. Someone who is an excel guru may be able to tell you how to make it stop, but the best solution is just not to use excel for data that's being constantly maintained. It's not really what it's designed for.

I am sympathetic to your problem, but it's a better solution to explain the problem to the higher-ups and work out a new procedure, than it is to try and prolong an unfortunate hack.

@Josh: If you just want to shrink it once, temporarily, copy all the data, and paste it into a new spreadsheet. THAT will kill all the metadata, guaranteed (make sure you don't select the whole sheet, but only the part with data in it) But this is a temporary solution at best.

Satanicpuppy
  • 5,917
  • 1
  • 16
  • 18
  • Couldn't agree more! – Izzy Apr 20 '10 at 13:58
  • I disagree. First, turning sharing off doesn't reduce the file size, even though there's no "transactional data" left. Second, there are 6 worksheets; 4 are small, and the other 2 are 1500x27 and 500x30, which should be well within Excel's meager abilities. This looks like a bug or a design flaw here -- something is leaking. – Josh Apr 20 '10 at 14:21
  • 1
    Also... respectfully, I'm not the CEO, the COO, or even the CTO, I'm a sysadmin. Not all of us on this site get to run our companies the way we would like. Usually I'm expected to perform real miracles with virtual duct tape. Telling me "your company should do things differently" does NOT help me get my users up and running today. – Josh Apr 20 '10 at 14:30
  • 2
    He's actually 100% correct, both from a technical and a process perspective.. Pick up an MS Access and VBA primer, and make a database for the team. Hell, you'll then have a new skill to utilize, and something new to put on the resume. Notice how the other answer (from Matt) also suggests a database. If your next flippant answer is going to be "pfffft we can't afford to give everyone MS Access" then pick up a free copy of mySQL, and learn some basic web development. It's really not as hard as you might think to manipulate a database from a webpage. – Izzy Apr 20 '10 at 14:37
  • When I turn off sharing, Excel warns me that all tracked changes will be deleted and that any other users who have the file open will not be able to update it, even if I turn sharing back on. In other words, all history and sync data is cleared out permanently. For this reason (and others; see the update to my main question in about 5 min.) I am certain that the problem is totally unrelated to sharing. – Josh Apr 20 '10 at 14:51
  • @Izzy, I've already spent dozens of hours of my free time building a (awesome) prototype web interface to replace this spreadsheet with a MySQL+TurboGears system that solves this problem and dozens of others. The CTO loves it; the trick is convincing the CEO to grow the IT department so that I have a few hours a day to focus on development and save the company some serious cash in the long run. That might happen one day if I'm very lucky. Not today. Tomorrow's not looking so good either. – Josh Apr 20 '10 at 15:10
2

Looks like you're running into a Custom Views problem. See:

How do I remove the _#wvu#FilterData stuff ?
http://www.tech-archive.net/Archive/Excel/microsoft.public.excel/2006-12/msg00704.html

How do I disable Custom Views? (scroll)
http://www.eggheadcafe.com/software/aspnet/30197438/how-do-i-disable-custom-v.aspx

ane
  • 171
  • 4
  • 16
  • I tried deleting all current custom views, but it didn't help. I also tried running a vbscript to delete all cell names that end with FilterData, but it doesn't help. They seem to be recreated each time the document is saved, even after round-tripping through the '07 file format which doesn't have the problem. – Josh Apr 20 '10 at 15:05
  • Try renaming the XLSX file to ZIP. Then open it and find what else could be taking up so much space? – ane Apr 20 '10 at 15:14
1

For a quick temporary fix, can you turn off sharing and save the file to see if it dumps the extra baggage? For a better solution you need to turn this into a database. If you do not have the time and expertise to create the database have you looked at Google's spreadsheet? They just rewrote it for better multiuser simultaneous editing.

Matthew
  • 174
  • 5
  • Turning off sharing didn't help. And unfortunately, Google apps are NOT an option at this company. (Although, I've tried using google for spreadsheets with more than a couple of hundred rows, and it can be painful even for a single user.) – Josh Apr 20 '10 at 14:44
0

You may also want to look at doing a "Save As..." from the File menu.

Excel, along with Word and other Office members, saves undo history in the file for a LONG time.

So, do a "Save As..", pick a new file name, then compare the two files for size. If the new one has shrunk as I expect it will, move it back over the old one, and go on your merry way.

Other commenters are correct, though, when they point out that Excel is not a database. Yes, it can work like one, and is good at it for small data sets. But for what you're describing, you need to migrate to a new solution.

Additional information:
Any calculations, all formatting, and any other kinds of "make it do this _____" add to the storage overhead. A 1 MB sheet will quickly grow to 100MB if a number of these kinds of things are associated with the workbook.

warren
  • 17,829
  • 23
  • 82
  • 134
  • Usually I just copy/paste it into a new document. That shrinks it right up...For a while. – Satanicpuppy Apr 20 '10 at 14:56
  • Tried that... see my update for details. Like I said, this is only about 2000-odd rows spread across several worksheets. It doesn't seem like it should be that big a deal. – Josh Apr 20 '10 at 14:58
0

http://answers.microsoft.com/en-us/office/forum/office_2007-excel/why-do-my-excel-files-become-huge/8235dc08-aa70-4a65-9892-b776790d79c5

Try this

  1. Identify the last filled up row of data and click on any cell in one row below that
  2. Select from that row till the end and delete all rows
  3. Do the same for columns
  4. Save and close the file
  5. Shut down MS Excel
  6. Reopen the file

Does this solve the problem?