How best to manage my growing data in Excel?

2

This isn't a question about formulas or features in Excel. I'm debating the correct/best way to manage the growing amount of data 'I have to' manage in Excel (I produce PIVOT tables/reports for my management).

DATA: I record the number of publications we order: cost, date ordered, start and end of subscription, who requested it, when they ordered it, when I ordered it, will it be cancelled next year, etc, etc, etc.

DILEMMA: Obviously we re-order a lot of the same publications, so depending on how I manage the data I could be duplicating all over the place.

OPTION 1: So, do I use ROWs = publication name in Row 1 and all the related columns for each financial year are copied and pasted after each financial year ready for the new FY information? This will lead to me going to column ZZ.

OPTION 2: Or, do I use COLUMNs = each row has only one FY information for each publication and if we re-order or cancel a publication I re-type the publication name in a row below and fill in appropriate columns? This will lead to a long list of publications down to row 10000, and potential for misspelling of repeat ordered publication names.

IDEAS: What's the best way - thinking in terms of pivot table best practice, being able to sum or count easy, report formatting, etc.

Any best practices much appreciated.

RocketGoal

Posted 2010-03-12T09:55:25.433

Reputation: 1 468

1The first idea that comes to mind: Have you ever looked into databases, would suit this problem perfectly, especially when you normalise them well. Not sure if that is an option for you though ? – S.Hoekstra – 2010-03-12T10:15:10.953

Answers

1

If you have to stay in Excel, you should organize your data in a way that will make it easier to move to a relational database later. Even if you can't see moving it now, you will have the time and resources and management support to do it right someday, so make it easier on yourself.

It will be hard to have normalized data in Excel, but get as close as you can. Make another sheet with the publication names and use Data Validation in the "publication name" field to help prevent misspellings.

If I were doing it in RDBS, I would have three tables: Orders, Publications, People.

People: PersonID, PersonName ...

Publications: PublicationID, PubName, ...

Orders: OrderID, OrderDate, PublicationID, RequestedBy(personid), OrderedBy(personid), etc.

That may not be exactly how you will want it, but you can figure out what will work for your data. Once you know how it will look in an RDBS, set it up in Excel. For instance, each row would be an order and the columns would be

OrderDate, PublicationName, RequestedBy, OrderedBy, etc..

use Data Validation pointing to a separate table for PublicationName and for RequestedBy and OrderedBy. Then three years from now, you can create ID fields for your Publications and People tables. Then use formulas to replace the names in your orders table with the IDs you come up with. Import the whole mess into and RDBS and you're good to go.

Meanwhile, you're quasi-normalized data will be in great form to make pivot tables, IMO.

dkusleika

Posted 2010-03-12T09:55:25.433

Reputation: 1 776

That's a great way to think about it. Data validation for my publications and names is certainly something I will put in place. I've decided to go columns , so if I understand correctly this works with your way of thinking : a 'Publication' can be attached to many 'Orders' or many 'People'. Also columns reduces blank cells (publication ordered in 2011 that wasn't ordered in 2010 would still have a bunch of columns for 2010 that would never be used, but needed so my FY's stay in-line. Anyway. I like your thinking. – RocketGoal – 2010-03-15T10:39:58.300

3

Actually, Excel is not the right and efficient way to manage large amounts of data. In my company, which is a magazine distribution planning firm, they were used to do similar tasks as yours in Excel, but in time, Excel became inadequate, and IT has decided to move all Excel-related processes/works to an online database. The whole process (leaving Excel and moving to an online database) was handled as a project. Now everything is controlled via web interfaces and Oracle database. People manage the data after logging into their online accounts, and they can generate automatical web based reports. PL/SQL is used to generate special reports as well. Shortly, you should somehow start leaving Excel and move to a database solution. This kind of changes always seems hard, but once you manage them programmaticly with a successful coordination of IT, you pass to a higher and more productive level.


EDIT: IMO, it's no matter to what direction (down or right) you fill in data as long as you don't exceed Excel's maximum limits. Excel 2007's maximum limit for worksheet size is 1,048,576 rows and 16,384 columns. So you should just make sure that you organize the formulae well and input the correct data in correct cells. Since maximum column limit is 16,384, it's better to go downwards in your case.

Mehper C. Palavuzlar

Posted 2010-03-12T09:55:25.433

Reputation: 51 093

Couldn't agree with your more - but, I'm not in a position to ask the management to spend money on such a system. In fact, the fact that I'm involved means that they are struggling to manage it, and want me to make it all work for them, but I can only work within their budget limitations (and their expectations). They want it now and they want to be able to feel as though they could do it. Excel is certainly better than Word, or a notebook... – RocketGoal – 2010-03-12T11:22:17.003

3@Mike - There is always plain old Microsoft Access, it's not what I would call a good database but it's miles better than Excel only. – Nifle – 2010-03-12T11:56:17.670

2@Mike - Regarding cost, there are several excellent free-to-use commercial & open source databases available. As for the cost to develop, well, that's what you're for :-). Seriously, you can do this on a shoestring budget, and all the effort that you're putting into learning the wrong way to do it in Excel, you could do the right way in a real database. As for the "we can do it attitude" of mangement... #1 a database will save them money, #2 Excel can still be used to generate reports from the database... You should at least look in to the database option. – Joe Internet – 2010-03-12T12:16:08.480

I need to get the figures to them in the next week - so time to learn/shine is limited. I'm currently reading C# and MVC books with the aim to build on-line applications (simple on-line applications) to manage such things as publications, etc. But, and it's a big one - I'm 'restricted' to what I've been given for the next two weeks. So my question still stands- limited columns and many rows; or loads of columns and limited rows? This data will be the one I import into my new database in the future, so upstream planning (told you I'm reading) is beneficial. – RocketGoal – 2010-03-12T13:11:29.623

@Mike: Please see my edit. – Mehper C. Palavuzlar – 2010-03-12T14:14:20.083

Well, if that's the case, my recommendation id that you flip a coin. If you want a more definitive answer, go 2 out of 3 flips. Either way, you're going to end up with a mess, and it's not going to import well into a database because you can't enforce any relationships in Excel. As for C#/MVC/web apps, that's good, but what you need for this problem (when you're hands are no longer tied) is to study som SQL & relational database theory. You don't need a lot to get started. Ok, giving this some more thought, I would use multiple rows, with Col1=Publication, and create a column for each... – Joe Internet – 2010-03-12T14:22:01.303

...item that you want to track. You'll end up with a lot of duplication, but my idea here is to model your spreadsheet as though it were a sequence of SQL insert statements. From there, you can build the SQL from the spreadsheet and it should work ok. – Joe Internet – 2010-03-12T14:26:12.543

Also, since this is such a specialized case, you might get more mileage out of an Excel forum. I've gotten good answers from the MrExcel forums: http://www.mrexcel.com/forum/index.php

– Joe Internet – 2010-03-12T14:27:52.667

All useful comments and opinions. I'm going to plump for dkusleika's post below. But, I'm pleased my post wasn't closed, because it was classed as subjective, as it's given me views on data management that I wasn't considering. Thanks. – RocketGoal – 2010-03-15T10:44:45.683