4
1
I'm trying to find a way to aggregate data in a hierarchical data set, preferably within a pivot table but other methods might be OK as well. Consider a data set (greatly simplified for the example) that looks like the one below. From this data, I'm trying to build a set of functions that will answer questions like:
"How much total inventory do I have for Fruit?"
"How many different kinds of Food do I sell?"
Item Category
======= ========
Apples Fruit
Bacon Meat
Chicken Meat
Corn Veg
Food
Fruit Food
Grapes Fruit
Meat Food
Squash Veg
Steak Meat
Veg Food
Each Item has (among lots of other information) a Category, which we can really think of as a "parent". But also note that within the data set, all the "parents" also have their own parent categories. In this data set one sample 'branch' of the hierarchy would be Food->Meat->Chicken.
To answer the question like "How many different kinds of Fruit do I sell" it's not hard, because this is first level category. I can just use the COUNTIF function and say "How many Items belong to Category "Fruit"?" -- and I get a table that looks like this:
Item Category COUNTIF(categories,me)
Apples Fruit 0
Bacon Meat 0
Chicken Meat 0
Corn Veg 0
Food Food 3
Fruit Food 2
Grapes Fruit 0
Meat Food 3
Squash Veg 0
Steak Meat 0
Veg Food 2
Easy - for the first row you just see how many times "Apples" appears as someone else's Category. (Since it's zero, I know that Apples are not a parent... this should help, but I'm not sure how...) Now row five, "Fruit", appears as someone else's Category two times - since the number is NOT zero, I know it's a Category instead of just an Item. All well and good for the first level math, but...
This leads me to the part that I haven't been able to solve... How do I figure out how many TOTAL kinds of "Food" I have? And given that my actual data has many more levels of hierarchy, I need to walk up and down the tree to figure out how many total chidren are in each one. The first level COUNTIF function tells me that there are three subcategories of Food (Fruit, Veg, & Meat) -- but what I really want is to somehow have it recursively determine that Fruit, Veg, and Meat might also be Categories, and sum up the corresponding numbers for those children. In excel terms, what I really want is to be able to build another column that recursively/iteratively counts the TOTAL number of items in that whole subtree... in this case, there are seven unique Items that belong to Food: 3 meats, 2 veg, and 2 fruit.
Some complicating factors:
There's no explicit identifier in the data to tell us whether that particular item is also a category, or if it's a bottom-level item.
Each item only knows what it's category/parent is - there is no explicit data to tell whether it has children or not. Said another way: all Items belong to a Category, but only some Items are also Categories.
In the actual data the parent relationship can get as much as 10 levels deep, BUT there are no guarantees that the depth of each branch in the hierarchy is consistent: some items might be 3 levels deep while the next one might be 8.
The root or ultimate parent doesn't come with a category, but this is a one-off case that I can easily handle manually.
I'm fully aware that this would be a trivial exercise in any 'real' programming language (Perl, Python, etc)... but ultimately I have to hand this off to someone who does not have programming experience, so I'm trying very very hard to make this fit into a "standard" Excel workbook.
Have you looked at (do you have access to)
Power Pivot
? I do not have it in 365/2016 but it was a free add-in included in some versions of Excel and has the capability of creating hierarchies. I don't know if it will fill your requirements, but might be worth a shot. – Ron Rosenfeld – 2016-09-03T21:00:27.6071It doesn't seem to me that Excel is the right tool for this job.. – Raystafarian – 2016-09-05T19:55:19.277
RonRosenfeld- I do have this, but it doesn't appear to do what I need it to... @Raystafarian: I absolutely agree (as I mentioned, this is trivial to code up as a recursive walk in anything else) and I've already coded it in Python, but ultimately I'm not maintaining this and the overwhelming majority of the OTHER things I need this app/workbook/whatever to do are ideal in Excel. This was the "one last thing" I was trying to get working natively within Excel... – ljwobker – 2016-09-05T20:53:54.007
1Is MS access a possibility? I mean, with the structure of your data this would even be a difficult task with VBA in excel. Is there a way to alter the data structure? Or to know what categories there always are, like a pretty static report? – Raystafarian – 2016-09-06T09:35:53.120
@Raystafarian There are a fair number of "other" ways to solve it - but I'm sort of in this strange corner where EITHER I can get it solved 100% in "standard, portable, non-VBA" excel... or I have to have SOME kind of other external application or helper... And if I have to go that way it's just going to be Python (since I've already written it). Sadly in this specific case the only real incremental value is if I can do it natively inside a 'normal' excel workbook... – ljwobker – 2016-09-06T20:19:12.663
Hm. If the list is basically static then it might not be too difficult with several helper columns. If it's not static then there will need to be more helper columns indexing each column and determining what appears in both to determine the hierarchy. I see it as possible. – Raystafarian – 2016-09-07T10:07:35.080
Will a top branch always have a blank cell in the second column? Can something like "Fruit" appear in more than 1 branch - "food" and "car part"? – Raystafarian – 2016-09-07T10:17:35.823