Calculate/determine in MS Excel which filetype/product has been ordered first (complex table)

1

I'm using MS Excel 2013 on a Windows desktop computer.

The original worksheet I'm working on contains about 200,000 rows, or 4,000,000 cells, and is about people who downloaded a .pdf file, ordered a paperback edition and/or purchased a specific product afterwards.

I would like to answer the following questions for myself:

  1. Which has been ordered first, the .pdf or the paperback file, before making the final product order? In total, by year, by month etc.
  2. How many percent of the customers ordered the PDF file first, and how many percent of the customers ordered the leaflet first?

Additional information:

  • Some people ordered only a PDF-file
  • Other people ordered only a paperback file
  • And others ordered the PDF as well the paperback file.

Those orders are separately registered, because the orders are made at different times and/or dates.

I have already tried many filtering options in Excel, but nothing worked yet. I already filtered the rows on name and time, even in PivotTables, but I can't get the answer:

About ... % of the time a PDF (or a Paperback filetype) is ordered first.

You can have a look at an example of the example worksheetfile here (this file I have made up, so all the content in the cells is making no sense, really :-)

Is it possible to filter this per time period? Like by year/month/etc? And is this possible in a PivotTable, or should I look for another solution?

I just can't get the outcome I want (Which has been ordered first? The .pdf or the paperback file, before making the final product order?).

Can someone please help?

Oh... another (second) question...: How to find out how many duplicate names there are in this worksheet? (so: how many customers did get one or more product? This isn't the same as the total amount of ordered products, because some of them ordered two or more items. In the example worksheet it's easy to count, but in a file of 200,000 rows it isn't that easy).

jLiz

Posted 2015-09-21T20:13:13.140

Reputation: 11

Why do you consider a duplicate name as one person, maybe it's 2 people with same name? – User15 – 2015-09-21T21:29:27.063

You are right. Names are not unique anymore. In the original file I filter on email addresses. The names in the example worksheet you can replace by emailadresses, if you like. The basic is the same. – jLiz – 2015-09-21T22:24:27.097

You need to import this into a database even if it is MS Access. Then you can do SQL queries to answer those questions more easily. – cybernard – 2015-09-21T22:24:46.163

Answers

0

I would use the Power Query Add-In for this. It has lots of data transformation capabilities, but probably the key steps for your challenge would be:

  1. Create a "Base" Query based on your table and Add a Index column (for later use).
  2. Create a "Customer" Query referring to the first Query and use Group By - group on Customer, show Min Index column.
  3. Create a Query referring to the "Base" query, then Merge with the "Customer" query using the Customer and Index (= Min Index) - this can show you what each customer's first order was.

I hope that gets you started.

Mike Honey

Posted 2015-09-21T20:13:13.140

Reputation: 2 119