How to visualize/graph a massive dataset?

3

I have a CSV file of data collected using Performance Monitor. It collected CPU usage at 5 second intervals for two weeks.

I need to be able to graph or visual the data in a way that'll help me determine how often the CPU is maxed out versus idling.

Excel has a limit of 32,000 data points in a graph, and I have a lot more than that. So I either need a different way to graph the data, or a way to filter the data in Excel, such as averaging over 10-minute blocks of time.

Chris Thompson

Posted 2011-04-29T21:03:22.097

Reputation: 4 765

Answers

0

If you are just looking to filter out the times when the data is maxing out a quick fix solution might be to just import the data 32,000 items at a time into multiple Excel sheets and filter each one of those out to find the time when the CPU maxes out.

For me, I would normally do this sort of thing with Python, Numpy and Matplotlib, but obviously if you don't have a preferred solution, it might be a bit much to learn something new to just plot the data once.

Ian Turner

Posted 2011-04-29T21:03:22.097

Reputation: 628

This is what I ended up doing. I split the data for each day into a separate sheet and just made a separate graph per day. It was a bit time consuming, but it worked. – Chris Thompson – 2011-05-06T05:33:12.800

2

I would say you have to look into something like R for this purpose, not simply Microsoft Excel. I haven't used it myself yet but it was used a lot by students at my university, it should be able to handle such amounts.

Carighan Maconar

Posted 2011-04-29T21:03:22.097

Reputation: 676

There is also an R add-in for Excel - RExcel: http://en.wikipedia.org/wiki/RExcel

– Joe Internet – 2011-04-30T00:48:19.817

I tried this, but being new to R I wasn't able to figure out how to get it to plot the datapoints. – Chris Thompson – 2011-05-06T05:32:15.563

0

You could sample over (say) a minute, bringing the data point from 241920 to 4032. To do this you need to add a column to your data, and create a function that mark just the start of the minute. Then the sample could be taken with a simple VLOOKUP(). If you want help with the sample function post details over the data format (expecially the date/time column).

Of course instead of sampling you could average or take max and/or min

momobo

Posted 2011-04-29T21:03:22.097

Reputation: 166