Any ideas for graphing data with 6 million rows?

5

1

I just tried to import it into Excel, but Excel accepts only a little over a million rows. The data has 3 columns, and all I want to do is graph column 1 against columns 2 and 3, as two separate graphs.

I am thinking of writing a grid preprocessor that divides the 2D landscape into cells and marks each cell as containing or not containing a data point. There will be a fiddle factor: the cells must be small enough to discern information from the graph, yet large enough that fewer than 1 million cells are filled, so the result fits in Excel.
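A minimal sketch of that grid idea, assuming the points are already in memory as (x, y) pairs and the axis ranges are known. The function names `occupied_cells` and `cell_centers`, and the grid size, are made up for illustration. It is written in Python, not something from this thread:

```python
def occupied_cells(points, x_range, y_range, nx, ny):
    """Return the set of (col, row) grid cells that contain at least one point."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    cells = set()
    for x, y in points:
        # Skip points that fall outside the chosen plotting window.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Scale to a cell index; clamp so x == x_max lands in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        row = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        cells.add((col, row))
    return cells


def cell_centers(cells, x_range, y_range, nx, ny):
    """Convert occupied cells to sorted (x, y) cell-center coordinates for plotting."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    return sorted(
        (x_min + (col + 0.5) * (x_max - x_min) / nx,
         y_min + (row + 0.5) * (y_max - y_min) / ny)
        for col, row in cells
    )


# Tiny demonstration on a 10 x 10 grid (hypothetical data).
demo = [(0.0, 0.0), (0.1, 0.1), (9.9, 9.9), (5.0, 5.0)]
cells = occupied_cells(demo, (0.0, 10.0), (0.0, 10.0), 10, 10)
print(cell_centers(cells, (0.0, 10.0), (0.0, 10.0), 10, 10))
```

Since the number of filled cells can never exceed nx * ny, a 1000 x 1000 grid guarantees at most 1 million output rows. That is where the fiddle factor goes.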

While I work on that or something else, does anyone know how to graph all of the data easily?

SwimBikeRun

Posted 2013-05-28T22:19:39.670

Reputation: 581

Why not use a database like SQL or (shudder) Microsoft Access? – James Mertz – 2013-05-28T23:25:53.727

Lack of knowledge of their existence. I'll try it out. – SwimBikeRun – 2013-05-28T23:27:51.453

@SwimBikeRun what format are the data in right now? – nhinkle – 2013-05-28T23:40:51.610

tab separated floating point values – SwimBikeRun – 2013-05-28T23:44:58.190

In Excel you can handle more than 1M rows of data in PivotTables and PivotCharts. Go through the import data dialogs, but in the last step, instead of storing the data in the Excel sheet, store it as a PivotChart. From there, you can create your chart. I haven't tried it with 6M rows, but I imagine it should work. Good luck! – Peter Albert – 2013-05-29T10:29:24.523

You should consider methods for aggregating your data. I don't think you'll be able to visually discern 6M rows (or even far fewer than that). Consider the medium on which you'll view the data (screen or paper) and its resolution, and keep in mind that you can't discern more than one data point per unit of resolution (e.g. per dot or per pixel). For example, at 1200 dpi you'd need 5,000 inches (about 417 feet) to display 6M data points in a single row. – dav – 2013-05-29T12:11:36.253
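The arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch, and the function name `inches_needed` is made up for illustration:

```python
def inches_needed(n_points, dpi):
    """Length in inches needed to lay n_points side by side at one point per dot."""
    return n_points / dpi


inches = inches_needed(6_000_000, 1200)
feet = inches / 12  # 12 inches per foot
print(f"{inches:.0f} inches, about {feet:.0f} feet")
```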

Answers

2

Save it as a comma-separated file and load it into R with the command

data <- read.csv('mybigfatfile.csv', header=TRUE)

(here I assume the first row holds the column headers; if there are no headers, set header to FALSE). If the column names are A, B, and C, then you can plot column B against column A with

plot(data$A, data$B, col=rgb(100,80,0,10, maxColorValue=255), pch=16)

Here the color is rgb(100,80,0), on a scale where white is rgb(255,255,255), with an opacity of 10 (out of 255), so overlapping points build up into darker regions. Per momobo's answer, you can take a random sample instead if plotting all 6 million points takes too long:

idx <- sample.int(length(data$A), 10000)
plot(data$A[idx], data$B[idx], col=rgb(100,80,0,10, maxColorValue=255), pch=16)

Here we select 10000 random integers (from 1 to length(data$A)).

To get help with an R command, type ? followed by the command name, e.g.,

?plot

R has a steep learning curve, but this is one way to do it.

Peon

Posted 2013-05-28T22:19:39.670

Reputation: 771

I did this and I am currently waiting on the plot command. It took about 30 seconds to import the data, but I have been waiting two minutes and it still hasn't plotted anything. I did a quick plot of the head of the data, so I know the plot command is right. Roughly how long should 6 million rows take to plot? More to the point, how do I speed this up? Is there a thinning function for R? – SwimBikeRun – 2013-05-28T23:43:12.120

@SwimBikeRun, yes momobo has the right idea: take a random sample. I updated my answer. – Peon – 2013-05-29T18:12:02.040

It's amazing how simple it is to sample in R! – momobo – 2013-05-29T21:33:02.897

2

You could also try sampling the data. Take only one row in ten (or one in a hundred) and try to plot the result. If your sampling is truly random, the graphs should be pretty much representative of the "population".
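As a rough sketch of both variants, here is one way to thin rows while streaming through them, so the whole file never has to fit in a spreadsheet. The function names `every_nth` and `reservoir_sample` are made up for illustration. The random variant uses reservoir sampling, which is a swapped-in standard technique rather than anything specific to this thread:

```python
import random


def every_nth(rows, n):
    """Yield one row out of every n (systematic sampling)."""
    for i, row in enumerate(rows):
        if i % n == 0:
            yield row


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


# Demonstration on stand-in data; with a real file, pass the open file object
# (an iterable of lines) in place of range(...).
print(list(every_nth(range(50), 10)))
print(len(reservoir_sample(range(6000), 100, seed=1)))
```

Taking every n-th row is simpler, but it can mislead if the file has a periodic pattern. The random variant avoids that at the cost of a little more code.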

momobo

Posted 2013-05-28T22:19:39.670

Reputation: 166

+1 A visualization with 6 million data points is almost certainly no more helpful than one with a (few) thousand. The huge number of points may even obscure relationships in the data or overwhelm the viewer (or for that matter, the visualization application). Sampling is the way to go. – Excellll – 2015-03-11T18:25:15.763

0

I faced the same problem. In the end I used MSChart with C#, loaded the data in code, and drew it on the chart.

I think this video would help: https://www.youtube.com/watch?v=82jnryBxsnI

You can also zoom the chart.

Shady Sherif

Posted 2013-05-28T22:19:39.670

Reputation: 101

You might as well post the code snippets now. A complete answer is always more likely to be helpful to someone. – Excellll – 2015-03-11T18:15:51.483