1

What sort of computer specs (CPU, RAM, disk speed) should I use for running queries on a database of 200+ million records? The queries are for a research project, so there is only one "user" and only one query will be running at a time.

I tried it on my own laptop with SQL Server with an i3 processor, 2GB RAM, 5400 RPM disk and a simple query didn't finish even after 8+ hours. I have an option to connect a SSD via eSata and upgrade to 4GB RAM, but not sure if this will be enough...

Thanks!

Edit: The database is about 25 GB and the indexes are not setup properly. When I tried to add an index, I let it run for about 8 hours and it still hadn't finished so I gave up. Should I have more patience :)?
In general, the queries will run once in a while and its ok even if it takes a couple hours to complete....

Also, the queries will produce probably about 10 million records which I need to process using Stata/Matlab and I'm concerned that my current laptop is not strong enough, but unsure of the bottleneck....

Sam Halicke
  • 6,122
  • 1
  • 24
  • 35
SpeksETC
  • 113
  • 1
  • 3

2 Answers2

1

You didn't complain about how long it took to fill the DB with 200 million records, so I conclude your lapttop is fast enough.

Databases work because they store the data records with certain "key" fields indexed to provide fast access. (Otherwise every query would have to read every record every time, and 25Gb takes a long time to read).

If your query is supposed to produce a modest number of results (say, 10K records), I'd guess your problem is that your records aren't indexed by the principal element of the query.

Ira Baxter
  • 113
  • 5
  • It's possible the database file was given to the OP pre-popuated. –  Nov 25 '10 at 16:06
  • The database was provided to me and was filled on a different machine. You are correct that the indexes are not setup properly, but adding an index didn't complete even after waiting 8 hours... –  Nov 25 '10 at 16:14
  • So, set up the index and go home for the night. Expect it to take a long time; it has to scan all the records sequentially because... well, ... theres no index :-} With the indexes in place, queries should be a lot faster. – Ira Baxter Nov 25 '10 at 16:20
  • Thats what I did, but only slept 8 hours. Maybe I should do it for the weekend. I assumed it should be a clustered index? –  Nov 25 '10 at 16:25
  • Depends on what you need, but most likely not. A clustered index re-organizes all the rows in the table such that the rows are in the same order as the index. As such, there can be only 1 per table, and will take a long time to build. Non-clustered indexes only order the index data in a separate space, and then use pointers to the rows for the rest of the data. Rule of thumb: A clustered index should be created for sorting results (ORDER BY), non-clustered indexes for filter conditions (WHERE clauses) of the data. – Evan M. Nov 25 '10 at 22:04
1

Speccing hardware for a huge database is something that's incredibly difficult to get right. We've got several large databases (9M+ records in some tables) and one massive database (300M+ records in most tables), and even for us it's been a bit of a hit and miss affair.

But just as important as hardware specs is your database schema, and as Ira mentioned, Indexes are king in this scenario.

It will take much longer than 8 hours to create the index, and the table has to be completely unlocked, as creating an index locks the entire table. An index taking a long time to create is often a sign of a locked table, so make sure that nothing else is using the database when you try to do it.

What will make things harder for you in this scenario is the speed of your disk. A single 5400 RPM hard drive (I've seen laptops with even slower spindle speeds, 5400 is pretty good for a laptop) will take a LOT longer to load the data into cache for indexing than a 15,000 RPM drive. As a comparison, our SQL server has 14x 15,000 RPM disks (for a grand total of 210,000 RPM, or a whopping 38x faster spindle speed, and I would guess maybe 50x faster access speed in real life) and I would expect creating an index on 200M records would still take over an hour.

So, get that SSD if you can. It will help dramatically. One SSD's performance gain over a 5400 RPM drive is almost too great to measure.

If you can, get that 4Gb of RAM. Not only will it help your SQL server speed (not as much as SSD, but it will still help), but from my (brief) experience with Matlab, it will help there too.

Mark Henderson
  • 68,316
  • 31
  • 175
  • 255