10

I've inherited a large MongoDB server at work and have been tasked with figuring out why queries run against it are so slow. The DB contains tons and tons of records (on the order of 10^9) and takes up about 300 GiB. At first I thought the raw number of records might be the culprit, so I set up indices on the appropriate fields. That helped greatly for queries whose criteria matched records in the DB, but queries with no matches would take anywhere from 80 to 90 minutes. Any ideas on how to tackle this?

Rob Olmos
Mark LeMoine

4 Answers

5

It seems that you are missing an important index, especially since a query with no matches takes that much longer than a query with matches. What type of query is it? Maybe you need a compound index. Are you using $where as part of the query?
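
As a rough sketch (the collection name logs and the exact query are placeholders, since the question doesn't show the actual query), you can check how MongoDB executes the slow query and add a compound index if it reports a full scan:

    // Show the query plan; a full collection scan ("BasicCursor" in
    // older versions, "COLLSCAN" in newer ones) means no index was used.
    db.logs.find({ server: "web01", id: { $gt: 1000 } }).explain();

    // A compound index on both fields lets a no-match query fail fast
    // instead of scanning every document.
    db.logs.ensureIndex({ server: 1, id: 1 });  // createIndex() in newer versions

If $where does appear in the query, that by itself can explain the run times: it evaluates JavaScript against candidate documents and can't use an index on its own.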

I'm not an expert in MongoDB, but 80-90 minutes is absolutely abnormal. I don't know what MongoDB "sounds like" to Arenstar, but it has been shown to handle billions of documents (terabytes of data).

4

This might be too late for an answer, but I thought it was worth mentioning. There are tools like mongostat that can be used to see what is actually going on: too many index misses, too many page faults. If you are running simultaneous read and write operations, you should also look at the lock percentage. 300 GB is not a huge amount for MongoDB to handle; it can easily go much beyond that.
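
For instance, the same counters mongostat displays can be pulled point-in-time from the shell (a sketch; these field paths are from older MongoDB versions and may have moved since):

    // Server-wide counters behind mongostat's columns:
    db.serverStatus().indexCounters;  // index hits vs. misses ("idx miss %")
    db.serverStatus().extra_info;     // page_faults (Linux only)
    db.serverStatus().globalLock;     // lock time and queued readers/writers ("locked %")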

Other helpful commands:

    db.stats();      // database-level stats: object/index counts, data and storage size
    db.currentOp();  // operations currently in progress (spot long-running queries)
awsiv
2

Look at the MongoDB Database Profiler documentation, particularly the section on Optimizing Query Performance.
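
For instance, a minimal way to turn the profiler on from the shell and inspect the slowest operations (the 100 ms threshold is just an illustrative choice):

    // Profile every operation slower than 100 ms; results are written
    // to the system.profile collection of the current database.
    db.setProfilingLevel(1, 100);

    // Show the five slowest recorded operations.
    db.system.profile.find().sort({ millis: -1 }).limit(5);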

gWaldo
-4

MongoDB is web scale... sorry, had to say that.

Queries with no matches probably take a while because they degenerate into a scan over the whole range.

I would break it down. Mongo doesn't sound like a DB built to handle 300 GB, and probably isn't mature enough yet for anyone to know how it acts with large amounts of data.

For what reason are you using Mongo?

Arenstar
  • It was just what the guy before me (who has since left) was comfortable using and had implemented it in, unfortunately. On the bright side, we're open to new DB suggestions. – Mark LeMoine Nov 10 '10 at 21:37
  • What data are you storing? It's all about your data structure. – Arenstar Nov 10 '10 at 21:41
  • It's small logfile records, each of which contains an ID (int), server name (string), message payload (string), date, and IP address. I currently have indices on ID and server name. – Mark LeMoine Nov 10 '10 at 21:54
  • You're logging messages (loghost/syslog style), so I would use MySQL with partitioning based on time; it's a clear winner for what you're doing (I'm guessing the MyISAM engine). Most indexes won't be loaded on old partitions :D And MySQL is mature enough to handle big DBs. Your other option is Hadoop, but that requires a few servers and map-reduce jobs (so quite a big change). That's definitely my 2 cents :D – Arenstar Nov 10 '10 at 21:58
  • 8
    there are a ton of companies out there using mongodb. 300g is doable, ask the folks at foursquare, wordnik and boxedice who have a ton of data. if anything, the indexes are incorrect. I would check those before you rip out the DB and replace it with something else. – sdot257 Nov 18 '10 at 17:20
  • 1
    @Arenstar: What does webscale means? And how is it any different to any other scale? – Joshua Partogi Nov 23 '10 at 01:18
  • I'm a newbie in MongoDB, but I think you can use a capped collection to make your logging more efficient (see the sketch below). – seatoskyhk Jan 17 '11 at 20:52
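
A minimal sketch of that last suggestion, with an arbitrary 10 GiB cap: a capped collection preallocates a fixed amount of space and overwrites the oldest documents in insertion order, which fits rolling log data.

    // Create a fixed-size (capped) collection for the log records;
    // once "size" bytes are used, the oldest entries are overwritten.
    db.createCollection("log", { capped: true, size: 10 * 1024 * 1024 * 1024 });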