We have a MongoDB instance running on an Amazon EC2 large (7.5 GB) Ubuntu instance, the same machine that our Node.js server runs on. Traffic has increased a lot recently, and we are starting to see some erratic behavior from MongoDB. The current state:
We noticed some slow queries using the profiler:
query mydb.user 1327ms Wed Aug 01 2012 14:01:39
query:{ "_id" : ObjectId("500f45486562e7053d070363") } idhack responseLength:178 client:127.0.0.1 user:
Documents in the user collection are small, but there are about 50 million of them. A slow query like the one above shows up every minute or two, and a series of slow queries follows it. When we run the slow queries from the command line using explain(), nothing bad is reported.
mongostat tells me:
insert  query  update  delete  getmore  command  flushes  mapped  vsize    res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn    set  repl      time
   138    804       9       0       96       36        0   60.2g   121g  3.42g       2       1.8           0    0|0    1|0    93k    479k    19  fgset     M  14:15:59
    94    755       4       0       71       35        0   60.2g   121g  3.41g       0       1.5           0    0|0    1|0    78k    344k    19  fgset     M  14:16:00
    93     17       4       0       75       27        0   60.2g   121g  3.41g       0       1.2           0    0|0    1|0    24k     31k    19  fgset     M  14:16:01
    87     86       6       0       73       33        0   60.2g   121g  3.41g       0       0.9           0    0|0    1|0    31k    260k    19  fgset     M  14:16:02
   101    531       3       0       62       19        0   60.2g   121g  3.41g       0         1           0    0|0    1|0    60k      1m    19  fgset     M  14:16:03
    92    713       2       0       66       24        0   60.2g   121g  3.41g       1       0.9           0    0|0    0|0    72k      1m    17  fgset     M  14:16:04
   163     91       6       0       93       46        0   60.2g   121g  3.41g       2       9.5           0    0|0    1|0    44k    256k    17  fgset     M  14:16:05
   108     62       6       0       79       38        0   60.2g   121g  3.41g       4       1.2           0    0|0    1|0    32k    122k    17  fgset     M  14:16:06
   137     23       6       0       81       32        0   60.2g   121g  3.41g       0       2.3           0    0|0    0|0    32k     67k    17  fgset     M  14:16:07
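To make the mongostat samples above easier to scan, here is a minimal Python sketch that pulls out the query, faults and locked % columns. It assumes the column order shown in the header above (other mongostat versions print different columns), and the names COLUMNS, parse and summarize are made up for illustration; they are not part of mongostat.

```python
# Column order copied from the mongostat header above. This is an
# assumption: other mongostat versions print a different set of columns.
COLUMNS = [
    "insert", "query", "update", "delete", "getmore", "command", "flushes",
    "mapped", "vsize", "res", "faults", "locked_pct", "idx_miss_pct",
    "qr_qw", "ar_aw", "net_in", "net_out", "conn", "set", "repl", "time",
]

# The nine one-second samples quoted above, without the header line.
SAMPLES = """\
138 804 9 0 96 36 0 60.2g 121g 3.42g 2 1.8 0 0|0 1|0 93k 479k 19 fgset M 14:15:59
94 755 4 0 71 35 0 60.2g 121g 3.41g 0 1.5 0 0|0 1|0 78k 344k 19 fgset M 14:16:00
93 17 4 0 75 27 0 60.2g 121g 3.41g 0 1.2 0 0|0 1|0 24k 31k 19 fgset M 14:16:01
87 86 6 0 73 33 0 60.2g 121g 3.41g 0 0.9 0 0|0 1|0 31k 260k 19 fgset M 14:16:02
101 531 3 0 62 19 0 60.2g 121g 3.41g 0 1 0 0|0 1|0 60k 1m 19 fgset M 14:16:03
92 713 2 0 66 24 0 60.2g 121g 3.41g 1 0.9 0 0|0 0|0 72k 1m 17 fgset M 14:16:04
163 91 6 0 93 46 0 60.2g 121g 3.41g 2 9.5 0 0|0 1|0 44k 256k 17 fgset M 14:16:05
108 62 6 0 79 38 0 60.2g 121g 3.41g 4 1.2 0 0|0 1|0 32k 122k 17 fgset M 14:16:06
137 23 6 0 81 32 0 60.2g 121g 3.41g 0 2.3 0 0|0 0|0 32k 67k 17 fgset M 14:16:07
"""


def parse(text):
    """Turn whitespace-separated mongostat rows into dicts keyed by COLUMNS."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines or rows with an unexpected layout
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def summarize(rows):
    """Report the spread of a few columns across the samples."""
    worst = max(rows, key=lambda r: float(r["locked_pct"]))
    queries = [int(r["query"]) for r in rows]
    return {
        "samples": len(rows),
        "faults_sum": sum(int(r["faults"]) for r in rows),
        "max_locked_pct": float(worst["locked_pct"]),
        "max_locked_time": worst["time"],
        "min_queries": min(queries),
        "max_queries": max(queries),
    }


summary = summarize(parse(SAMPLES))
print(summary)
```

On these nine samples the locked % column peaks at 9.5 (at 14:16:05), the faults column sums to 9, and the query column swings between 17 and 804.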
pidstat -r -p <pid> 5 tells me:
02:18:01 PM 1700 647.00 0.80 126778144 3578036 46.80 mongod
02:18:06 PM 1700 1092.00 1.20 126778144 3586364 46.91 mongod
02:18:11 PM 1700 689.60 0.20 126778144 3578912 46.81 mongod
02:18:16 PM 1700 740.80 1.20 126778144 3577652 46.79 mongod
02:18:21 PM 1700 618.60 0.20 126778144 3578100 46.80 mongod
02:18:26 PM 1700 246.00 1.00 126778144 3577392 46.79 mongod
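As a cross-check between the two tools, here is a small Python sketch that converts the pidstat memory columns to GiB. The header line was not captured above, so the column mapping is an assumption: time, AM/PM, PID, minflt/s, majflt/s, VSZ (kB), RSS (kB), %MEM and command, which is the layout pidstat -r prints when there is no UID column. The function name parse_pidstat is hypothetical.

```python
KIB_PER_GIB = 1024 ** 2  # pidstat reports VSZ and RSS in kilobytes

# The six five-second samples quoted above.
PIDSTAT = """\
02:18:01 PM 1700 647.00 0.80 126778144 3578036 46.80 mongod
02:18:06 PM 1700 1092.00 1.20 126778144 3586364 46.91 mongod
02:18:11 PM 1700 689.60 0.20 126778144 3578912 46.81 mongod
02:18:16 PM 1700 740.80 1.20 126778144 3577652 46.79 mongod
02:18:21 PM 1700 618.60 0.20 126778144 3578100 46.80 mongod
02:18:26 PM 1700 246.00 1.00 126778144 3577392 46.79 mongod
"""


def parse_pidstat(text):
    """Parse rows under the assumed column layout described above."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 9:
            continue  # skip anything that does not match the assumed layout
        rows.append({
            "time": f[0] + " " + f[1],
            "pid": int(f[2]),
            "minflt_s": float(f[3]),
            "majflt_s": float(f[4]),
            "vsz_gib": int(f[5]) / KIB_PER_GIB,
            "rss_gib": int(f[6]) / KIB_PER_GIB,
            "mem_pct": float(f[7]),
        })
    return rows


rows = parse_pidstat(PIDSTAT)
avg_majflt = sum(r["majflt_s"] for r in rows) / len(rows)
for r in rows:
    print(f'{r["time"]}  vsz={r["vsz_gib"]:.1f} GiB  rss={r["rss_gib"]:.2f} GiB  '
          f'majflt/s={r["majflt_s"]:.2f}')
print(f"average majflt/s: {avg_majflt:.2f}")
```

Under that assumed layout, VSZ works out to about 120.9 GiB and RSS to about 3.41 GiB, which lines up with the vsize (121g) and res (3.41g) columns from mongostat, against 60.2g mapped.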
Note that our database volume is a single ext4 volume and NOT a RAID set as recommended.
I am not sure what the next step is to understand the problem enough to implement a fix. Any input is appreciated.