
We are seeing strange behavior on our MongoDB GridFS platform. The server is a dual Xeon E5 (two quad-core CPUs) with 128 GB of memory, running FreeBSD 9 with a ZFS pool dedicated to MongoDB.

[root@mongofile1 ~]# uname -sr
FreeBSD 9.1-RELEASE

Our /boot/loader.conf:

vfs.zfs.arc_min="2048M"
vfs.zfs.arc_max="7680M"
vm.kmem_size_max="16G"
vm.kmem_size="12G"
vfs.zfs.prefetch_disable="1"
kern.ipc.nmbclusters="32768"

Our /etc/sysctl.conf:

net.inet.tcp.msl=15000
net.inet.tcp.keepidle=300000
kern.ipc.nmbclusters=32768
kern.ipc.maxsockbuf=2097152
kern.ipc.somaxconn=8192
kern.maxfiles=65536
kern.maxfilesperproc=32768
net.inet.tcp.delayed_ack=0
net.inet.tcp.sendspace=65535
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=57344
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

We followed the recommendations for ulimit:

[root@mongofile1 ~]# su - mongodb
$ ulimit -a
cpu time               (seconds, -t)  unlimited
file size           (512-blocks, -f)  unlimited
data seg size           (kbytes, -d)  33554432
stack size              (kbytes, -s)  524288
core file size      (512-blocks, -c)  unlimited
max memory size         (kbytes, -m)  unlimited
locked memory           (kbytes, -l)  unlimited
max user processes              (-u)  5547
open files                      (-n)  32768
virtual mem size        (kbytes, -v)  unlimited
swap limit              (kbytes, -w)  unlimited
sbsize                   (bytes, -b)  unlimited
pseudo-terminals                (-p)  unlimited

This server has a twin (exactly the same configuration) in another data center for the replica set, and we have a virtualized arbiter.

Every so often, roughly every 3 days, the mongod process exits. The problem begins with:

Fri Nov  8 11:27:31.741 [conn774697] end connection 192.168.10.162:47963 (23 connections now open)
Fri Nov  8 11:27:31.770 [initandlisten] can't create new thread, closing connection
Fri Nov  8 11:27:31.771 [rsHealthPoll] replSet member mongofile2:27017 is now in state DOWN
Fri Nov  8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.162:47968 #774702 (20 connections now open)
Fri Nov  8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.161:28522 #774703 (21 connections now open)
Fri Nov  8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.164:15406 #774704 (22 connections now open)
Fri Nov  8 11:27:31.774 [initandlisten] connection accepted from 192.168.10.163:25750 #774705 (23 connections now open)
Fri Nov  8 11:27:31.810 [initandlisten] connection accepted from 192.168.10.182:20779 #774706 (24 connections now open)
Fri Nov  8 11:27:31.855 [initandlisten] connection accepted from 192.168.10.161:28524 #774707 (25 connections now open)
Fri Nov  8 11:27:31.869 [initandlisten] connection accepted from 192.168.10.182:20786 #774708 (26 connections now open)

followed by many more "can't create new thread" messages:

[root@mongofile1 /usr/mongodb]# tail -n 15000 mongod.log.old |grep "create new thread"|wc
5020   55220  421680

and the log finally ends with a magnificent:

Fri Nov  8 11:30:22.333 [rsMgr] replSet warning caught unexpected exception in electSelf()
pure virtual method called
Fri Nov  8 11:30:22.333 Got signal: 6 (Abort trap: 6).
Fri Nov  8 11:30:22.337 Backtrace:
0x599efc 0x8035cb516
 0x599efc <_ZN5mongo10abruptQuitEi+988> at /usr/local/bin/mongod
 0x8035cb516 <_pthread_sigmask+918> at /lib/libthr.so.3
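To see how quickly the "can't create new thread" errors build up before the abort, the grep count above can be broken down per minute. This is only a minimal sketch: the function name `failures_per_minute` is hypothetical, and the regular expression assumes every log line starts with the same timestamp layout as the excerpts above.

```python
import re
from collections import Counter

# Assumed timestamp prefix, taken from the log excerpts above,
# e.g. "Fri Nov  8 11:27:31.770 [initandlisten] ..."
LINE_RE = re.compile(r"^\w{3} \w{3} +\d+ (\d{2}:\d{2}):\d{2}\.\d{3} ")


def failures_per_minute(lines, needle="can't create new thread"):
    """Count log lines containing `needle`, grouped by HH:MM."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            # group(1) is the HH:MM part of the timestamp
            counts[match.group(1)] += 1
    # Sort by minute so the output reads chronologically within a day
    return dict(sorted(counts.items()))
```

It accepts any iterable of lines, so an open file object for mongod.log.old can be passed directly, for example `failures_per_minute(open("mongod.log.old", errors="replace"))`.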

The mongod line from top:

78126 mongodb      77  20    0  1253G  1449M sbwait  0   0:20  0.00% mongod

If I restart the process after it crashes, the problem goes away for about 3 days.

Has anyone seen this before, or know of a fix?

Chris S

1 Answer


The MongoDB project recommends setting

kern.threads.max_threads_per_proc=32000
kern.maxfilesperproc=64000

in /etc/sysctl.conf. You can run /etc/rc.d/sysctl restart to have the new settings take effect immediately, or reboot (whatever floats your boat).
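To confirm that the new values are actually in effect, the output of sysctl can be compared against the two numbers quoted above. The following is a rough sketch, not an official tool: the helper names `parse_sysctl` and `below_recommended` are made up for illustration, and the parser assumes plain integer values written either as `name: value` (sysctl output) or `name=value` (sysctl.conf).

```python
# Values quoted in the answer above; adjust them if your guidance differs.
RECOMMENDED = {
    "kern.threads.max_threads_per_proc": 32000,
    "kern.maxfilesperproc": 64000,
}


def parse_sysctl(text):
    """Parse 'name: value' or 'name=value' lines into {name: int}."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        for sep in ("=", ":"):
            if sep in line:
                name, value = line.split(sep, 1)
                break
        else:
            continue  # no separator on this line
        value = value.strip().strip('"')
        if value.lstrip("-").isdigit():  # skip non-integer values such as "12G"
            values[name.strip()] = int(value)
    return values


def below_recommended(current, recommended=RECOMMENDED):
    """Return {name: (current_or_None, wanted)} for settings under the target."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in recommended.items()
        if current.get(name) is None or current[name] < wanted
    }
```

Feeding it the text printed by `sysctl kern.threads.max_threads_per_proc kern.maxfilesperproc` should give an empty dict once both settings are at or above the recommended values.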

Chris S
  • I will try your recommendation. The previous value was 9000, but I think that number of threads is huge compared with the number of simultaneous connections; in mongodb.conf, MaxConns = 2048. I don't understand why MongoDB needs such large settings, since our average is fewer than 30 connections/sec ... – Jean-Dominique BAYLAC Nov 08 '13 at 15:57
  • I wish I could help more, but I don't run Mongo myself. What you're saying makes sense to me, but they recommend crazy settings, so until you have issues while running those settings there's nothing else to do. – Chris S Nov 08 '13 at 16:03
  • In any case, many thanks for your help ;-) Have a nice weekend – Jean-Dominique BAYLAC Nov 08 '13 at 16:07
  • It seems the port uses its embedded version of the Boost libraries. I installed and customized the mongodb port to use the system Boost 1.52 in place of 1.49. I will see ;) – Jean-Dominique BAYLAC Nov 08 '13 at 23:49
  • The problem persists ... – Jean-Dominique BAYLAC Nov 17 '13 at 18:30