My secondary DB server went down, so I'm booting up a replacement secondary and trying to perform the initial sync. I've been following the tutorials and advice out there to use RAID 10 on Amazon EBS.
So I used 4x4GB EBS volumes in a RAID 10 with the following setup (which was suggested by MongoDB back then):
sudo lvcreate -l 90%vg -n data vg0
sudo lvcreate -l 5%vg -n log vg0
sudo lvcreate -l 5%vg -n journal vg0
Since my primary's version is getting old (v3.2), I'm also trying to upgrade to 3.4 at the same time, so I booted the new secondary on 3.4 (in case this is relevant to the problem).
The problem is that during the initial sync, MongoDB creates too many journal files in /journal: a total of 4x100MB journal files get allocated
ec2-user@secondary$ ll /journal/
total 369105
drwx------ 2 root root 12288 Apr 3 14:47 lost+found
-rw-r--r-- 1 mongod mongod 104644096 Apr 3 19:00 WiredTigerLog.0000000001
-rw-r--r-- 1 mongod mongod 104685568 Apr 3 19:00 WiredTigerLog.0000000002
-rw-r--r-- 1 mongod mongod 104857600 Apr 3 19:00 WiredTigerLog.0000000003
-rw-r--r-- 1 mongod mongod 104857600 Apr 3 19:00 WiredTigerLog.0000000004
-rw-r--r-- 1 mongod mongod 0 Apr 3 19:00 WiredTigerTmplog.0000000005
which exceeds the disk capacity allocated for journaling and causes a brutal crash during the initial sync:
2018-04-03T19:00:18.821+0000 E STORAGE [thread2] WiredTiger error (28) [1522782018:821142][6176:0x7efc0cd3d700], log-server: /data/journal/WiredTigerTmplog.0000000005: handle-write: pwrite: failed to write 128 bytes at offset 0: No space left on device
2018-04-03T19:00:18.821+0000 E STORAGE [thread2] WiredTiger error (28) [1522782018:821213][6176:0x7efc0cd3d700], log-server: journal/WiredTigerTmplog.0000000005: fatal log failure: No space left on device
2018-04-03T19:00:18.821+0000 E STORAGE [thread2] WiredTiger error (-31804) [1522782018:821228][6176:0x7efc0cd3d700], log-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-04-03T19:00:18.821+0000 I - [InitialSyncInserters-my_job_glasses_production.ahoy_events0] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
2018-04-03T19:00:18.821+0000 I - [InitialSyncInserters-my_job_glasses_production.ahoy_events0]
***aborting after fassert() failure
2018-04-03T19:00:18.821+0000 I - [thread2] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 365
2018-04-03T19:00:18.821+0000 I - [thread2]
***aborting after fassert() failure
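For reference, here is a rough back-of-the-envelope check of the numbers above. It is only a sketch: it assumes the 4x4GB RAID 10 gives 8 GiB of usable space and that the journal volume gets exactly 5% of it, and it ignores LVM extent rounding and filesystem overhead (both of which would make the real headroom smaller).

```python
# Back-of-the-envelope check of the journal volume sizing.
# Assumption: 4 x 4 GiB EBS volumes in RAID 10 yield 2 x 4 GiB usable.
GIB = 1024 ** 3
MIB = 1024 ** 2

usable_bytes = 2 * 4 * GIB                   # RAID 10 mirrors, so half of 16 GiB
journal_lv_bytes = usable_bytes * 5 // 100   # lvcreate -l 5%vg -n journal vg0

# File sizes copied from the `ll /journal/` listing on the secondary.
journal_files = [104644096, 104685568, 104857600, 104857600]
used_bytes = sum(journal_files)

# Space left for the next journal file, before filesystem overhead.
headroom_bytes = journal_lv_bytes - used_bytes

print(f"journal LV: {journal_lv_bytes / MIB:.1f} MiB")
print(f"used:       {used_bytes / MIB:.1f} MiB")
print(f"headroom:   {headroom_bytes / MIB:.1f} MiB")
```

Under those assumptions the four files already take up almost the whole volume, leaving roughly 10 MiB, so a fifth 100MB file cannot fit.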
I'm not really sure WHY this happens: on my primary I only have 2 journal files of 100MB each, so I assumed everything would be okay.
ec2-user@primary$ ll /data/journal/ -h
total 205M
-rw-r--r-- 1 mongod mongod 4.1M Apr 3 18:49 WiredTigerLog.0000000059
-rw-r--r-- 1 mongod mongod 100M Apr 3 16:43 WiredTigerPreplog.0000000001
-rw-r--r-- 1 mongod mongod 100M Apr 3 16:43 WiredTigerPreplog.0000000002
Did I miss something, or is something wrong? Here is my mongod.conf:
systemLog:
  destination: file
  logAppend: true
  path: /log/mongod.log
  logRotate: reopen
storage:
  dbPath: /data
  journal:
    enabled: true
processManagement:
  fork: true # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid
net:
  port: 27017
  #bindIp added accordingly
security:
  authorization: enabled
  keyFile: /xxx.key
replication:
  replSetName: XXX
EDIT: It seems that during the initial sync, MongoDB creates up to a dozen files of 100MB each before going back to 4x100MB files. Where is this documented? Is there a way to put a limit on this?
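To pin down how many journal files actually show up at the peak, something like the following could be polled while the sync runs. This is a minimal sketch: `journal_usage` is a hypothetical helper name of my own, and it simply counts every file whose name starts with `WiredTiger` (which covers the `WiredTigerLog`, `WiredTigerPreplog` and `WiredTigerTmplog` files seen in the listings). It reports apparent file sizes, not blocks actually allocated on disk.

```python
import os


def journal_usage(journal_dir):
    """Return (file_count, total_bytes) for WiredTiger journal files in journal_dir."""
    count = 0
    total = 0
    with os.scandir(journal_dir) as entries:
        for entry in entries:
            # Skip directories such as lost+found and any unrelated files.
            if entry.is_file() and entry.name.startswith("WiredTiger"):
                count += 1
                total += entry.stat().st_size
    return count, total
```

Calling `journal_usage("/journal")` every few seconds and keeping the largest total seen would give the peak size the journal volume needs to accommodate.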