We are seeing strange behaviour in our MongoDB replica set. The setup has 3 nodes (all Xeon quad-core-class CPUs; one node has 16 GB of RAM, the other two have 24 GB). The node with less RAM is a normal secondary with priority 0; the other two have priority 1. Recently we have been seeing a replication lag of about 60 seconds every 3 to 4 hours, which disappears by itself after 2-3 minutes (as reported by our Nagios checks).
We have almost no traffic on those machines, just a few databases of about 0.3 GB and one of 5 GB. One collection has about 65,000 entries, but it also has an id index.
The strange thing is that the 16 GB secondary has no lag; only the secondary among the two larger machines lags. I have just made that one primary to see whether the old primary (now a secondary) shows the same behaviour.
Does anyone know what we can do or check? We have no clue.
I checked the load and processes on those machines, the network connectivity and routing, and the disk states; everything is fine.
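For reference, the lag figure can be cross-checked independently of Nagios by comparing each secondary's `optimeDate` with the primary's in the output of `replSetGetStatus`. The following is only a minimal sketch of that calculation in Python: the function name `replication_lag_seconds`, the host names, and the sample timestamps are all made up for illustration and are not taken from our setup. Logging the result every few seconds would show which member falls behind and exactly when.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(status):
    """Return {member name: lag in seconds} for every SECONDARY.

    `status` is a dict shaped like the reply of the replSetGetStatus
    command; only members[].name, .stateStr and .optimeDate are used.
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY found in status document")

    lags = {}
    for member in members:
        if member["stateStr"] == "SECONDARY":
            delta = primary["optimeDate"] - member["optimeDate"]
            lags[member["name"]] = delta.total_seconds()
    return lags


# Hypothetical sample document (host names and times are invented).
_now = datetime(2024, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "node-a:27017", "stateStr": "PRIMARY", "optimeDate": _now},
        {"name": "node-b:27017", "stateStr": "SECONDARY",
         "optimeDate": _now - timedelta(seconds=60)},
        {"name": "node-c:27017", "stateStr": "SECONDARY", "optimeDate": _now},
    ]
}

if __name__ == "__main__":
    print(replication_lag_seconds(sample_status))

# Against a live replica set (assumes the pymongo driver is installed
# and that the URI below is replaced with a real one):
#   from pymongo import MongoClient
#   status = MongoClient("mongodb://node-a:27017").admin.command("replSetGetStatus")
#   print(replication_lag_seconds(status))
```

Note that with almost no write traffic the primary's `optimeDate` only advances when something is written, so a single write after a quiet period can briefly show up as lag even though the secondary catches up immediately.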