I have a three-node replica set (a primary and two secondaries), all running MongoDB 4.0.18 with the MMAPv1 storage engine. I am trying to switch the replica set over to WiredTiger.
I read through the MongoDB tutorial on how to Change Replica Set to WiredTiger. That tutorial describes how to convert each node in place: take it offline, reconfigure it, and bring it back online. I am not following those instructions as-is; instead I want to introduce new nodes into the replica set and, once all seems well, decommission the older nodes.
I launched a new AWS EC2 instance with Mongo configured for WiredTiger and manually added it to the replica set, following the Add Members to a Replica Set tutorial. (In essence: `rs.add({ host: ip_of_new_instance + ":27017", priority: 0, votes: 0 })`.)
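In full, the sequence looks something like this from the primary's `mongo` shell (a sketch; the IP is a placeholder, and the final `rs.reconfig` is the usual follow-up to promote the member once it has caught up):

```
// On the current primary: add the new WiredTiger node as a
// non-voting, zero-priority member so it cannot trigger elections
// while it performs its initial sync.
rs.add({ host: "10.0.0.99:27017", priority: 0, votes: 0 })

// Later, once the node has reached SECONDARY, promote it to a
// normal voting member by editing the replica set config in place.
var cfg = rs.conf()
cfg.members.forEach(function (m) {
  if (m.host === "10.0.0.99:27017") { m.priority = 1; m.votes = 1 }
})
rs.reconfig(cfg)
```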
The new node switches state from `OTHER` to `STARTUP2`, populates its `dbPath` folder with many new `collection-*` and `index-*` files, and eventually switches state to `SECONDARY`. All looks well. I can see all of the collections/documents via the `mongo` shell when running `$ mongo db_name` from the new node, and I can still access the primary by running `$ mongo 'mongodb://username:password@mongodb.private:27017/db_name?authSource=admin&replicaSet=rs0'`.
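A quick way to watch those state transitions from any member's shell (using `rs.status()`, which reports each member's state):

```
// Print each member's host and current replication state.
rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr)  // PRIMARY / SECONDARY / STARTUP2 / ...
})
```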
However, the moment the new node transitions from `STARTUP2` to `SECONDARY`, my application starts to fail, reporting this Mongo error:

`Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1591711351, 1) } with id: 6817586637606748161`
I have not been able to reproduce this Mongo error outside of the application (Rocket.Chat, built on the Meteor framework). Perhaps the problem lies there. Or perhaps the application is doing something I haven't tried from the `mongo` shell, e.g. tailing the oplog. [Update: I tried it but am not sure if I'm doing it right: `db.oplog.rs.find().tailable({ awaitData: true })` returns a dozen documents before prompting for `it`.]
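For what it's worth, here is a more explicit way to tail the oplog from the shell, iterating the tailable cursor instead of relying on the shell's default batch printing (a sketch; the oplog lives in the `local` database, and the `ts` filter just skips historical entries):

```
// Tail the oplog, starting from "now" so old entries are skipped.
var local = db.getSiblingDB("local")
var start = Timestamp(Math.floor(Date.now() / 1000), 0)
var cur = local.oplog.rs.find({ ts: { $gt: start } })
                        .tailable({ awaitData: true })
while (cur.hasNext()) {
  printjson(cur.next())  // blocks waiting for new oplog entries
}
```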
If, however, I start the new-node process from scratch, changing just one thing (setting `storage.engine` to `mmapv1` instead of `wiredTiger`), then all works well. My application functions properly. I don't know why the application works when all nodes run mmapv1 but fails when there is a wiredTiger node, especially since the storage engine is internal to each node and should be opaque to the client.
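For completeness, each node's storage engine can be confirmed from its own shell, so the engine really is the only difference between the two setups:

```
// Run on each node: reports "wiredTiger" or "mmapv1".
db.serverStatus().storageEngine.name
```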
I notice a strange discrepancy between running mmapv1 and wiredTiger. The node running wiredTiger includes two keys (`operationTime` and `$clusterTime`) in the response to certain commands (e.g. `db.adminCommand({ getParameter: '*' })`). None of the mmapv1 nodes (new or old) include those keys in their responses. Since the Mongo error message in my application's logs includes a reference to time, I'm very suspicious that the presence of `$clusterTime` only on the wiredTiger node is somehow related to the underlying problem.
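Two things that seem worth comparing per node, sketched below: whether a command reply carries a signed cluster time at all, and what is in `admin.system.keys`, which (as I understand it) is where the HMAC signing keys referenced by the error are stored. Reading `system.keys` may require elevated privileges.

```
// Does this node's command reply include a signed cluster time?
var res = db.adminCommand({ isMaster: 1 })
printjson(res.$clusterTime)  // present on the wiredTiger node only?

// Inspect the cluster-time signing keys (may require extra privileges).
db.getSiblingDB("admin").system.keys.find().pretty()
```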
I'm not sure how to troubleshoot this. I've been googling for solutions, but I have not found any strong leads, only a few references to that error message, none of which seem entirely on target:
- https://stackoverflow.com/questions/60876115/error-while-converting-a-mongodb-cluster-into-a-replica-set
- https://developer.mongodb.com/community/forums/t/error-while-converting-a-cluster-into-a-replica-set/2022 (duplicate of above)
- https://jira.mongodb.org/browse/SERVER-32845 "Arbiter fails when receiving an isMaster command with a $clusterTime"
- https://jira.mongodb.org/browse/SERVER-33947 "Arbiter replies "No keys found for HMAC that is valid for time" to isMaster with clusterTime"
- https://jira.mongodb.org/browse/SERVER-32639 "Arbiters in standalone replica sets can't sign or validate clusterTime with auth on once FCV checks are removed" (the previous two are considered duplicates of this one, although this ticket does not contain that error message)