I'm looking for some advice on what to do with a primary replica set that gets disconnected from the network (e.g. data center network outage) to the extent that we promote one of the secondaries to primary to restore service for the application using the database.
Automatic failover is one of the inherent features of MongoDB's replica set design, so you should not have to manually fail over to a secondary unless you have intentionally changed your configuration from the default. Both primary and secondary are member states (or roles) within a replica set, and are intended to be distinct from a master/slave topology, which typically requires manual intervention for failover.
If the current primary is not reachable by a majority of the configured voting members of a MongoDB replica set, the expected outcome is that:
- the isolated primary will step down and become a secondary
- a new primary may be elected if a majority of the voting members still have connectivity with each other and an eligible member to elect.
See Replica Set Elections in the MongoDB manual for more information.
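To confirm which member holds the primary role after an election, you can ask any reachable member for its view of the topology. A minimal mongosh sketch (the field names come from the standard `db.hello()` output; the set layout is whatever your deployment uses):

```javascript
// Run from mongosh while connected to any reachable member.
// db.hello() reports this member's view of the replica set topology.
const hello = db.hello();

printjson({
  isWritablePrimary: hello.isWritablePrimary, // true only when run on the current primary
  primary: hello.primary,                     // host:port of the member currently seen as primary
  hosts: hello.hosts                          // data-bearing members from the replica set config
});
```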
Before network reconnection, should we be killing the mongod running on the box to allow it to be added back in as a secondary? Or does a primary disconnected from its set change state on its own?
An isolated data-bearing member of the replica set will remain in secondary state, but will show as "not healthy/reachable" if you check rs.status() on other members of the replica set. It is generally a good idea to provision all of your electable data-bearing members identically so that any member can take on the role of primary if needed (as opposed to having a specially provisioned primary member).
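You can see that per-member view from mongosh; a short sketch using the standard `rs.status()` fields (member names here are illustrative):

```javascript
// Run from mongosh on any reachable member. rs.status() reflects the
// replica set as seen from that member only.
rs.status().members.forEach(function (m) {
  print(m.name + "  state: " + m.stateStr + "  health: " + m.health);
});
// A partitioned member typically shows health: 0, with a stateStr like
// "(not reachable/healthy)", when viewed from members that cannot contact it.
```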
If you want your replica set to recover automatically, you should leave all members running as-is and they will resume syncing (if possible) once connectivity is restored. In the default configuration, an isolated member which was formerly a primary will resume syncing as a secondary. If you have a strong preference for which member gets elected primary (for example, based on data centre location) you can adjust the priority for replica set members. If a preferred primary was isolated, it will rejoin the replica set as a secondary and resume syncing until it has sufficiently caught up to be eligible to become primary and trigger an election.
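Priorities are set in the replica set configuration. A hedged sketch of preferring one member (the member indexes below are illustrative and must match your own `rs.conf()` output):

```javascript
// Sketch: prefer a particular member (e.g. one in your primary data centre)
// by giving it a higher election priority. Higher priority = preferred primary.
const cfg = rs.conf();
cfg.members[0].priority = 2;  // preferred primary (assumed index)
cfg.members[1].priority = 1;  // normal electable member
cfg.members[2].priority = 1;  // normal electable member
rs.reconfig(cfg);
```

Note that rs.reconfig() must be run against the current primary, and a reconfiguration can itself trigger an election.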
The caveat on resuming syncing is that isolated members must still have sufficient overlap with the replication oplog of a healthy replica set member in order to catch up on any write activity that occurred while the member was isolated. A secondary whose oplog no longer has any overlap with any other members of the replica set will be flagged as "stale" and will need to be resynced.
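You can check how much oplog window you have, and how far behind each secondary is, with the standard mongosh helpers:

```javascript
// rs.printReplicationInfo() summarises the local oplog: its configured
// size and the time range (first/last event time) it currently covers.
rs.printReplicationInfo();

// rs.printSecondaryReplicationInfo() shows each secondary's replication
// lag relative to the primary. Lag exceeding the oplog window means that
// member can no longer catch up and will need a full resync.
rs.printSecondaryReplicationInfo();
```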
What would happen if we allowed the MongoDB server back on the network while it was still running as a primary, having been isolated until reconnection?
It isn't possible to have an isolated primary unless you forcibly reconfigure your replica set so there are no other voting members. You cannot have two primaries in a replica set. If an isolated former primary accepted any writes that weren't propagated to a majority of the replica set members, these writes will be rolled back (exported to disk for administrative intervention) when the former primary resumes connectivity with the other members of the replica set. You can take additional steps to avoid rollbacks, including use of majority write concern.
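A minimal sketch of a majority-acknowledged write (the database and collection names are illustrative):

```javascript
// A write acknowledged only after a majority of data-bearing voting
// members have applied it cannot be rolled back by a later failover.
db.getSiblingDB("mydb").orders.insertOne(
  { orderId: 1234, status: "created" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);
```

With w: "majority", the insert either fails (e.g. times out during a partition) or is durable across elections; it trades some write latency for rollback safety.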
If you are new to MongoDB replica sets, I would recommend using the default configuration and provisioning to enable automatic failover and recovery. Manual intervention should only be required in exceptional circumstances.