Dealing with Master Node Failure in a MongoDB Cluster with a Replica Set

Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server. When a MongoDB cluster is set up as a replica set, one of the secondaries takes over the role of primary when the master (primary) goes down. The switch happens automatically: one of the remaining secondaries calls for an election, a new primary is selected, and normal operations resume. Because all write operations are received by the primary, the cluster can keep accepting writes as soon as the new primary is elected.
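For automatic failover to work, the replica set needs enough voting members that a majority can still be formed after one of them goes down; three members is the usual minimum. The sketch below builds the configuration document passed to rs.initiate(). The set name rs0, the host names, and the buildReplicaSetConfig helper are all hypothetical placeholders, not part of the MongoDB shell.

```javascript
// Hypothetical helper: builds the replica set configuration document
// that rs.initiate() expects ({ _id: <set name>, members: [...] }).
function buildReplicaSetConfig(setName, hosts) {
  if (hosts.length < 3) {
    // Rule of thumb: with fewer than three voting members, losing one member
    // can leave the set without a majority to elect a new primary.
    throw new Error("use at least three members for automatic failover");
  }
  return {
    _id: setName,
    members: hosts.map((host, i) => ({ _id: i, host: host })),
  };
}

// In mongosh, on one of the (hypothetical) hosts:
// rs.initiate(buildReplicaSetConfig("rs0", [
//   "mongodb0.example.net:27017",
//   "mongodb1.example.net:27017",
//   "mongodb2.example.net:27017",
// ]));
```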

How to troubleshoot the failure

The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming the default replica set configuration settings. This includes the time required to mark the primary as unavailable and to call and complete an election. You can tune this period with the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn increases the time your cluster may operate without a primary.
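Changing electionTimeoutMillis follows the usual reconfiguration pattern: read the current configuration with rs.conf(), edit its settings, and apply it with rs.reconfig() while connected to the primary. The hypothetical helper below returns a modified shallow copy so the change can be inspected before it is applied. The 5000 ms value is only an example; a lower timeout detects a failed primary sooner but makes the set more likely to hold elections during brief network hiccups.

```javascript
// Hypothetical helper: returns a shallow copy of a replica set configuration
// document (the object returned by rs.conf()) with
// settings.electionTimeoutMillis replaced. The original object is untouched.
function withElectionTimeout(cfg, timeoutMs) {
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new Error("timeoutMs must be a positive integer (milliseconds)");
  }
  return {
    ...cfg,
    settings: { ...(cfg.settings || {}), electionTimeoutMillis: timeoutMs },
  };
}

// In mongosh, connected to the primary (5000 ms is only an example value):
// rs.reconfig(withElectionTimeout(rs.conf(), 5000));
```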

Replica set members send heartbeats to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
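The heartbeat rule can be roughly approximated from rs.status() output, where each remote member entry carries a lastHeartbeatRecv date. The sketch below flags members whose last received heartbeat is older than a timeout; the 10-second default mirrors the heartbeat timeout described above, which is configurable as settings.heartbeatTimeoutSecs. It is an illustrative check, not how mongod itself decides reachability, and findStaleMembers is a hypothetical name.

```javascript
// Hypothetical helper: given the document returned by rs.status(), list the
// remote members whose most recent heartbeat was received more than
// timeoutMs ago (default 10 s, matching the default heartbeat timeout).
function findStaleMembers(status, now = new Date(), timeoutMs = 10000) {
  return status.members
    .filter((m) => !m.self) // the member running the command has no heartbeat entry for itself
    .filter((m) => {
      if (!m.lastHeartbeatRecv) return true; // never heard from: treat as stale
      return now.getTime() - new Date(m.lastHeartbeatRecv).getTime() > timeoutMs;
    })
    .map((m) => m.name);
}

// In mongosh: findStaleMembers(rs.status())
```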

To check the cluster status, run the following command from any member of the replica set:

rs.status()

The output shows the state of the replica set as seen by the member on which you run the command.
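Because rs.status() returns a document, the parts that matter during a failover can be pulled out programmatically. The hypothetical summarizeStatus helper below reads the members[].name, stateStr, and health fields and reports which member this node currently sees as primary, or null when it sees none (for example while an election is still in progress). Running it on different members is a quick way to compare their views.

```javascript
// Hypothetical helper: condense rs.status() output into the fields that
// matter during a failover. Uses members[].name, stateStr and health.
function summarizeStatus(status) {
  const members = status.members.map((m) => ({
    name: m.name,
    state: m.stateStr,
    up: Number(m.health) === 1, // health is 1 for reachable members, 0 otherwise
  }));
  const primary = members.find((m) => m.state === "PRIMARY");
  return {
    set: status.set,
    primary: primary ? primary.name : null, // null: no primary visible from this member
    down: members.filter((m) => !m.up).map((m) => m.name),
    members,
  };
}

// In mongosh: summarizeStatus(rs.status())
```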

The following command returns basic information about the whole set, including whether the current member is the master (primary) and which hosts make up the set.

rs.isMaster()
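The isMaster response is also a document. The hypothetical describeRole helper below reads its ismaster, secondary, primary, me, and hosts fields; it also accepts the isWritablePrimary field that db.hello() (the replacement for isMaster on MongoDB 5.0 and later) reports instead of ismaster. Treat it as a sketch: arbiters and hidden members report additional fields that it ignores.

```javascript
// Hypothetical helper: summarize the document returned by rs.isMaster()
// (or db.hello(), which reports isWritablePrimary instead of ismaster).
function describeRole(res) {
  const isPrimary = res.isWritablePrimary === true || res.ismaster === true;
  let role = "other"; // e.g. arbiter, a member still starting up, or a standalone
  if (isPrimary) role = "primary";
  else if (res.secondary === true) role = "secondary";
  return {
    me: res.me,
    role: role,
    primary: res.primary || null, // absent while the set has no primary
    hosts: res.hosts || [],
  };
}

// In mongosh: describeRole(rs.isMaster())
```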

If you have a 911 situation that requires immediate attention, contact us and start a conversation with our experts. Whatever the reason, we are here to assist!