Prepare for Cluster Maintenance

Ops Manager performs a rolling restart when you perform maintenance on nodes in a cluster. To maintain cluster availability during the maintenance period, Automation updates the nodes one at a time until all nodes are updated.

Before you perform maintenance on your clusters, review the following considerations and take action, if necessary, to maintain cluster availability.

Note

To learn about how Automation performs maintenance on your clusters, see How does Ops Manager perform maintenance on cluster nodes?.

oplog Size

Each node in a cluster is restarted in standalone mode before maintenance starts. The node replays writes in the oplog to catch up to the other nodes when it is added back to the cluster after maintenance completes.

Make sure that the cluster’s oplog is large enough to store all writes that your application might make during the maintenance period. Use the replication.oplogSizeMB advanced deployment option to adjust the oplog size.
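As a rough illustration of the sizing reasoning above, the following sketch estimates a value for replication.oplogSizeMB from an expected write volume and the length of the maintenance period. The function name, its parameters, and the 1.5 safety factor are assumptions made for this example, not Ops Manager settings or recommendations. Measure your own write rate before relying on any figure.

```python
import math


def estimate_oplog_size_mb(write_rate_mb_per_hour: float,
                           maintenance_hours: float,
                           safety_factor: float = 1.5) -> int:
    """Estimate an oplog size in MB that covers one maintenance period.

    All inputs are assumptions supplied by the caller:
    - write_rate_mb_per_hour: oplog data generated per hour at peak load
    - maintenance_hours: longest time a node may stay out of the cluster
    - safety_factor: headroom multiplier (1.5 is an arbitrary example)
    """
    if write_rate_mb_per_hour < 0 or maintenance_hours < 0:
        raise ValueError("write rate and maintenance duration must be non-negative")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    # Writes accumulated while the node is away, plus headroom, rounded up.
    return math.ceil(write_rate_mb_per_hour * maintenance_hours * safety_factor)


# Example: 200 MB of oplog writes per hour over a 4-hour maintenance period.
suggested_mb = estimate_oplog_size_mb(200, 4)
```

If the result is larger than the current oplog size, consider raising replication.oplogSizeMB before maintenance starts.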

Priority

All client connections to a primary node are dropped when maintenance starts on that node. Connections are re-established to the newly elected primary node.

You may prefer a node in a specific data center to become the new primary node. Edit the cluster’s configuration and adjust the priority of each node to indicate your preferred primary node.
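The priority adjustment can be sketched as a change to the members list of a replica set configuration. The example below assumes the standard member fields host and priority. The hostnames, the helper name prefer_primary, and the priority values 2.0 and 1.0 are hypothetical. It only transforms a Python list and does not contact Ops Manager or a deployment.

```python
from copy import deepcopy


def prefer_primary(members, preferred_host,
                   preferred_priority=2.0, default_priority=1.0):
    """Return a copy of `members` in which `preferred_host` has the highest priority.

    Members whose priority is 0 can never become primary, so they are
    left unchanged. The priority values are illustrative assumptions.
    """
    updated = deepcopy(members)  # do not mutate the caller's configuration
    if not any(m["host"] == preferred_host for m in updated):
        raise ValueError(f"{preferred_host} is not a member of this replica set")
    for member in updated:
        if member.get("priority", 1) == 0:
            continue  # keep non-electable members non-electable
        if member["host"] == preferred_host:
            member["priority"] = preferred_priority
        else:
            member["priority"] = default_priority
    return updated


# Hypothetical three-member replica set spread across two data centers.
current_members = [
    {"_id": 0, "host": "dc1-a.example.net:27017", "priority": 1},
    {"_id": 1, "host": "dc1-b.example.net:27017", "priority": 1},
    {"_id": 2, "host": "dc2-a.example.net:27017", "priority": 1},
]
new_members = prefer_primary(current_members, "dc2-a.example.net:27017")
```

A member with a higher priority is more likely to be elected primary, so in this sketch the node in the second data center becomes the preferred primary.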

Fault Tolerance

Nodes undergoing maintenance don’t provide failover support to the cluster. In a three-member replica set, if an additional node becomes unavailable while one node is undergoing maintenance, the cluster loses its majority of nodes. The primary node then steps down and becomes a secondary node. A new primary can’t be elected until a majority of the cluster’s nodes are available.

For mission-critical applications with high uptime needs, consider converting a three-member replica set to a five-member replica set before performing maintenance to maintain cluster majority in case an additional cluster node becomes unavailable during a maintenance period.

Note

Five-member replica sets or larger are more resilient and less likely to experience loss of majority during maintenance periods.
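The majority arithmetic behind this comparison can be sketched as follows. The function names are made up for this example, and the calculation assumes that every member is a voting member and that exactly one node is in maintenance at a time.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a majority."""
    return voting_members // 2 + 1


def extra_failures_tolerated(voting_members: int, in_maintenance: int = 1) -> int:
    """Unplanned failures the set can absorb while nodes are in maintenance.

    A primary can stay elected only while a majority of voting members
    remains available.
    """
    available = voting_members - in_maintenance
    return max(0, available - majority(voting_members))


# Three members: majority is 2, and 2 remain during maintenance,
# so no additional failure can be absorbed.
three_member = extra_failures_tolerated(3)  # 0

# Five members: majority is 3, and 4 remain during maintenance,
# so one additional failure can be absorbed.
five_member = extra_failures_tolerated(5)  # 1
```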

A simpler but less resilient option to increase tolerance for multiple failures is to add a temporary arbiter to a three-member replica set before you perform maintenance.

Unique Index Builds

Automation builds indexes on cluster nodes one at a time using identical but independent commands. To ensure that writes respect the uniqueness constraint on the indexed fields of a unique index, all writes to the collection on the cluster must stop before you build the index.

You can’t use Data Explorer or the Automation Config Resource in Ops Manager to create unique indexes in a rolling fashion because these methods don’t stop writes to the cluster.

If your use case requires you to build new unique indexes:

  1. Stop all writes to the affected collection. For more information, see db.fsyncLock() in the MongoDB Manual.
  2. See Build Indexes on Replica Sets in the MongoDB Manual to build the unique index in a rolling fashion.

See also

Create an Index