This version of the documentation is archived and no longer supported. It will be removed on 01 March 2021. To learn how to upgrade your version of MongoDB Ops Manager, refer to the upgrade documentation.

Prepare for Cluster Maintenance

Ops Manager performs a rolling restart when you perform maintenance on nodes in a cluster. To maintain cluster availability during the maintenance period, the Automation Agent updates the nodes one at a time, always maintaining a primary node, until all nodes are updated.

For each secondary node in the cluster, the Automation Agent:

  1. Restarts the mongod process running on the node in standalone mode.
  2. Performs the maintenance task.
  3. Restarts the mongod process running on the node in replSet mode.

After the secondary nodes are updated, the Automation Agent:

  1. Steps the primary down using the rs.stepDown() command.
  2. Triggers an election for a new primary node.
  3. Performs the maintenance task on the former primary node.
  4. Restarts the mongod process running on the former primary node in replSet mode to join the cluster as a secondary node.
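The ordering described above can be sketched as a small simulation. This is an illustrative model only: the `Node` class, the `rolling_maintenance_plan` function, and the node names are hypothetical, and the code does not call Ops Manager, the Automation Agent, or MongoDB.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # "PRIMARY" or "SECONDARY"


def rolling_maintenance_plan(nodes):
    """Return the ordered steps of a rolling restart for one replica set.

    Hypothetical helper: it only mirrors the order described in the text.
    """
    primaries = [n for n in nodes if n.role == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    primary = primaries[0]

    steps = []
    # Secondaries go first, one at a time, so the primary stays available.
    for node in nodes:
        if node.role == "SECONDARY":
            steps.append(f"{node.name}: restart mongod in standalone mode")
            steps.append(f"{node.name}: perform maintenance task")
            steps.append(f"{node.name}: restart mongod in replSet mode")

    # The primary goes last, after it steps down and a new primary is elected.
    steps.append(f"{primary.name}: rs.stepDown()")
    steps.append("cluster: elect new primary")
    steps.append(f"{primary.name}: perform maintenance task")
    steps.append(f"{primary.name}: restart mongod in replSet mode as secondary")
    return steps


if __name__ == "__main__":
    members = [
        Node("node-a", "PRIMARY"),
        Node("node-b", "SECONDARY"),
        Node("node-c", "SECONDARY"),
    ]
    for step in rolling_maintenance_plan(members):
        print(step)
```

In this model, only one node is out of the replica set at any point, which is why the availability considerations later on this page focus on what happens if a second node fails during that time.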

In Ops Manager, the Automation Agent performs rolling restarts on cluster nodes for maintenance tasks, including the following:

  • Rotating KMIP keys.
  • Rotating keyfiles.
  • Changing mongod configuration arguments.
  • Upgrading or downgrading TLS, auth, or clusterAuth mode.
  • Changing the MongoDB version.
  • Changing the oplog size.
  • Changing the name of a replica set.
  • Removing a process from a replica set.
  • Cancelling a restore from backup.
  • Enabling the Profiler.
  • Shutting down an agent if it is still bound to a server pool.

Before you perform maintenance on your clusters, review the following considerations and take action, if necessary, to maintain cluster availability.

oplog Size

Each node in a cluster is restarted in standalone mode before maintenance starts. The node replays writes in the oplog to catch up to the other nodes when it is added back to the cluster after maintenance completes.

Make sure that the cluster’s oplog is large enough to store all writes that your application might make during the maintenance period. Use the replication.oplogSizeMB advanced deployment option to adjust the oplog size.
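As a rough sizing aid, the following sketch estimates a value for replication.oplogSizeMB from an expected write volume. The function name, the safety factor, and the sample numbers are assumptions for illustration. The bytes written to the oplog can differ from the size of the data your application writes, so treat the result as a starting point and compare it against the oplog usage you actually observe.

```python
import math


def required_oplog_mb(oplog_mb_per_hour, maintenance_hours, safety_factor=2.0):
    """Estimate the oplog size (in MB) needed to cover a maintenance window.

    Hypothetical helper. oplog_mb_per_hour is the rate at which the oplog
    grows under your expected write load (an assumption you must measure).
    safety_factor adds headroom for bursts and for maintenance overruns.
    """
    if oplog_mb_per_hour < 0 or maintenance_hours < 0:
        raise ValueError("rate and duration must be non-negative")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    return math.ceil(oplog_mb_per_hour * maintenance_hours * safety_factor)


if __name__ == "__main__":
    # Example: 500 MB of oplog per hour, 3-hour window, 2x headroom.
    print(required_oplog_mb(500, 3))  # prints 3000
```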

Priority

All client connections to a primary node are dropped when maintenance starts on that node. Connections are re-established to the newly elected primary node.

You may prefer a node in a specific data center to become the new primary node. Edit the cluster’s configuration and adjust the priority of each node to indicate your preferred primary node.
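The effect of such a priority change can be sketched on a replica set configuration document of the shape that rs.conf() returns (a `members` list whose entries have `host` and `priority` fields). The function name, the host names, and the priority values below are hypothetical. The code only computes the new member priorities and does not apply them; for a managed deployment, make the change through Ops Manager.

```python
import copy


def prefer_primary(config, preferred_host, high=2, normal=1):
    """Return a copy of config in which preferred_host has the highest priority.

    Hypothetical helper. Members with priority 0 can never become primary,
    so they are left unchanged unless they are the preferred host.
    """
    new_config = copy.deepcopy(config)
    hosts = [member["host"] for member in new_config["members"]]
    if preferred_host not in hosts:
        raise ValueError(f"{preferred_host} is not a member of the replica set")

    for member in new_config["members"]:
        if member["host"] == preferred_host:
            member["priority"] = high
        elif member.get("priority", 1) == 0:
            continue  # keep members that are not eligible to be primary
        else:
            member["priority"] = normal
    return new_config


if __name__ == "__main__":
    config = {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "dc1.example.net:27017", "priority": 1},
            {"_id": 1, "host": "dc2.example.net:27017", "priority": 1},
            {"_id": 2, "host": "dc3.example.net:27017", "priority": 0},
        ],
    }
    updated = prefer_primary(config, "dc2.example.net:27017")
    print([m["priority"] for m in updated["members"]])  # prints [1, 2, 0]
```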

Fault Tolerance

Nodes undergoing maintenance don’t provide failover support to the cluster. In a three-member replica set, if an additional node becomes unavailable while one node is undergoing maintenance, the cluster loses its majority. The primary node steps down and becomes a secondary node. A new primary can’t be elected until a majority of the cluster’s nodes are available.

For mission-critical applications with high uptime needs, consider converting a three-member replica set to a five-member replica set before performing maintenance to maintain cluster majority in case an additional cluster node becomes unavailable during a maintenance period.
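The arithmetic behind this recommendation is short: a replica set needs more than half of its voting members available to elect a primary. The sketch below compares three-member and five-member sets. The function names are illustrative, not part of any MongoDB API.

```python
def majority(voting_members):
    """Smallest number of voting members that forms a majority."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, unavailable):
    """True if enough voting members remain available to elect a primary."""
    return voting_members - unavailable >= majority(voting_members)


if __name__ == "__main__":
    # One node is down for maintenance; a second node then fails.
    print(can_elect_primary(3, 1))  # prints True
    print(can_elect_primary(3, 2))  # prints False
    print(can_elect_primary(5, 2))  # prints True
```

With three members, the node under maintenance uses up the only failure the set can tolerate. With five members, the set still has a majority after one further failure.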

Note

Five-member replica sets or larger are more resilient and less likely to experience loss of majority during maintenance periods.

A simpler but less resilient way to tolerate an additional failure is to add a temporary arbiter to a three-member replica set before you perform maintenance.

Unique Index Builds

The Automation Agent builds indexes on cluster nodes one at a time using identical but independent commands. To ensure that writes respect the uniqueness of the indexed fields in a unique index, all writes to the collection on the cluster must stop before you build the index.

You can’t use Data Explorer or the Automation Config Resource in Ops Manager to create unique indexes in a rolling fashion because these methods don’t stop writes to the cluster.

If your use case requires you to build new unique indexes:

  1. Stop all writes to the affected collection. For more information, see db.fsyncLock() in the MongoDB Manual.
  2. See Build Indexes on Replica Sets in the MongoDB Manual to build the unique index in a rolling fashion.

See also

Create an Index