Backup Alerts

If a problem with the MMS Backup system occurs, MMS sends an alert to system administrators. This page describes possible alerts and provides steps to resolve them.

Backup Agent Down

This alert is triggered if a Backup Agent for a group with at least one active replica set or cluster is down for more than 1 hour.

To resolve this alert:

1. Open the group in MMS by typing the group’s name in the GROUP box.
2. Select the Backup tab and then the Backup Agents page to see which server hosts the Backup Agent.
3. Check the Backup Agent log file on that server.
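
When the log file is large, reading only its most recent lines is often enough to see why the agent stopped. The following is a minimal Python sketch, not part of MMS: `tail_lines` is a hypothetical helper, and the log path is a placeholder because the actual location depends on how the agent was installed.

```python
from collections import deque


def tail_lines(lines, count=50):
    """Return the last `count` lines from an iterable of text lines.

    `lines` can be an open file handle; trailing newlines are stripped.
    """
    # A bounded deque keeps only the most recent `count` lines in memory.
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


if __name__ == "__main__":
    # Placeholder path (assumption): use the log location configured for your agent.
    log_path = "backup-agent.log"
    try:
        with open(log_path, encoding="utf-8", errors="replace") as handle:
            for line in tail_lines(handle, count=20):
                print(line)
    except FileNotFoundError:
        print(f"No log file found at {log_path}")
```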

Backups Broken

If MMS On Prem Backup detects an inconsistency, the Backup state for the replica set is marked as “broken.”

To debug the inconsistency:

1. Check the corresponding Backup Agent log. If you see a “Failed Common Points” test, one of the following may have happened:
   - A significant rollback event occurred on the backed-up replica set.
   - The oplog for the backed-up replica set was resized or deleted.
   If either is the case, you must resync the replica set. See Resync a Member of a Replica Set.
2. Check the corresponding job log for an error message explaining the problem. In MMS, click Admin, then Backup, and then Jobs. Then click the name of the job and then Logs. Contact MongoDB Support if you need help interpreting the error message.
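
The search for the “Failed Common Points” message described in the first step can be scripted. This sketch assumes the message appears in the agent log with that wording; matching is case-insensitive to allow for small differences. The helper name `find_marker` and the file name in the usage comment are hypothetical.

```python
def find_marker(lines, marker="Failed Common Points"):
    """Return (line_number, text) pairs for every line that mentions `marker`.

    `lines` can be any iterable of text lines, such as an open file handle.
    Line numbers start at 1.
    """
    needle = marker.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


# Usage (placeholder path, substitute your agent's log file):
#
#   with open("backup-agent.log", encoding="utf-8", errors="replace") as handle:
#       for number, text in find_marker(handle):
#           print(f"{number}: {text}")
```

An empty result means only that the marker was not found in that file. The job log may still contain an error that explains the broken state.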

Cluster Snapshot Failed

This alert is generated if MMS Backup cannot successfully take a snapshot for a sharded cluster backup. The alert text should contain the reason for the problem. Common problems include the following:

- There was no reachable mongos. To resolve this issue, ensure that at least one mongos appears on the MMS Deployment page.
- The balancer could not be stopped. To resolve this issue, check the log files for the first config server to determine why the balancer will not stop.
- A token could not be inserted in one or more shards. To resolve this issue, ensure connectivity between the Backup Agent and all shards.
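
One way to check basic connectivity is to attempt a TCP connection to each shard member from the host that runs the Backup Agent. The sketch below is an illustration under assumptions: `can_connect` and `unreachable` are hypothetical helpers, and the host names in the usage comment are placeholders for your own shard members. A successful TCP connection shows only that the port is reachable, not that authentication or the snapshot itself will succeed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable(targets, timeout=3.0):
    """Return the (host, port) pairs in `targets` that could not be reached."""
    return [target for target in targets if not can_connect(*target, timeout=timeout)]


# Usage, run on the Backup Agent host (placeholder hosts; 27017 is the
# default mongod port and may differ in your deployment):
#
#   shards = [("shard0.example.net", 27017), ("shard1.example.net", 27017)]
#   for host, port in unreachable(shards):
#       print(f"Cannot reach {host}:{port}")
```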

Bind Failure

This alert is generated if a new replica set cannot be bound to a Backup Daemon. The alert text should contain a reason for the problem. Common problems include:

- No primary is found. At the time the binding occurred, the Monitoring Agent could not detect a primary. Ensure that the replica set is healthy.
- Not enough space is available on any Backup Daemon.

In both cases, resolve the issue and then re-initiate the initial sync. Alternatively, the job can be manually bound through the MMS Admin interface. In MMS, click Admin, then Backup, and then Job Timeline.
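
Before re-initiating the sync after a space problem, it can help to confirm how much space is free on the Backup Daemon host. This is a minimal sketch using Python's standard `shutil.disk_usage`. The helper name, the path, and the required size are assumptions, since the space needed depends on the size of the replica set being backed up.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def free_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


# Usage, run on the Backup Daemon host (placeholder path and size):
#
#   data_dir = "/path/to/daemon/data"   # assumption: the daemon's data directory
#   needed = 200 * 2**30                # assumption: 200 GiB for the new replica set
#   if not has_free_space(data_dir, needed):
#       print(f"Only {free_gib(data_dir):.1f} GiB free at {data_dir}")
```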

For information on initial sync, see Replica Set Data Synchronization.

Snapshot Behind Snitch

This alert is triggered if the latest snapshot for a replica set is significantly behind schedule. Check the job log in the MMS Admin interface for any obvious errors. In MMS, click Admin, then Backup, and then Jobs. Then click the name of the job and then Logs.
