- Back Up and Restore Deployments >
- Back up MongoDB Deployments >
- Backup Preparations
Backup Preparations¶
On this page
Overview¶
Before backing up your cluster or replica set, decide how to back up the data and what data to back up. This page describes items you must consider before starting a backup.
Important
Only sharded clusters or replica sets can be backed up. To back up a standalone mongod process, you must first convert it to a single-member replica set.
For an overview of how Backup works, see Backup.
Backup Configuration Options¶
The backup and recovery requirements of a given system vary to meet the cost, performance and data protection standards the system’s owner sets.
Ops Manager Enterprise Backup and Recovery supports five backup architectures, each with its own strengths and trade-offs. Consider which architecture meets the data protection requirements for your deployment before configuring and deploying your backup architecture.
Example
Consider a system whose requirements include low operational costs. The system’s owners may have strict limits on what they can spend on storage for their backup and recovery configuration. They may accept a longer recovery time as a result.
Conversely, consider a system whose requirements include a low Recovery Time Objective. The system’s owners tolerate greater storage costs if it results in a backup and recovery configuration that fulfills the recovery requirements.
Ops Manager Enterprise Backup and recovery supports the following backup architectures:
- A File System on a Sophisticated SAN
- A File System on one or more NAS devices
- An AWS S3 Blockstore
- MongoDB Blockstore in a Highly Available configuration
- MongoDB Blockstore in a Standalone configuration
Important
The backup architecture features and concerns are provided as guidance for developing your own data protection requirements. They do not cover every scenario nor are they representative of every deployment.
Backup Method Features¶
Backup System Feature | File System on SAN | File System on NAS | AWS S3 Blockstore | MongoDB HA Blockstore | MongoDB Blockstore |
---|---|---|---|---|---|
Snapshot Types | Complete | Complete | Many partial | Many partial | Many partial |
Backup Data Deduplication | If SAN supports | No | Yes | Yes | Yes |
Backup Data Compression | Yes | Depends | Yes | Yes | Yes |
Backup Data Replication | If SAN supports | No | No | Yes | No |
Backup Storage Cost | Higher | Medium | Lower | Higher | Lower |
Staff Time to Manage Backups | Medium | Medium | Lower | Higher | Medium |
Backup RTO | Lower | Medium | Lower | Lower | Medium |
When Do You Use a Particular Backup Method?
- If you do not want to maintain separate backup systems nor do you want your staff to maintain them, consider backing up to a MongoDB or S3 blockstore.
- If you need to restore data without relying on MongoDB database, consider backing up to a file system on a SAN or NAS device or an S3 blockstore.
- If you are backing up large amounts of data or frequently need to restore data, consider either a file system on a SAN, S3 blockstore or a MongoDB blockstore configured as a replica set or sharded cluster.
- If you want to minimize internal storage and maintenance costs, consider backing up to a MongoDB standalone blockstore or an S3 blockstore.
- If you have a SAN with advanced features like high availability, compression, de-duplication, etc., consider using that SAN for file system backups.
Snapshot Frequency and Retention Policy¶
By default, Ops Manager takes a base snapshot of your data every 24 hours.
If desired, administrators can change the frequency of base snapshots to 6, 8, 12, or 24 hours. Ops Manager creates snapshots automatically on a schedule. However, you cannot take snapshots on demand.
Ops Manager retains snapshots for the time periods listed in the following table. If you terminate a backup, Ops Manager immediately deletes the backup’s snapshots.
Snapshot | Default Retention Policy | Maximum Retention Policy |
---|---|---|
Base snapshot | 2 days | 5 days |
Daily snapshot | 0 days | 1 year |
Weekly snapshot | 2 weeks | 1 year |
Monthly snapshot | 1 month | 3 years |
You can change a backed-up deployment’s schedule through its Edit Snapshot Schedule menu option, available through the Backup page. Administrators can change snapshot frequency and retention through the snapshotSchedule resource in the API.
Changing the reference time changes the time of the next scheduled snapshot:
- If the new reference time is before the current reference time, the next snapshot occurs at the new reference time tomorrow. See the first two rows of the table below for examples.
- If the new reference time is after the current reference time, and you make the change before the current reference time, the next snapshot occurs at the new reference time today. See the third row of the table below for an example.
- If the new reference time is after the current reference time, but you make the change after the current reference time, the next snapshot occurs at the new reference time tomorrow. See the fourth row of the table below for an example.
Time of Change | Current Reference Time | New Reference Time | Time of Next Snapshot |
---|---|---|---|
08:00 UTC | 12:00 UTC | 10:00 UTC | 10:00 UTC tomorrow |
13:00 UTC | 12:00 UTC | 10:00 UTC | 10:00 UTC tomorrow |
08:00 UTC | 12:00 UTC | 14:00 UTC | 14:00 UTC today |
13:00 UTC | 12:00 UTC | 14:00 UTC | 14:00 UTC tomorrow |
If you change the schedule to save fewer snapshots, Ops Manager does not delete existing snapshots to conform to the new schedule. To delete unneeded snapshots, see Delete a Snapshot.
Namespaces Filter¶
The namespaces filter lets you specify which databases and collections to back up. You create either a Blacklist of those to exclude or a Whitelist of those to include. You make your selections when starting a backup and can later edit them as needed. If you change the filter in a way that adds data to your backup, a resync is required.
Use the blacklist to prevent backup of collections that contain logging data, caches, or other ephemeral data. Excluding these kinds of databases and collections will allow you to reduce backup time and costs. Using a blacklist is often preferable to using a whitelist as a whitelist requires you to intentionally opt in to every namespace you want backed up.
Important
If you use the namespaces filter, your backup’s restore data will not include
the seedSecondary
script. The seedSecondary script provides
an alternative to initial sync on new or restored replica set members. For
more information, see Seed a New Secondary from Snapshot.
Storage Engine¶
When you enable backups for a sharded cluster or a replica set that runs on MongoDB 3.0 or later, you can choose the storage engine for the backups.
You can choose a different storage engine for a backup than you do for the original data. That is, the backup storage engine does not need to match that of the original data. For example, if your original data uses MMAPv1, you can choose WiredTiger for backing up.
Insert Only MongoDB Workloads
WiredTiger may be preferred when backing up insert only MongoDB workloads.
You can change the storage engine for a cluster or replica set’s backups at any time, but doing so requires an initial sync of the backup on the new storage engine.
Encryption¶
Changed in version 3.4: Starting in 3.4, Ops Manager supports encryption for any backup job that was stored in a head database running MongoDB Enterprise 3.4 or later with the WiredTiger storage engine.
For details on setting up backup encryption, see Encrypted Backup Snapshots.
WiredTiger Options¶
If you choose the WiredTiger engine to back up a collection that
already uses WiredTiger, the initial sync replicates all the
collection’s WiredTiger options. For information on these options,
see the storage.wiredTiger.collectionConfig
section of the
Configuration File Options
page in the MongoDB manual.
For collections created after initial sync, the Backup Daemon uses its own defaults for storing data. The Daemon will not replicate any WiredTiger options for a collection created after initial sync.
Important
The storage engine chosen for a backup is independent from the storage engine used by the Backup Database. If the Backup Database uses the MMAPv1 storage engine, it can store backup snapshots for WiredTiger backup jobs in its blockstore.
Index collection options are never replicated.
For more information on storage engines, see Storage in the MongoDB manual.
Resyncing Production Deployments¶
For production deployments, it is recommended that as a best practice you periodically (annually) resync all backed-up replica sets. When you resync, data is read from a secondary in each replica set. During resync, no new snapshots are generated.
You may also want to resync your backup after:
- A reduction in data size, such that the size on disk of Ops Manager’s copy
of the data is also reduced. This scenario also includes if you:
- Have a TTL index in place, which periodically deletes documents.
- Drop a collection (MMAPv1 only).
- Run a sharded cluster, and there have been a lot of chunks moved off a particular shard.
- A switch in storage engines, if you want Ops Manager to provide snapshots in the new storage engine format.
- A manual build of an index on a replica set in a rolling fashion (as per Build Indexes on Replica Sets in the MongoDB manual).
Checkpoints¶
For sharded clusters, checkpoints provide additional restore points between snapshots. With checkpoints enabled, Ops Manager creates restoration points at configurable intervals of every 15, 30 or 60 minutes between snapshots. To enable checkpoints, see enable checkpoints.
To create a checkpoint, Ops Manager stops the balancer and inserts a token into the oplog of each shard and config server in the cluster. These checkpoint tokens are lightweight and do not have a consequential impact on performance or disk use.
Backup does not require checkpoints, and they are disabled by default.
Restoring from a checkpoint requires Ops Manager to apply the oplog of each shard and config server to the last snapshot captured before the checkpoint. Restoration from a checkpoint takes longer than restoration from a snapshot.
Snapshots when Agent Cannot Stop Balancer¶
For sharded clusters, Ops Manager disables the balancer before taking a cluster snapshot. In certain situations, such as a long migration or no running mongos, Ops Manager tries to disable the balancer but cannot. In such cases, Ops Manager will continue to take cluster snapshots but will flag the snapshots with a warning that data may be incomplete and/or inconsistent. Cluster snapshots taken during an active balancing operation run the risk of data loss or orphaned data.
Snapshots when Agent Cannot Contact a mongod
¶
For sharded clusters, if the Backup Agent cannot reach a mongod process, whether a shard or config server, then the agent cannot insert a synchronization oplog token. If this happens, Ops Manager will not create the snapshot and will display a warning message.