Troubleshooting
This document provides advice for troubleshooting problems with Ops Manager.
For resolutions to alert conditions, see also Alert Resolutions.
Getting Started Checklist
To begin troubleshooting, complete these tasks to check for common, easily fixed problems:
- Authentication Errors
- Check Agent Output or Log
- Ensure Connectivity Between Agent and Monitored Hosts
- Ensure Connectivity Between Agent and Ops Manager Server
- Allow Agent to Discover Hosts and Collect Initial Data
Authentication Errors
If your MongoDB instances run with authentication enabled, ensure Ops Manager has the MongoDB credentials. See Configure MongoDB Authentication and Authorization.
Check Agent Output or Log
If you continue to encounter problems, check the agent’s output for errors. See Agent Logs for more information.
Ensure Connectivity Between Agent and Monitored Hosts
Ensure the system running the agent can resolve and connect to the mongod processes. If you install multiple Monitoring Agents, ensure that each Monitoring Agent can reach every mongod process in the deployment.
To confirm, log into the system where the agent is running and try to connect to [hostname]:[port]. Replace [hostname] with the hostname and [port] with the port that the database is listening on.
Ops Manager does not support port forwarding.
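If the agent's system has no MongoDB client tools installed, a plain TCP check can still confirm name resolution and reachability. The following Python sketch is an illustration, not part of Ops Manager. `can_connect` is a hypothetical helper. A `True` result shows only that the port accepts connections. It does not show that authentication or SSL settings are correct.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Checks name resolution and TCP reachability only; it does not
    authenticate or speak the MongoDB wire protocol.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures (socket.gaierror), refused connections, and timeouts
        # are all subclasses of OSError.
        return False
```

For example, `can_connect("db1.example.net", 27017)` returns `False` if the placeholder hostname does not resolve or a firewall drops the connection. Run it from each Monitoring Agent host against every mongod in the deployment.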
Ensure Connectivity Between Agent and Ops Manager Server
Verify that the Monitoring Agent can connect on TCP port 8443 (outbound) to the Ops Manager server (i.e. api-agents.mongodb.com).
Allow Agent to Discover Hosts and Collect Initial Data
Allow the agent to run for 5-10 minutes to allow host discovery and initial data collection.
Installation
The monitoring server does not start up successfully
Confirm that the URI or IP address for the Ops Manager service is stored correctly in the mongo.mongoUri property in the <install_dir>/conf/conf-mms.properties file.
If you don’t set this property, Ops Manager will fail while trying to connect to the default 127.0.0.1:27017 URL.
If the URI or IP address of your service changes, you must update the property with the new address. For example, update the address if you deploy on a system without a static IP address, or if you deploy on EC2 without a fixed IP and then restart the EC2 instance.
If the URI or IP address changes, then each user who accesses the service must also update the address in the URL used to connect and in the client-side monitoring-agent.config files.
If you use the Ops Manager <install_dir>/bin/credentialstool to encrypt the password used in the mongo.mongoUri value, also add the mongo.encryptedCredentials key to the <install_dir>/conf/conf-mms.properties file and set its value to true:
mongo.encryptedCredentials=true
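You can check both settings before restarting Ops Manager. The following Python sketch is a hypothetical helper, not an Ops Manager tool. It reads `key=value` lines with a simplified parser that ignores the line continuations and escapes the full Java properties format allows. It reports a missing `mongo.mongoUri`. It also reports a missing `mongo.encryptedCredentials=true` when you indicate that the password was encrypted.

```python
from __future__ import annotations


def parse_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # or ! comments.

    Simplified: ignores the line continuations and escapes that the full
    Java .properties format allows.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # split on the first "=" so URI query strings stay intact
            props[key.strip()] = value.strip()
    return props


def check_app_db_settings(text: str, password_encrypted: bool = False) -> list[str]:
    """Report problems with mongo.mongoUri and mongo.encryptedCredentials."""
    props = parse_properties(text)
    problems = []
    if not props.get("mongo.mongoUri"):
        problems.append("mongo.mongoUri is not set; Ops Manager will try 127.0.0.1:27017")
    if password_encrypted and props.get("mongo.encryptedCredentials", "").lower() != "true":
        problems.append("password is encrypted but mongo.encryptedCredentials is not true")
    return problems
```

Pass it the contents of <install_dir>/conf/conf-mms.properties. An empty list means neither problem was found.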
Monitoring
Alerts
For resolutions to alert conditions, see also Alert Resolutions.
For information on creating and managing alerts, see Manage Alert Configurations and Manage Alerts.
Cannot Turn Off Email Notifications
There are at least three ways to turn off alert notifications:
- Remove the deployment from your Ops Manager account. See Remove a Process from Management or Monitoring.
- Disable or delete the alert configuration. See Manage Alert Configurations.
- Turn off alerts for a specific host. See Disable Alerts for a Specific Process.
Receive Duplicate Alerts
If the notification email list contains multiple email-groups, one or more people may receive multiple notifications of the same alert.
Receive “Host has low open file limits” or “Too many open files” error messages
These error messages appear on the Deployment page, under a host’s name. They appear if the number of available connections does not meet the Ops Manager-defined minimum value. These errors are not generated by the mongos instance and, therefore, will not appear in mongos log files.
On a host-by-host basis, the Monitoring Agent compares the number of open file descriptors and connections to the maximum connections limit. The max open file descriptors ulimit parameter directly affects the number of available server connections. The agent calculates whether enough connections exist to meet the Ops Manager-defined minimum value.
In ping documents, for each node and its serverStatus.connections values, if the sum of the current value plus the available value is less than the maxConns configuration value set for a monitored host, the Monitoring Agent will send a Host has low open file limits or Too many open files message to Ops Manager.
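The comparison described above can be written out directly. This Python sketch is illustrative and the agent's actual implementation may differ. `low_open_file_limit` is a hypothetical name. `connections` is assumed to hold the `current` and `available` fields of `serverStatus.connections`, and `max_conns` is the host's configured maxConns.

```python
def low_open_file_limit(connections: dict, max_conns: int) -> bool:
    """Return True when current + available connections fall short of max_conns.

    `connections` is assumed to mirror serverStatus.connections, for example
    {"current": 12, "available": 807}; `max_conns` is the maxConns value
    configured for the monitored host.
    """
    total = connections["current"] + connections["available"]
    return total < max_conns
```

With the sample values in the docstring, 12 + 807 = 819. A host configured with a maxConns of 20000 would therefore be flagged, which usually points to an open-files ulimit that is too low.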
Ping documents are data sent by Monitoring Agents to Ops Manager.
Note
To access this feature, you must either:
- Belong to the project, or
- Have the Global Monitoring Admin role or the Global Owner role.
To view ping documents:
- Click the Deployment page.
- Click the host’s name.
- Click Last Ping.
To prevent this error, we recommend you set the ulimit for open files to 64000. We also recommend setting the maxConns option (the --maxConns command-line option, or net.maxIncomingConnections in the configuration file) to at least the recommended settings.
See the MongoDB ulimit reference page and the MongoDB maxConns reference page for details.
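The limit that matters belongs to the running mongod process, not to the shell you log into. On Linux, a Python sketch like the following can read that limit from the `Max open files` row of `/proc/<pid>/limits` and compare it with the recommended 64000. The helper names are hypothetical.

```python
from __future__ import annotations

RECOMMENDED_NOFILE = 64000  # recommended open-files ulimit from this page


def parse_nofile(limits_text: str) -> tuple[str, str]:
    """Return the (soft, hard) "Max open files" values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")


def process_nofile(pid: int | str = "self") -> tuple[str, str]:
    """Read the open-files limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_nofile(f.read())


def meets_recommendation(soft: str, recommended: int = RECOMMENDED_NOFILE) -> bool:
    """Treat "unlimited" as satisfying the recommendation."""
    return soft == "unlimited" or int(soft) >= recommended
```

For example, `meets_recommendation(process_nofile(12345)[0])` checks the soft limit. Here 12345 is a placeholder for the mongod's PID, which you can find with `pgrep mongod`.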
Deployments
Deployment Hangs in In Progress
If you have added or restarted a deployment and the deployment remains in the In Progress state for several minutes, click View Agent Logs and look for any errors.
If you diagnose an error and need to correct the deployment configuration:
- Click Edit Configuration and then click Edit Configuration again.
- Reconfigure the deployment.
- When you complete your changes, click Review Changes and then Confirm & Deploy.
If you shut down the deployment and still cannot find a solution, remove the deployment from Ops Manager.
Monitoring Agent Fails to Collect Data
Possible causes for this state:
- The Monitoring Agent can’t connect to the server because of networking restrictions or issues (for example, firewalls, proxies, or routing).
- Your database is running with SSL. You must enable SSL either globally or on a per-host basis. See Configure Monitoring Agent for SSL and Enable SSL for a Deployment for more information.
- Your database is running with authentication. You must supply Ops Manager with the authentication credentials for the host. See Configure MongoDB Authentication and Authorization.
Hosts
Hosts are not Visible
Problems with the Monitoring Agent detecting hosts can be caused by a few factors.
Host not added: In Ops Manager, click Deployment, then click the Processes tab, then click the Add Host button. In the New Host window, specify the host type, internal hostname, and port. If appropriate, add the database username and password and whether or not Ops Manager should use SSL to connect with your Monitoring Agent. Note that it is not necessary to restart your Monitoring Agent when adding (or removing) a host.
Accidental duplicate mongods: If you add the host after a crash and restart the Monitoring Agent, you might not see the hostname on the Ops Manager Deployment page. Ops Manager detects the host as a duplicate and suppresses its data. To reset, click Settings, then Project Settings, then the Reset Duplicates button.
Monitoring Agents cannot detect hosts: If your hosts exist across multiple data centers, make sure that all of your hosts can be discovered by all of your Monitoring Agents.
Cannot Delete a Host
In rare cases, the mongod is brought down and the replica set is reconfigured. The down host cannot be deleted and returns an error message, “This host cannot be deleted because it is enabled for backup.” Contact MongoDB Support for help in deleting these hosts.
Projects
Additional Information on Projects
Create a project to monitor additional segregated systems or environments for servers, agents, users, and other resources. For example, your deployment might have two or more environments separated by firewalls. In this case, you would need two or more separate Ops Manager projects.
API and shared secret keys are unique to each project. Each project requires its own agent with the appropriate API and shared secret keys. Within each project, the agent needs to be able to connect to all hosts it monitors in the project.
For information on creating and managing projects, see Projects.
Munin
Important
As of Automation Agent 2.7.0, hardware monitoring using munin-node is deprecated.
munin-node is a third-party package. For problems related to installing munin-node, see the Munin Wiki.
Install and configure the munin-node service on the MongoDB server(s) to be monitored before starting Ops Manager monitoring. The Ops Manager agent’s README file provides guidelines to install munin-node.
See also
See Configure Hardware Monitoring with munin-node for details about monitoring hardware with munin-node.
Red Hat Enterprise Linux (RHEL 6, 7) can generate the following error messages.
No package munin-node is available Error
To correct this error:
- Follow the instructions on the Extra Packages for Enterprise Linux repository wiki page to install the epel-release rpm for your version of Enterprise Linux.
- After the package is installed, install munin-node and all of its dependencies (for example, with sudo yum install munin-node).
- After munin-node is installed, check whether the munin-node service is running. If it is not, start it (for example, with sudo service munin-node start on RHEL 6 or sudo systemctl start munin-node on RHEL 7).
Non-localhost IP Addresses are Blocked
By default, munin blocks incoming connections from non-localhost IP addresses. The /var/log/munin/munin-node.log file will display a “Denying connection” error for your non-localhost IP address.
To fix this error, open the munin-node.conf configuration file and comment out the two default allow lines, which permit only localhost connections. Then add an allow line to munin-node.conf with a pattern that matches your subnet.
Restart munin-node after editing the configuration file for changes to take effect.
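You can test a candidate pattern against the denied addresses before editing munin-node.conf. munin-node's `allow` entries are regular expressions matched against the connecting address. The Python sketch below models that with Python's `re` module, which behaves like Perl for simple patterns such as these. The default patterns in the code are an assumption based on a stock munin-node.conf. `is_allowed` is a hypothetical helper.

```python
import re

# Assumed stock munin-node.conf allow entries, which admit only localhost.
DEFAULT_ALLOW = [r"^127\.0\.0\.1$", r"^::1$"]


def is_allowed(peer_ip: str, allow_patterns: list) -> bool:
    """Return True if peer_ip matches at least one allow regular expression."""
    return any(re.search(pattern, peer_ip) for pattern in allow_patterns)
```

For example, `is_allowed("10.0.1.25", DEFAULT_ALLOW)` is `False`. Adding the hypothetical subnet pattern `r"^10\.0\.1\.\d+$"` makes it `True`. Escape the dots so that they match only literal periods.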
Verifying iostat and Other Plugins/Services Returns “# Unknown service” Error
The first step is to confirm there is a problem. Open a telnet session to munin-node, which listens on port 4949 by default, and run fetch iostat, fetch iostat_ios, and fetch cpu.
The iostat_ios plugin creates the iotime chart, and the cpu plugin creates the cputime chart.
If any of these telnet fetch commands returns an “# Unknown service” error, create a link to the plugin in /etc/munin/plugins/ for each service that generates the error.
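You can also script the telnet check. The following Python sketch is hypothetical. It assumes munin-node's plain-text protocol works as follows: the node sends a banner line on connect, answers `fetch <plugin>` with lines that end in a line containing a single `.`, and closes the session on `quit`. The sketch stops early on the `# Unknown service` reply, which it matches case-insensitively.

```python
from __future__ import annotations

import socket

MUNIN_PORT = 4949  # munin-node's default port


def munin_fetch(host: str, plugin: str, port: int = MUNIN_PORT,
                timeout: float = 5.0) -> list[str]:
    """Send "fetch <plugin>" to munin-node and return the response lines."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            reader.readline()  # discard the "# munin node at ..." banner
            sock.sendall(f"fetch {plugin}\n".encode("utf-8"))
            lines = []
            for raw in reader:
                line = raw.rstrip("\n")
                if line == ".":  # end of a normal fetch response
                    break
                lines.append(line)
                if is_unknown_service([line]):  # plugin is not linked
                    break
            try:
                sock.sendall(b"quit\n")
            except OSError:
                pass  # the node may already have closed the connection
    return lines


def is_unknown_service(lines: list[str]) -> bool:
    """True if munin-node answered with "# Unknown service"."""
    return any(l.strip().lower().startswith("# unknown service") for l in lines)
```

For example, `is_unknown_service(munin_fetch("localhost", "iostat_ios"))` returns `True` when the plugin still needs to be linked.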
Disk names are not listed by Munin
In some cases, Munin will omit disk names that have a dash between the name and a numerical suffix, for example, dm-0 or dm-1. There is a documented fix for Munin’s iostat plugin.
Authentication
Two-Factor Authentication
Missed SMS Authentication Tokens
SMS is not a fully reliable delivery mechanism for messages, especially across international borders. The Google Authenticator application generates tokens on your device and does not depend on message delivery. Unless you must use SMS for authentication, use the Google Authenticator application for two-factor authentication.
If you do not receive the SMS authentication tokens:
- Refer to the Manage Your Two-Factor Authentication Options page for more details about using two-factor authentication. This page includes any limitations which may affect SMS delivery times.
- Enter the SMS phone number with the country code first, followed by the area code and the phone number. If that does not work, try 011 first, followed by the country code, then the area code, and then the phone number.
If you do not receive the authentication token in a reasonable amount of time, contact your Ops Manager system administrator.
Delete or Reset Two-Factor Authentication
Contact your system administrator to remove or reset two-factor authentication on your account.
For administrative information on two-factor authentication, see Manage Two-Factor Authentication for Ops Manager.
LDAP
Forgot to Change MONGODB-CR Error
If your MongoDB deployment uses LDAP for authentication and the agents report an authentication error that references MONGODB-CR, make sure that you specified LDAP (PLAIN) as the authentication mechanism for both the Monitoring Agent and the Backup Agent. See Configure Backup Agent for LDAP Authentication and Configure Monitoring Agent for LDAP.
Backup
Logs Display MongodVersionException
The MongodVersionException can occur if the Backup Daemon’s host cannot access the internet to download the version or versions of MongoDB required for the backed-up databases. Each database requires a version of MongoDB that matches the database’s version. Specifically, for each instance you must run the latest stable release of that release series.
If the Daemon runs without access to the internet, you must manually download the required MongoDB versions:
- Go to the MongoDB downloads page and download the appropriate versions for your environment.
- Copy the download to the Daemon’s host.
- Decompress the download into the directory specified in the mongodb.release.directory setting in the Daemon’s conf-daemon.properties file. For the file’s location, see Ops Manager Configuration.
Within the directory specified in the mongodb.release.directory setting, each decompressed MongoDB release should sit in its own folder.
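To confirm that each decompressed release is where the Daemon expects it, a Python sketch like this one lists the folders in the release directory and reports whether each contains `bin/mongod`. The helper name and the `bin/mongod` layout check are assumptions. The layout matches MongoDB's Linux archives. Point the helper at the path from your `mongodb.release.directory` setting.

```python
from __future__ import annotations

from pathlib import Path


def list_mongodb_releases(release_dir: str) -> dict[str, bool]:
    """Map each folder in release_dir to whether it contains bin/mongod."""
    root = Path(release_dir)
    return {
        child.name: (child / "bin" / "mongod").is_file()
        for child in sorted(root.iterdir())
        if child.is_dir()
    }
```

A `False` entry may point to an archive that was decompressed one level too deep or only partially copied.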
Insufficient Oplog Size Error
When you use the Ops Manager interface to back up a MongoDB cluster, Ops Manager checks whether the cluster’s oplogs are large enough to hold a minimum of 3 hours’ worth of data, based on the last 24 hours of usage patterns. If the oplogs are not large enough, they may have turned over multiple times, creating a gap between where a backup ends and an oplog starts.
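The size check amounts to comparing the oplog's capacity with its recent write rate. The Python sketch below is a simplified model, not Ops Manager's actual implementation. It assumes a steady write rate equal to the 24-hour average, and the helper names are hypothetical. `min_hours` plays the role of the minimum window that the mms.backup.minimumOplogWindowHours setting controls.

```python
MIN_WINDOW_HOURS = 3.0  # default minimum window described above


def estimated_oplog_window_hours(oplog_size_bytes: float,
                                 oplog_bytes_last_24h: float) -> float:
    """Hours of writes the oplog can hold at the last 24 hours' average rate."""
    if oplog_bytes_last_24h <= 0:
        return float("inf")  # no writes observed, so nothing can roll over
    bytes_per_hour = oplog_bytes_last_24h / 24.0
    return oplog_size_bytes / bytes_per_hour


def passes_oplog_check(oplog_size_bytes: float, oplog_bytes_last_24h: float,
                       min_hours: float = MIN_WINDOW_HOURS) -> bool:
    """Simplified size check: is the estimated window at least min_hours?"""
    return estimated_oplog_window_hours(oplog_size_bytes, oplog_bytes_last_24h) >= min_hours
```

For example, a 1 GiB oplog on a cluster that wrote 48 GiB of oplog entries in the last 24 hours gives a window of 0.5 hours, well short of 3 hours. An 8 GiB oplog at the same rate gives 4 hours. Bursty workloads can turn the oplog over sooner than this average suggests.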
If the oplog size check fails, the user cannot enable backups and Ops Manager displays a warning.
If possible, wait to start a backup until the oplog has had sufficient time to meet the size requirement.
If this is not possible, the minimum oplog size value can be changed.
See mms.backup.minimumOplogWindowHours for how to set this value.
Warning
Do not change the minimum oplog size unless you are certain that backups taken with a smaller oplog window are still usable.
Important
MongoDB recommends changing this value only temporarily, to permit a test backup job to execute. Reset the minimum oplog size value to the default as soon as possible. If the minimum is set too low, a gap can form between a backup job and an oplog, which makes the backup unusable for restores. Stale backup jobs must be resynchronized before they can be used for restores.
Once you understand these risks, you can start backups using the changed minimum value. After you pass the 24-hour mark, reset this minimum value to preserve the sanity check for the global Ops Manager installation going forward.
System
Logs Display OutOfMemoryError
If your logs display OutOfMemoryError, ensure you are running with sufficient ulimits and RAM.
Increase User Limits
For the recommended User Limit (ulimit) setting, see the FAQ on Receive “Host has low open file limits” or “Too many open files” error messages.
Ops Manager infers the host’s ulimit setting using the total number of available and current connections. To learn more about ulimit in MongoDB, see the UNIX ulimit Settings reference page in the MongoDB manual.
Ensure Sufficient RAM for All Components
Ensure that each server has enough RAM for the components it runs. If a server runs multiple components, its RAM must be at least the sum of the required amount of RAM for each component.
For the individual RAM requirements for the Ops Manager Application server, Ops Manager Application Database, Backup Daemon server, and Backup Database, see Ops Manager System Requirements.
Obsolete Config Settings
Ops Manager fails to start if obsolete configuration settings are set in the conf-mms.properties file. If there is an obsolete setting, the log lists an Obsolete Setting error, as in the following:
Warning
[OBSOLETE SETTING] Remove mms.multiFactorAuth.require or replace mms.multiFactorAuth.require with mms.multiFactorAuth.level.
You will need to remove or replace the obsolete property in the conf-mms.properties file before you can start Ops Manager.
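You can scan for obsolete keys before restarting instead of waiting for startup to fail. In this Python sketch, `find_obsolete_settings` is a hypothetical helper. Its map holds only the one obsolete setting named on this page, so add entries as your Ops Manager version requires. The helper skips commented-out lines and recognizes only `key=value` lines.

```python
from __future__ import annotations

import re

# Obsolete key -> replacement. Only the example from this page is included.
OBSOLETE_SETTINGS = {
    "mms.multiFactorAuth.require": "mms.multiFactorAuth.level",
}

# A key that does not start with a comment marker, followed by "=".
_KEY_RE = re.compile(r"^\s*([^#!\s=][^=\s]*)\s*=")


def find_obsolete_settings(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, obsolete_key, replacement) for each obsolete key."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = _KEY_RE.match(line)
        if m and m.group(1) in OBSOLETE_SETTINGS:
            found.append((lineno, m.group(1), OBSOLETE_SETTINGS[m.group(1)]))
    return found
```

Each result gives the line to edit in conf-mms.properties and the key that replaces it.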
Logs Display java.lang.OutOfMemoryError
If your logs display java.lang.OutOfMemoryError: Java heap space, you can adjust the Java heap memory settings. Repeat the following steps for every host running an Ops Manager instance except dedicated Backup Daemon hosts.
Linux Hosts
- Open mms.conf in your preferred text editor.
- Find the line that sets the -Xmx and -Xms Java options. The key values in this line are:
  - -Xmx: Java Heap Maximum Memory
  - -Xms: Java Heap Starting Memory
  By default, these values are both set to 4,352 MB (4352m).
- Change the -Xmx and -Xms values to a larger value. Set both parameters to the same value to remove any performance impact from the JVM constantly reclaiming memory from the heap. The value is specified as #k|m|g: a number followed by k (kilobytes), m (megabytes), or g (gigabytes).
Do not change other Java options
Changing any option values other than -Xmx and -Xms could have an unexpected impact on the Ops Manager Application. Do not change other values without consulting MongoDB Support.
Example
To set the Java heap to 10 GB, set these values to:
-Xmx10g -Xms10g
Windows Hosts
- Click Run from the Start menu.
- Type regedit, then click OK.
- If User Access Control asks Do you want to allow this app to make changes to your device?, click Yes.
- Locate the Options multi-line registry key for the Ops Manager service.
- Change the -Xmx and -Xms values in the Options multi-line registry key to a larger value. Set both parameters to the same value to remove any performance impact from the JVM constantly reclaiming memory from the heap. The key values in this block are:
  - -Xmx: Java Heap Maximum Memory
  - -Xms: Java Heap Starting Memory
  By default, these values are both set to 4,352 MB (4352m). The value is specified as #k|m|g: a number followed by k (kilobytes), m (megabytes), or g (gigabytes).
Do not change other Java options
Changing any option values other than -Xmx and -Xms could have an unexpected impact on the Ops Manager Application. Do not change other values without consulting MongoDB Support.
Example
To set the Java heap to 10 GB, set these values to:
-Xmx10g -Xms10g
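On Linux hosts, you can script the mms.conf edit so that only -Xmx and -Xms change and every other Java option stays as it is. The Python sketch below is hypothetical and does not reproduce the exact mms.conf line. Its input is any string that contains those options. The pattern matches only `-Xmx` and `-Xms`, so options such as `-Xss` or `-Xmn` are left alone.

```python
import re

# Matches -Xmx<size> and -Xms<size> only; -Xss, -Xmn, and others do not match.
_HEAP_OPTION = re.compile(r"-Xm([xs])\d+[kKmMgG]")


def set_heap_options(line: str, size: str) -> str:
    """Return `line` with both -Xmx and -Xms set to `size` (for example "10g")."""
    if not re.fullmatch(r"\d+[kKmMgG]", size):
        raise ValueError(f"size must use the #k|m|g form, got {size!r}")
    updated, count = _HEAP_OPTION.subn(lambda m: f"-Xm{m.group(1)}{size}", line)
    if count == 0:
        raise ValueError("line contains no -Xmx or -Xms option")
    return updated
```

Because one `size` argument sets both options, they cannot drift apart, in line with the advice to keep -Xmx and -Xms equal.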
The optimal value for your Ops Manager installation depends upon your Ops Manager hosts’ architecture: platform, physical memory, etc. The goal of heap tuning is to balance the time the JVM spends reclaiming memory from objects that are no longer in use, also known as garbage collection, against the frequency of garbage collection.
- Large heap sizes mean less frequent and slower garbage collection.
- Small heap sizes mean more frequent and faster garbage collection.
Important
Your Java heap size must not be set to a value greater than the amount of physical memory in the Ops Manager host or less than the default value of 4352m (4,352 MB).
Automation Checklist
Ops Manager Automation allows you to deploy, configure, and manage MongoDB deployments with the Ops Manager UI. Ops Manager Automation relies on an Automation Agent, which must be installed on every server in the deployment. The Automation Agents periodically poll the Ops Manager service to determine the current goal, and continually report their status to Ops Manager.
To use Automation, you must install the Automation Agent on each server that you want Ops Manager to manage.
Automation Runs Only on 64-bit Architectures
Ops Manager provides only 64-bit downloads of the Automation Agent.
Using Own Hardware
If you deploy Automation manually, ensure that you have one Automation Agent on every server.
If you deploy the agent manually, you must create MongoDB’s dbpath and the directory for the MongoDB binaries and ensure that the user running the agent owns these directories.
If you install using the rpm package, the agent runs as the mongod user; if using the deb package, the agent runs as the mongodb user. If you install using the tar.gz archive file, you can run the agent as any user.
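You can verify the ownership requirement with a short Python sketch (POSIX only). `owned_by` and `unowned_dirs` are hypothetical helpers. Pass the agent's user and the directories you created. The agent's user is mongod for rpm installs and mongodb for deb installs. Directories that do not exist are reported as not owned.

```python
from __future__ import annotations

import os
import pwd


def owned_by(path: str, username: str) -> bool:
    """Return True if path exists and its owner is `username` (POSIX only)."""
    try:
        owner_uid = os.stat(path).st_uid
    except FileNotFoundError:
        return False
    return owner_uid == pwd.getpwnam(username).pw_uid


def unowned_dirs(paths: list[str], username: str) -> list[str]:
    """Return the paths that are missing or not owned by `username`."""
    return [p for p in paths if not owned_by(p, username)]
```

For example, `unowned_dirs(["/data/db", "/opt/mongodb"], "mongod")` returns the directories that still need to be created or given to the agent's user. Both paths are placeholders.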
Networking
All hosts must allow communication between MongoDB ports. The default is 27017, but you can configure alternate port ranges in the Ops Manager interface.
The Automation Agent must be able to connect to Ops Manager on port 8443 (HTTPS).
For more information on access to ports and IP addresses, see
Security Overview.
Automation Configuration
After completing the automation configuration, always ensure that the deployment plan satisfies the needs of your deployment. Always double check hostnames and ports before confirming the deployment.
Sizing
- Ensure that you provision machines with enough space to run MongoDB and support the requirements of your data set.
- Ensure that you provision sufficient machines to run your deployment. Each mongod should run on its own host.
Frequent Connection Timeouts
The Automation Agent may frequently time out of connections for one or more of the following reasons:
- High network latency
- High server load
- Large SSL keys
- Insufficient CPU speed
By default, connections time out after 40 seconds. MongoDB recommends gradually increasing the value of the dialTimeoutSeconds Automation Agent configuration setting to prevent frequent premature connection timeouts. However, increasing this value also increases the time required to deploy future configuration changes. Experiment with small, incremental increases until you determine the optimum value for your deployment. See dialTimeoutSeconds in Connection Settings at Automation Agent Configuration for more information.