Troubleshooting¶
This document provides advice for troubleshooting problems with Ops Manager.
Getting Started Checklist¶
To begin troubleshooting, complete these tasks to check for common, easily fixed problems:
- Authentication Errors
- Check Agent Output or Log
- Confirm Only One Agent is Actively Monitoring
- Ensure Connectivity Between Agent and Monitored Hosts
- Ensure Connectivity Between Agent and Ops Manager Server
- Allow Agent to Discover Hosts and Collect Initial Data
Authentication Errors¶
If your MongoDB instances run with authentication enabled, ensure Ops Manager has the MongoDB credentials. See Configure MongoDB Authentication and Authorization.
Check Agent Output or Log¶
If you continue to encounter problems, check the agent’s output for errors. See Agent Logs for more information.
Confirm Only One Agent is Actively Monitoring¶
If you run multiple Monitoring Agents, make sure they are all the same version and that only one is actively monitoring. When you upgrade a Monitoring Agent, delete any old standby agents.
When you run multiple agents, one runs as the primary agent and the others as standby agents. Standby agents poll Ops Manager periodically to get the configuration but do not send data.
To determine which agent is the primary agent, look at the Status value on the Administration tab’s Agents page. If there is no last ping value for a listed agent, the agent is a standby agent.
See Monitoring FAQs and Add Existing MongoDB Processes to Monitoring for more information.
Ensure Connectivity Between Agent and Monitored Hosts¶
Ensure the system running the agent can resolve and connect to the MongoDB instances. To confirm, log into the system where the agent is running and issue a command in the following form:
mongo [hostname]:[port]
Replace [hostname] with the hostname and [port] with the port that the database is listening on.
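If no MongoDB client is installed on the agent's system, a plain TCP check can at least confirm name resolution and reachability. The following is a minimal Python sketch, not an Ops Manager tool; the function name `can_connect` is hypothetical, and a successful connection does not prove that authentication or SSL are configured correctly.

```python
import socket


def can_connect(hostname: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to hostname:port succeeds."""
    try:
        # create_connection resolves the hostname and opens the socket,
        # so it exercises both DNS resolution and network reachability.
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures, refused connections, and timeouts.
        return False
```

Call it from the system where the agent runs, for example `can_connect("[hostname]", 27017)` with your own hostname and port substituted.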
Ensure Connectivity Between Agent and Ops Manager Server¶
Verify that the Monitoring Agent can connect on TCP port 443 (outbound) to the Ops Manager server (i.e. “mms.mongodb.com”).
Allow Agent to Discover Hosts and Collect Initial Data¶
Let the agent run for 5-10 minutes so that it can discover hosts and collect initial data.
Installation¶
Why doesn’t the monitoring server start up successfully?¶
Confirm the URI or IP address for the Ops Manager service is stored correctly in the mongo.mongoUri property in the <install_dir>/conf/conf-mms.properties file.
If you don’t set this property, Ops Manager will fail while trying to connect to the default 127.0.0.1:27017 URL.
If the URI or IP address of your service changes, you must update the property with the new address. For example, update the address if you deploy on a system without a static IP address, or if you deploy on EC2 without a fixed IP and then restart the EC2 instance.
If the URI or IP address changes, then each user who accesses the service must also update the address in the URL used to connect and in the client-side monitoring-agent.config files.
If you use the Ops Manager <install_dir>/bin/credentialstool to encrypt the password used in the mongo.mongoUri value, also add the mongo.encryptedCredentials key to the <install_dir>/conf/conf-mms.properties file and set the value for this property to true.
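As a quick sanity check of these two properties, the following Python sketch parses the text of a properties file and reports obvious problems. It assumes the file uses simple key=value lines with # comments; the function names and the sample URI are hypothetical, and the sketch does not validate the URI itself.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, because a URI may contain "=" itself.
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_mongo_uri(props: dict) -> list:
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not props.get("mongo.mongoUri"):
        problems.append("mongo.mongoUri is not set")
    encrypted = props.get("mongo.encryptedCredentials", "").lower()
    if encrypted not in ("", "true", "false"):
        problems.append("mongo.encryptedCredentials should be true or false")
    return problems


# Hypothetical file content, for illustration only.
sample = """
# Ops Manager application database
mongo.mongoUri=mongodb://db.example.net:27017
mongo.encryptedCredentials=true
"""
print(check_mongo_uri(parse_properties(sample)))
```

To check a real installation, read the contents of <install_dir>/conf/conf-mms.properties and pass them to `parse_properties`.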
Monitoring¶
Alerts¶
For information on creating and managing alerts, see Create an Alert Configuration and Manage Alerts.
Cannot Turn Off Email Notifications¶
There are at least two ways to turn off alert notifications:
- Remove the deployment from your Ops Manager account. See Remove Processes from Monitoring.
- Disable or delete the alert in Ops Manager. Click the Activity tab then click Alert Settings. To the right of an alert, select the gear icon and select Disable or Delete.
Receive Duplicate Alerts¶
If the notification email list contains multiple email-groups, one or more people may receive multiple notifications of the same alert.
Receive “Host has low open file limits” or “Too many open files” error messages¶
These error messages appear on the Deployment page, under a host’s name. They appear if the number of available connections does not meet an Ops Manager-defined minimum value. These errors are not generated by the mongos instance and, therefore, will not appear in mongos log files.
On a host by host basis, the Monitoring Agent compares the number of open file descriptors and connections to the maximum connections limit. The max open file descriptors ulimit parameter directly affects the number of available server connections. The agent calculates whether or not enough connections exist to meet an Ops Manager-defined minimum value.
In ping documents, for each node and its serverStatus.connections values, if the sum of the current value plus the available value is less than the maxConns configuration value set for a monitored host, the Monitoring Agent will send a Host has low open file limits or Too many open files message to Ops Manager.
Ping documents are data sent by Monitoring Agents to Ops Manager. To view ping documents, click the Deployment page, then click the host’s name, and then click Last Ping.
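The comparison described above can be sketched in a few lines of Python. This illustrates the stated rule only and is not the agent's actual code; the function name is hypothetical, and the sample numbers are made up.

```python
def has_low_connection_headroom(connections: dict, max_conns: int) -> bool:
    """Apply the rule from the text: current + available < maxConns.

    `connections` mirrors the serverStatus.connections fields named above.
    """
    return connections["current"] + connections["available"] < max_conns


# Hypothetical values: 100 current + 700 available is below a maxConns of 1000,
# so this host would be flagged.
print(has_low_connection_headroom({"current": 100, "available": 700}, 1000))
```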
To prevent this error, we recommend you set ulimit open files to 64000. We also recommend setting the maxConns configuration option to at least the recommended settings.
See the MongoDB ulimit reference page and the MongoDB maxConns reference page for details.
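One way to see the open files limit that a process inherits is Python's standard resource module, which is available on Unix-like systems only. This sketch reports the limit of the process that runs it, not of a running mongod, so run it as the same user and from the same kind of session that starts MongoDB. The function name is hypothetical.

```python
import resource

RECOMMENDED_OPEN_FILES = 64000  # value recommended in the text above


def open_files_limit_ok(minimum: int = RECOMMENDED_OPEN_FILES) -> bool:
    """Return True if the soft open files limit is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means no limit is enforced, which also satisfies the minimum.
    return soft == resource.RLIM_INFINITY or soft >= minimum


print(open_files_limit_ok())
```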
Deployments¶
Deployment Hangs in In Progress¶
If you have added or restarted a deployment and the deployment remains in the In Progress state for several minutes, click View Agent Logs and look for any errors.
If you diagnose an error and need to correct the deployment configuration:
- Click Edit Configuration and then click Edit Configuration again.
- Reconfigure the deployment.
- When you complete your changes, click Review & Deploy and then Confirm & Deploy.
If you shut down the deployment and still cannot find a solution, unmanage the deployment.
Monitoring Agent Fails to Collect Data¶
Possible causes for this state:
- The Monitoring Agent can’t connect to the server because of networking restrictions or issues (for example, firewalls, proxies, or routing).
- Your database is running with SSL. You must enable SSL either globally or on a per-host basis. See Configure Monitoring Agent for SSL and Configure SSL for MongoDB for more information.
- Your database is running with authentication. You must supply Monitoring with the authentication credentials either when you add a host or by clicking the gear icon and then Edit Host to the right of the entry on the Deployment page.
Hosts¶
Hosts are not Visible¶
Problems with the Monitoring Agent detecting hosts can be caused by a few factors.
Host not added to Ops Manager: In Ops Manager, select the Deployment tab, then the Deployment page, and then click the Add Host button. In the New Host window, specify the host type, internal hostname, and port. If appropriate, add the database username and password and whether or not Ops Manager should use SSL to connect with your Monitoring Agent. Note that it is not necessary to restart your Monitoring Agent when adding (or removing) a host.
Accidental duplicate mongods: If you add the host after a crash and restart the Monitoring Agent, you might not see the hostname on the Ops Manager Deployment page. Ops Manager detects the host as a duplicate and suppresses its data. To reset, select the Administration tab, then Group Settings, and then the Reset Duplicates button.
Too many Monitoring Agents installed: Only one Monitoring Agent is needed to monitor all hosts within a single network. You can use a single Monitoring Agent if your hosts exist across multiple data centers and can be discovered by a single agent. Check you have only one Monitoring Agent and remove old agents after upgrading the Monitoring Agent.
A second Monitoring Agent can be set up for redundancy. However, the Ops Manager Monitoring Agent is robust. Ops Manager sends an Agent Down alert only when no Monitoring Agents are available. See Monitoring FAQ and Monitoring Architecture for more information.
Cannot Delete a Host¶
In rare cases, the mongod is brought down and the replica set is reconfigured. The down host cannot be deleted and returns an error message, “This host cannot be deleted because it is enabled for backup.” Contact Ops Manager Support for help in deleting these hosts.
For more information on deleted hosts, see Remove Hosts.
Groups¶
Cannot Delete a Group¶
Please contact your Ops Manager administrator to remove a company or group from your Ops Manager account.
Additional Information on Groups¶
Create a group to monitor additional segregated systems or environments for servers, agents, users, and other resources. For example, your deployment might have two or more environments separated by firewalls. In this case, you would need two or more separate Ops Manager groups.
API and shared secret keys are unique to each group. Each group requires its own agent with the appropriate API and shared secret keys. Within each group, the agent needs to be able to connect to all hosts it monitors in the group.
For information on creating and managing groups, see Manage Groups.
Munin¶
Install and configure the munin-node daemon on the monitored MongoDB server(s) before starting Ops Manager monitoring. The Ops Manager agent README file provides guidelines to install munin-node. However, new versions of Linux, specifically Red Hat Enterprise Linux (RHEL) 6, can generate error messages. See Configure Hardware Monitoring with munin-node for details about monitoring hardware with munin-node.
Restart munin-node after creating links for changes to take effect.
“No package munin-node is available” Error¶
To correct this error, install the most current version of the Linux repos. Type these commands:
Then type this command to install munin-node and all dependencies:
Non-localhost IP Addresses are Blocked¶
By default, munin blocks incoming connections from non-localhost IP addresses.
The /var/log/munin/munin-node.log file will display a “Denying connection” error for your non-localhost IP address.
To fix this error, open the munin-node.conf configuration file and comment out these two lines:
Then add this line to the munin-node.conf configuration file with a pattern that matches your subnet:
Restart munin-node after editing the configuration file for changes to take effect.
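Before editing munin-node.conf, it can help to confirm that a candidate pattern matches the addresses you expect. The Python sketch below builds and tests such a pattern, assuming the allow directive takes a regular expression that is matched against the connecting IP address. The function name and the 10.0.0 prefix are hypothetical, and Python's regular expressions stand in here for the ones munin-node evaluates.

```python
import re


def allow_pattern_for_prefix(prefix: str) -> str:
    """Build a regex matching IPv4 addresses that start with a dotted prefix."""
    # re.escape turns each "." into a literal dot so it cannot match any character.
    return "^" + re.escape(prefix) + r"\.\d+$"


pattern = allow_pattern_for_prefix("10.0.0")  # hypothetical subnet prefix
print(pattern)
print(bool(re.match(pattern, "10.0.0.17")))   # inside the subnet
print(bool(re.match(pattern, "10.0.10.17")))  # outside the subnet
```

Escaping the dots matters: an unescaped "." matches any character, so a loosely written pattern can admit addresses outside the intended subnet.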
Verifying iostat and Other Plugins/Services Returns “# Unknown service” Error¶
The first step is to confirm there is a problem. Open a telnet session and connect to iostat, iostat_ios, and cpu:
The iostat_ios plugin creates the iotime chart, and the cpu plugin creates the cputime chart.
If any of these telnet fetch commands returns an “# Unknown service” error, create a link to the plugin or service in /etc/munin/plugins/ by typing these commands:
Replace <service> with the name of the service that generates the error.
Disk names are not listed by Munin¶
In some cases, Munin will omit disk names with a dash between the name and a numerical suffix, for example, dm-0 or dm-1. There is a documented fix for Munin’s iostat plugin.
Authentication¶
Two-Factor Authentication¶
Missed SMS Authentication Tokens¶
Unfortunately, SMS is not a 100% reliable delivery mechanism for messages, especially across international borders. The Google authentication option is 100% reliable. Unless you must use SMS for authentication, use the Google Authenticator application for two-factor authentication.
If you do not receive the SMS authentication tokens:
- Refer to the Administration page for more details about using two-factor authentication. This page includes any limitations which may affect SMS delivery times.
- Enter the SMS phone number with country code first followed by the area code and the phone number. Also try 011 first followed by the country code, then area code, and then the phone number.
If you do not receive the authentication token in a reasonable amount of time, contact Ops Manager Support to rule out SMS message delivery delays.
How to Delete or Reset Two-Factor Authentication¶
Contact your system administrator to remove or reset two-factor authentication on your account.
For administrative information on two-factor authentication, see Manage Two-Factor Authentication for Ops Manager.
LDAP¶
Cannot Enable LDAP¶
You must enable LDAP before you sign up the first user through the Ops Manager interface, as required in Authentication Requirements. If you cannot enable LDAP because you have created a user through Ops Manager, you must reinstall Ops Manager, being sure to enable LDAP before creating users and hosts. Please see Authentication Requirements.
Forgot to Change MONGODB-CR Error¶
If your MongoDB deployment uses LDAP for authentication, and you find the following error message:
Then make sure that you specified LDAP (PLAIN) as the authentication mechanism for both the Monitoring Agent and the Backup Agent. See Configure Backup Agent for LDAP Authentication and Configure Monitoring Agent for LDAP.
All Deployments¶
Networking¶
All hosts must be able to allow communication between MongoDB ports. The default is 27017, but you can configure alternate port ranges in the Ops Manager interface.
The Automation Agent must be able to connect to mms.mongodb.com on port 443 (i.e. https).
For more information on access to ports and IP addresses, see Security Overview.
Directory and File Permissions¶
The Automation Agent directories and files require the permissions described here. Paths and filenames vary depending on the operating system.
Automation Agent Directory
The agent directory and the agent configuration file require Read and Execute permissions for the user that runs the Automation Agent.
On RHEL, CentOS, Amazon Linux, and Ubuntu, the agent directory is /etc/mongodb-mms. The agent stores its configuration in the automation-agent.config file in that directory.
On other Linux systems and on OS X, you define the agent directory during installation. The agent stores its configuration in the local.config file in that directory.
Supporting Files
Supporting files include the Monitoring and Backup Agents, the MongoDB binaries, and a backup copy of the JSON-based automation configuration file. The directory that stores these requires Read, Write, and Execute permissions for the user that runs the Automation Agent. The agent requires write permission so that it can write to the automation configuration file.
On RHEL, CentOS, Amazon Linux, and Ubuntu, Automation stores these in the same directory as the Automation Agent.
On other Linux operating systems and on OS X, Automation stores these in the /var/lib/mongodb-mms-automation directory.
Log File
The log file requires Write permission for the user that runs the Automation Agent. By default, the agent logs events in the following log file:
The Automation Agent’s configuration file specifies the location of the log file, as well as the log level and log-rotation settings.
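The permission requirements above can be checked with Python's standard os.access call. This is a minimal sketch: the function name and result keys are hypothetical, the paths are whatever applies on your system, and os.access evaluates permissions for the user running the script, so run it as the user that runs the Automation Agent.

```python
import os


def check_agent_permissions(agent_dir: str, support_dir: str, log_file: str) -> dict:
    """Report whether the current user has the permissions listed above."""
    return {
        # Agent directory: Read and Execute.
        "agent_dir_read_execute": os.access(agent_dir, os.R_OK | os.X_OK),
        # Supporting files directory: Read, Write, and Execute.
        "support_dir_read_write_execute": os.access(
            support_dir, os.R_OK | os.W_OK | os.X_OK
        ),
        # Log file: Write.
        "log_file_write": os.access(log_file, os.W_OK),
    }
```

A path that does not exist reports False for every check, which also helps catch a mistyped directory.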
Automation Configuration¶
After completing the automation configuration, always ensure that the deployment plan satisfies the needs of your deployment. Always double check hostnames and ports before confirming the deployment.
Sizing¶
- Ensure that you provision machines with enough space to run MongoDB and support the requirements of your data set.
- Ensure that you provision sufficient machines to run your deployment. Each mongod should run on its own host.
Operating System¶
The Automation Agent only supports Linux and OS X hosts. The Automation Agent does not support Windows.
Backup¶
Logs Display MongodVersionException¶
The MongodVersionException can occur if the Backup Daemon’s host cannot access the internet to download the version or versions of MongoDB required for the backed-up databases.
Each database requires a version of MongoDB that matches the database’s version. Specifically, for each instance you must run the latest stable release of that release series. For versions earlier than 2.4, the database requires the latest stable release of 2.4.
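The version rule above can be expressed as a small Python function that maps a database's version to the release series the Daemon needs. This is a sketch of the rule as stated, with a hypothetical function name. It returns only the series; which patch release is currently the latest stable one in that series still has to be looked up.

```python
def required_release_series(version: str) -> str:
    """Return the release series whose latest stable release is required."""
    major, minor = (int(part) for part in version.split(".")[:2])
    # Versions earlier than 2.4 require the latest stable release of 2.4.
    if (major, minor) < (2, 4):
        return "2.4"
    return f"{major}.{minor}"


print(required_release_series("2.2.7"))
print(required_release_series("2.6.3"))
```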
If the Daemon runs without access to the internet, you must manually download the required MongoDB versions, as described here:
- Go to the MongoDB downloads page and download the appropriate versions for your environment.
- Copy the download to the Daemon’s host.
- Decompress the download into the directory specified in the mongodb.release.directory setting in the Daemon’s conf-daemon.properties file. For the file’s location, see Ops Manager Configuration Files.
Within the directory specified in the mongodb.release.directory setting, the folder structure for MongoDB should look like the following:
System¶
Logs Display OutOfMemoryError¶
If your logs display OutOfMemoryError, ensure you are running with sufficient ulimits and RAM.
Increase Ulimits¶
For the recommended ulimit setting, see the FAQ on Receive “Host has low open file limits” or “Too many open files” error messages.
Ops Manager infers the host’s ulimit setting using the total number of available and current connections. For more information about ulimits in MongoDB, see the UNIX ulimit Settings reference page.
Ensure Sufficient RAM for All Components¶
Ensure that each server has enough RAM for the components it runs. If a server runs multiple components, its RAM must be at least the sum of the required amount of RAM for each component.
For the individual RAM requirements for the Ops Manager Application server, Ops Manager Application database, Backup Daemon server, and Backup Blockstore database, see Ops Manager Hardware and Software Requirements.
Automation Checklist¶
Ops Manager Automation allows you to deploy, configure, and manage MongoDB deployments with the Ops Manager UI. Ops Manager Automation relies on an Automation Agent, which must be installed on every server in the deployment. The Automation Agents periodically poll the Ops Manager service to determine the current goal, and continually report their status to Ops Manager.
To use Automation, you must install the Automation Agent on each server that you want Ops Manager to manage.