Alert Conditions¶
Overview¶
Ops Manager provides configurable alert conditions that you can apply to Ops Manager components, such as hosts, clusters, or agents. This document groups the conditions according to the target components to which they apply.
Select alert conditions when configuring alerts. For more information on configuring alerts, see the Create an Alert Configuration and Manage Alerts documents.
Host Alerts¶
Host alerts apply to MongoDB hosts (i.e. mongos and mongod instances) and are grouped here according to the category monitored.
Host Status¶
-
is down
¶ Sends an alert when Ops Manager does not receive a ping from a host for more than 9 minutes. Under normal operation, the Monitoring Agent connects to each monitored host about once per minute. Rather than alerting immediately, Ops Manager waits 9 minutes to minimize false positives, such as those that would occur during a host restart.
-
is recovering
¶ Sends an alert when a secondary member of a replica set enters the
RECOVERING
state. For information on the RECOVERING
state, see Replica Set Member States.
-
does not have latest version
¶ This does not apply to Ops Manager.
Sends an alert when the version of MongoDB running on a host is two or more releases behind the current production version. For example, if the current production version of MongoDB is 2.6.0 and the previous release is 2.4.9, then a host running version 2.4.8 will trigger this alert, but a host running 2.4.9 (previous), 2.6.0 (current), or 2.6.1-rc2 (nightly) will not.
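As a rough illustration of the "is down" condition in this section, the check can be sketched as a comparison between the time of the last ping and a 9-minute grace period. The function name `host_is_down` and the constant `HOST_DOWN_AFTER` are hypothetical and are not part of Ops Manager; this is only a sketch of the rule as described, not the actual implementation.

```python
from datetime import datetime, timedelta

# Grace period taken from the description above: more than 9 minutes
# without a ping. The constant name is an assumption for illustration.
HOST_DOWN_AFTER = timedelta(minutes=9)


def host_is_down(last_ping: datetime, now: datetime) -> bool:
    """Return True when no ping has been received for more than 9 minutes."""
    return now - last_ping > HOST_DOWN_AFTER


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    # A host that restarted and missed a few pings does not alert.
    print(host_is_down(now - timedelta(minutes=3), now))
    # A host silent for 10 minutes does.
    print(host_is_down(now - timedelta(minutes=10), now))
```

With pings normally arriving about once per minute, the grace period tolerates several consecutive missed pings before an alert would fire.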
Asserts¶
These alert conditions refer to the metrics found on the host’s
asserts
chart. To view the chart, see
Accessing a Host’s Statistics.
-
Asserts: Regular is
¶ Sends an alert if the rate of regular asserts meets the specified threshold.
-
Asserts: Warning is
¶ Sends an alert if the rate of warnings meets the specified threshold.
-
Asserts: Msg is
¶ Sends an alert if the rate of message asserts meets the specified threshold. Message asserts are internal server errors. Stack traces are logged for these.
-
Asserts: User is
¶ Sends an alert if the rate of errors generated by users meets the specified threshold.
Opcounter¶
These alert conditions refer to the metrics found on the host’s opcounters
chart. To view the chart, see Accessing a Host’s Statistics.
-
Opcounter: Cmd is
¶ Sends an alert if the rate of commands performed meets the specified threshold.
-
Opcounter: Query is
¶ Sends an alert if the rate of queries meets the specified threshold.
-
Opcounter: Update is
¶ Sends an alert if the rate of updates meets the specified threshold.
-
Opcounter: Delete is
¶ Sends an alert if the rate of deletes meets the specified threshold.
-
Opcounter: Insert is
¶ Sends an alert if the rate of inserts meets the specified threshold.
-
Opcounter: Getmores is
¶ Sends an alert if the rate of getmore (i.e. cursor batch) operations meets the specified threshold. For more information on getmore operations, see the Cursors page in the MongoDB manual.
Opcounter - Repl¶
These alert conditions apply to hosts that are secondary members of
replica sets. The alerts use the metrics found on the host’s
opcounters - repl
chart. To view the chart, see Accessing a Host’s Statistics.
-
Opcounter: Repl Cmd is
¶ Sends an alert if the rate of replicated commands meets the specified threshold.
-
Opcounter: Repl Update is
¶ Sends an alert if the rate of replicated updates meets the specified threshold.
-
Opcounter: Repl Delete is
¶ Sends an alert if the rate of replicated deletes meets the specified threshold.
-
Opcounter: Repl Insert is
¶ Sends an alert if the rate of replicated inserts meets the specified threshold.
Memory¶
These alert conditions refer to the metrics found on the host’s
memory
and non-mapped virtual memory
charts. To view the
charts, see Accessing a Host’s Statistics. For additional information
about these metrics, click the i icon for each chart.
-
Memory: Resident is
¶ Sends an alert if the size of the resident memory meets the specified threshold. It is typical over time, on a dedicated database server, for the size of the resident memory to approach the amount of physical RAM on the box.
-
Memory: Virtual is
¶ Sends an alert if the size of virtual memory for the mongod process meets the specified threshold. You can use this alert to flag excessive memory outside of memory mapping. For more information, click the
memory
chart’s i icon.
-
Memory: Mapped is
¶ Sends an alert if the size of mapped memory, which maps the data files, meets the specified threshold. As MongoDB memory-maps all the data files, the size of mapped memory is likely to approach total database size.
-
Memory: Computed is
¶ Sends an alert if the size of virtual memory that is not accounted for by memory-mapping meets the specified threshold. If this number is very high (multiple gigabytes), it indicates that excessive memory is being used outside of memory mapping. For more information on how to use this metric, view the
non-mapped virtual memory
chart and click the chart’s i icon.
B-tree¶
These alert conditions refer to the metrics found on the host’s
btree
chart. To view the chart, see
Accessing a Host’s Statistics.
-
B-tree: accesses is
¶ Sends an alert if the number of accesses to B-tree indexes meets the specified average.
-
B-tree: hits is
¶ Sends an alert if the number of times a B-tree page was in memory meets the specified average.
-
B-tree: misses is
¶ Sends an alert if the number of times a B-tree page was not in memory meets the specified average.
-
B-tree: miss ratio is
¶ Sends an alert if the ratio of misses to hits meets the specified threshold.
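The miss ratio above is described as the ratio of misses to hits. A minimal sketch of that arithmetic follows; the function name `btree_miss_ratio` is hypothetical, and the handling of a zero hit count is an assumption, since the source does not say how Ops Manager treats that case.

```python
def btree_miss_ratio(misses: float, hits: float) -> float:
    """Ratio of B-tree misses to hits, as described for the miss ratio alert.

    Assumption: with zero hits, the ratio is reported as infinity when there
    were misses and as 0.0 when there was no activity at all.
    """
    if hits == 0:
        return float("inf") if misses > 0 else 0.0
    return misses / hits


if __name__ == "__main__":
    # 50 misses against 1000 hits gives a ratio of 0.05.
    print(btree_miss_ratio(50, 1000))
```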
Lock %¶
This alert condition refers to the metric found on the host’s lock %
chart.
To view the chart, see Accessing a Host’s Statistics.
-
Effective Lock % is
¶ Sends an alert if the amount of time the host is write locked meets the specified threshold. For details on this metric, view the
lock %
chart and click the chart’s i icon.
Background¶
This alert condition refers to the metric found on the host’s background flush
avg
chart. To view the chart, see Accessing a Host’s Statistics.
-
Background Flush Average is
¶ Sends an alert if the average time for background flushes meets the specified threshold. For details on this metric, view the
background flush avg
chart and click the chart’s i icon.
Connections¶
The following alert condition refers to a metric found on the host’s connections
chart. To view the chart, see Accessing a Host’s Statistics.
-
Connections is
¶ Sends an alert if the number of active connections to the host meets the specified average.
Queues¶
These alert conditions refer to the metrics found on the host’s queues
chart.
To view the chart, see Accessing a Host’s Statistics.
-
Queues: Total is
¶ Sends an alert if the number of operations waiting on a lock of any type meets the specified average.
-
Queues: Readers is
¶ Sends an alert if the number of operations waiting on a read lock meets the specified average.
-
Queues: Writers is
¶ Sends an alert if the number of operations waiting on a write lock meets the specified average.
Page Faults¶
These alert conditions refer to metrics found on the host’s Record Stats
and Page Faults
charts. To view the charts, see Accessing a Host’s Statistics.
-
Accesses Not In Memory: Total is
¶ Sends an alert if the rate of disk accesses meets the specified threshold. MongoDB must access data on disk if your working set does not fit in memory. This metric is found on the host’s
Record Stats
chart.
-
Page Fault Exceptions Thrown: Total is
¶ Sends an alert if the rate of page fault exceptions thrown meets the specified threshold. This metric is found on the host’s
Record Stats
chart.
-
Page Faults is
¶ Sends an alert if the rate of page faults (whether or not an exception is thrown) meets the specified threshold. This metric is found on the host’s
Page Faults
chart.
Cursors¶
These alert conditions refer to the metrics found on the host’s cursors
chart. To view the chart, see Accessing a Host’s Statistics.
-
Cursors: Open is
¶ Sends an alert if the number of cursors the server is maintaining for clients meets the specified average.
-
Cursors: Timed Out is
¶ Sends an alert if the number of timed-out cursors the server is maintaining for clients meets the specified average.
-
Cursors: Client Cursors Size is
¶ Sends an alert if the cumulative size of the cursors the server is maintaining for clients meets the specified average.
Network¶
These alert conditions refer to the metrics found on the host’s network
chart. To view the chart, see Accessing a Host’s Statistics.
-
Network: Bytes In is
¶ Sends an alert if the number of bytes sent to the database server meets the specified threshold.
-
Network: Bytes Out is
¶ Sends an alert if the number of bytes sent from the database server meets the specified threshold.
-
Network: Num Requests is
¶ Sends an alert if the number of requests sent to the database server meets the specified average.
Replication¶
These alert conditions refer to the metrics found on a primary’s replication oplog window
chart or a secondary’s replication lag
chart. To view the charts, see
Accessing a Host’s Statistics.
-
Replication Oplog Window is
¶ Sends an alert if the approximate amount of time available in the primary’s replication oplog meets the specified threshold.
-
Replication Lag is
¶ Sends an alert if the approximate amount of time that the secondary is behind the primary meets the specified threshold.
-
Replication Headroom is
¶ Sends an alert when the difference between the primary oplog window and the replication lag time on a secondary meets the specified threshold.
-
Oplog Data per Hour is
¶ Sends an alert when the amount of data per hour being written to a primary’s oplog meets the specified threshold.
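Replication headroom is described above as the difference between the primary's oplog window and the replication lag on a secondary. The sketch below shows that subtraction with both values in seconds. The function name and the choice of seconds as the unit are assumptions made for illustration only.

```python
def replication_headroom(oplog_window_secs: float,
                         replication_lag_secs: float) -> float:
    """Oplog window minus replication lag, both in seconds."""
    return oplog_window_secs - replication_lag_secs


if __name__ == "__main__":
    # A 24-hour oplog window and a secondary lagging by 30 minutes.
    headroom = replication_headroom(24 * 3600, 30 * 60)
    print(headroom / 3600)  # 23.5 (hours)
```

A headroom that shrinks toward zero means the secondary is close to falling off the end of the primary's oplog.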
DB Storage¶
These alert conditions refer to the metrics displayed on the host’s db storage
chart. To view the chart, see Accessing a Host’s Statistics.
-
DB Storage is
¶ Sends an alert if the amount of on-disk storage space used by extents meets the specified threshold. Extents are contiguously allocated chunks of datafile space.
DB storage size is larger than DB data size because storage size measures the entirety of each extent, including space not used by documents. For more information on extents, see the collStats command.
-
DB Data Size is
¶ Sends an alert if the approximate size of all documents (and their paddings) meets the specified threshold.
Journaling¶
These alert conditions refer to the metrics found on the host’s journal -
commits in write lock
chart and journal stats
chart. To view the
charts, see Accessing a Host’s Statistics.
-
Journaling Commits in Write Lock is
¶ Sends an alert if the rate of commits that occurred while the database was in write lock meets the specified average.
-
Journaling MB is
¶ Sends an alert if the average amount of data written to the recovery log meets the specified threshold.
-
Journaling Write Data Files MB is
¶ Sends an alert if the average amount of data written to the data files meets the specified threshold.
Replica Set Alerts¶
These alert conditions are applicable to replica sets.
-
Primary Elected
¶ Sends an alert when a set elects a new primary. Each time Ops Manager receives a ping, it inspects the output of the replica set’s rs.status() method for the status of each replica set member. From this output, Ops Manager determines which replica set member is the primary. If the primary found in the ping data is different than the current primary known to Ops Manager, this alert triggers.
Primary Elected
does not always mean that the set elected a new primary. Primary Elected
may also trigger when the same primary is re-elected. This can happen when Ops Manager processes a ping in the midst of an election.
-
No Primary
¶ Sends an alert when a replica set does not have a primary. Specifically, when none of the members of a replica set have a status of
PRIMARY
, the alert triggers. For example, this condition may arise when a set has an even number of voting members, resulting in a tie. If the Monitoring Agent collects data during an election for primary, this alert might send a false positive. To prevent such false positives, set the alert configuration’s after waiting interval (in the configuration’s Send to section).
-
Number of Healthy Members is below
¶ Sends an alert when a replica set has fewer than the specified number of healthy members. If the replica set has the specified number of healthy members or more, Ops Manager triggers no alert.
A replica set member is healthy if its state, as reported in the rs.status() output, is either
PRIMARY
or SECONDARY
. Hidden secondaries and arbiters are not counted. As an example, if you have a replica set with one member in the
PRIMARY
state, two members in the SECONDARY
state, one hidden member in the SECONDARY
state, one ARBITER
, and one member in the RECOVERING
state, then the healthy count is 3.
-
Number of Unhealthy Members is above
¶ Sends an alert when a replica set has more than the specified number of unhealthy members. If the replica set has the specified number or fewer, Ops Manager sends no alert.
Replica set members are unhealthy when the agent cannot connect to them, or the member is in a rollback or recovering state.
Hidden secondaries are not counted.
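The replica set rules in this section (healthy member count, No Primary, and a change of primary) can be sketched over a list of member records shaped like the `members` array in rs.status() output, using its `name` and `stateStr` fields. The `hidden` key is an assumption added for illustration, and all function names are hypothetical. This is a sketch of the rules as described, not Ops Manager's implementation.

```python
from typing import Optional

# States that count as healthy, per the description above.
HEALTHY_STATES = {"PRIMARY", "SECONDARY"}


def healthy_member_count(members: list) -> int:
    """Count PRIMARY and SECONDARY members, skipping hidden ones.

    Arbiters are excluded automatically because their state is ARBITER.
    """
    return sum(
        1
        for m in members
        if m["stateStr"] in HEALTHY_STATES and not m.get("hidden", False)
    )


def current_primary(members: list) -> Optional[str]:
    """Return the name of the member in the PRIMARY state, or None."""
    for m in members:
        if m["stateStr"] == "PRIMARY":
            return m["name"]
    return None


def primary_changed(known_primary: Optional[str], members: list) -> bool:
    """True when the ping data shows a primary different from the known one."""
    found = current_primary(members)
    return found is not None and found != known_primary


if __name__ == "__main__":
    # The worked example from the text; host names are placeholders.
    members = [
        {"name": "a:27017", "stateStr": "PRIMARY"},
        {"name": "b:27017", "stateStr": "SECONDARY"},
        {"name": "c:27017", "stateStr": "SECONDARY"},
        {"name": "d:27017", "stateStr": "SECONDARY", "hidden": True},
        {"name": "e:27017", "stateStr": "ARBITER"},
        {"name": "f:27017", "stateStr": "RECOVERING"},
    ]
    print(healthy_member_count(members))           # 3
    print(current_primary(members) is None)        # False: a primary exists
    print(primary_changed("b:27017", members))     # True
```

The No Primary condition corresponds to `current_primary` returning `None`.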
Agent Alerts¶
These alert conditions are applicable to Monitoring Agents and Backup Agents.
-
Monitoring Agent is down
¶ Sends an alert if the Monitoring Agent has been down for at least 7 minutes. Under normal operation, the Monitoring Agent sends a ping to Ops Manager roughly once per minute. If Ops Manager does not receive a ping for at least 7 minutes, this alert triggers. However, this alert will never trigger for a group that has no hosts configured.
Important
When the Monitoring Agent is down, Ops Manager will trigger no other alerts. For example, if a host is down, there is no Monitoring Agent to send data to Ops Manager that could trigger new alerts.
-
Backup Agent is down
¶ Sends an alert if the Backup Agent has been down for at least 15 minutes. Under normal operation, the Backup Agent periodically sends data to Ops Manager. This alert is never triggered for a group that has no running backups.
-
Monitoring Agent is out of date
¶ Sends an alert when the Monitoring Agent is not running the latest version of the software.
-
Backup Agent is out of date
¶ Sends an alert when the Backup Agent is not running the latest version of the software.
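The two "is down" conditions in this section follow the same pattern: a silence threshold plus an exemption for groups with nothing to monitor or back up. The sketch below encodes the 7-minute and 15-minute thresholds from the descriptions above. The function and parameter names are hypothetical, and the representation of a group as simple counts is an assumption made for illustration.

```python
from datetime import datetime, timedelta

# Thresholds taken from the descriptions above.
MONITORING_AGENT_DOWN_AFTER = timedelta(minutes=7)
BACKUP_AGENT_DOWN_AFTER = timedelta(minutes=15)


def monitoring_agent_down(last_ping: datetime, now: datetime,
                          host_count: int) -> bool:
    """True after at least 7 minutes of silence, unless the group has no hosts."""
    if host_count == 0:
        return False
    return now - last_ping >= MONITORING_AGENT_DOWN_AFTER


def backup_agent_down(last_contact: datetime, now: datetime,
                      running_backups: int) -> bool:
    """True after at least 15 minutes of silence, unless no backups are running."""
    if running_backups == 0:
        return False
    return now - last_contact >= BACKUP_AGENT_DOWN_AFTER


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    silent_since = now - timedelta(minutes=8)
    print(monitoring_agent_down(silent_since, now, host_count=3))   # True
    print(monitoring_agent_down(silent_since, now, host_count=0))   # False
    print(backup_agent_down(silent_since, now, running_backups=1))  # False
```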
Backup Alerts¶
These alert conditions are applicable to the Ops Manager Backup service.
-
Oplog Behind
¶ Sends an alert if the most recent oplog data received by Ops Manager is more than 75 minutes old.
-
Resync Required
¶ Sends an alert if the replication process for a backup falls too far behind the oplog to catch up. This occurs when the host overwrites oplog entries that backup has not yet replicated. When this happens, backup must be fully resynced.
-
Cluster Mongos Is Missing
¶ Sends an alert if Ops Manager cannot reach a mongos for the cluster.