
Replication Lag

Description

At time T, the last write operation applied on the specified secondary of replica set ABC lagged behind the most recent operation applied on the primary by more than the alert's configured threshold.

Common Triggers

  • An idle replica set. The reported replication lag may be nothing more than the time since the last write. Replication lag is calculated as the difference between the time of the last operation on the primary and the time of the last operation received by the secondary. If a replica set is written to only once every 10 minutes, the reported lag jumps to 10 minutes as soon as a write is made on the primary, and stays there until that write has been replicated to the secondary.
  • The secondary is under-provisioned and cannot keep up with the primary (common when secondaries are used for read scaling).
  • There is insufficient bandwidth, or some other networking problem, between the primary and secondary.
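The lag calculation described for an idle replica set can be sketched as follows. This is a minimal illustration, not the monitoring tool's actual implementation: it assumes member data shaped like the output of MongoDB's `replSetGetStatus` command (a `members` list whose entries carry `name`, `stateStr` and `optimeDate`), and the helper name `replication_lag` is hypothetical. With PyMongo, such a document could be obtained with `client.admin.command("replSetGetStatus")`.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def replication_lag(status: dict) -> Dict[str, timedelta]:
    """Hypothetical helper: per-secondary lag, computed as the primary's last
    operation time minus the secondary's last operation time."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Idle replica set written to once every 10 minutes (sample data, not real output).
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status = {
    "members": [
        # The primary has just applied a new write, 10 minutes after the previous one...
        {"name": "db0:27017", "stateStr": "PRIMARY", "optimeDate": t0 + timedelta(minutes=10)},
        # ...but the secondary has not yet received it, so it still reports the old write.
        {"name": "db1:27017", "stateStr": "SECONDARY", "optimeDate": t0},
    ]
}

lags = replication_lag(status)
print(lags["db1:27017"].total_seconds())  # 600.0
```

Although the secondary is healthy, the reported lag equals the full 10-minute gap between writes until the new write replicates.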

Possible Solutions

  • Adjust the settings for this alert so that it triggers only if the replication lag persists for longer than 2 minutes. This reduces the chance of a false positive.
  • Move the secondary to a machine provisioned identically to (or better than) the current primary, or upgrade its current machine in place to that level.
  • Increase bandwidth and/or resolve networking issues between the primary and secondary.
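To show why a persistence window filters out the short spikes of an idle replica set, here is a hypothetical model of "trigger only if the lag persists for longer than 2 minutes". It is not how the alerting system evaluates the setting internally; the function `should_alert`, the 30-second sampling interval and the 1-minute threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (sample time, measured replication lag); samples must be sorted by time.
Sample = Tuple[datetime, timedelta]


def should_alert(samples: List[Sample], threshold: timedelta, persist_for: timedelta) -> bool:
    """Hypothetical rule: fire only if lag has stayed above `threshold`
    continuously for longer than `persist_for`, ending at the latest sample."""
    breach_start = None
    for ts, lag in samples:
        if lag > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach
        else:
            breach_start = None  # lag recovered, so the breach streak resets
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start > persist_for


t0 = datetime(2024, 1, 1, 12, 0)
# Idle replica set: one 10-minute spike that clears by the next 30-second sample.
spike = [(t0 + timedelta(seconds=30 * i),
          timedelta(minutes=10) if i == 3 else timedelta(0)) for i in range(8)]
# Lagging secondary: 5 minutes of lag on every sample for 3.5 minutes.
sustained = [(t0 + timedelta(seconds=30 * i), timedelta(minutes=5)) for i in range(8)]

print(should_alert(spike, timedelta(minutes=1), timedelta(minutes=2)))      # False
print(should_alert(sustained, timedelta(minutes=1), timedelta(minutes=2)))  # True
```

A momentary spike never satisfies the window, while a secondary that genuinely falls behind still triggers the alert.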

Documentation
