With on-call alerts, sometimes it can wait
For most of us in ops, it is vital to be notified as soon as possible about problems that affect the services we provide. It’s often a race against time to restore a service or prevent an outage. But not all alerts require an immediate response; some can wait. Enabling users to deal efficiently with alerts that don’t require an immediate response is just as important: it helps prevent alert fatigue and keeps us fresh.
At OpsGenie, our mission is to empower our users to handle both critical incidents and non-critical, non-urgent ones efficiently.
Snooze that alert
Sometimes an alert does not require an immediate response, but it still requires action. Sure, you can acknowledge the alert and come back to it later, but acknowledging an alert stops the notifications and the escalation process. What if you forget? An alert that was not urgent at the time can become a real problem. Snoozing the alert is a great solution for this situation. Snooze the alert for some time, and if you don’t get back to it and resolve the problem, alerting begins again after the specified time frame.
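To make the difference between acknowledging and snoozing concrete, here is a minimal sketch in Python. The `Alert` class, its fields, and its method names are hypothetical and only model the behavior described above; they are not OpsGenie's actual API or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert model, for illustration only."""
    message: str
    status: str = "open"  # "open", "acknowledged", or "resolved"
    snoozed_until: Optional[datetime] = None

    def acknowledge(self) -> None:
        # Acknowledging stops notifications and escalation with no deadline.
        self.status = "acknowledged"

    def snooze(self, now: datetime, duration: timedelta) -> None:
        # Snoozing only pauses alerting until the deadline passes.
        self.snoozed_until = now + duration

    def resolve(self) -> None:
        self.status = "resolved"

    def should_notify(self, now: datetime) -> bool:
        # Only open alerts notify, and only outside the snooze window.
        if self.status != "open":
            return False
        if self.snoozed_until is not None and now < self.snoozed_until:
            return False
        return True


t0 = datetime(2024, 1, 1, 2, 0)
alert = Alert("disk usage above 80%")
alert.snooze(t0, timedelta(hours=6))
print(alert.should_notify(t0 + timedelta(hours=1)))  # False: still snoozed
print(alert.should_notify(t0 + timedelta(hours=6)))  # True: alerting resumes
```

In this sketch, an acknowledged alert stays quiet no matter how much time passes, while a snoozed alert starts notifying again on its own unless someone resolves it first.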
Delay those notifications
For non-urgent alerts, you can wait for some time before notifying users. With a notification policy, alert notifications can be delayed for a few minutes, for hours, or until a certain time. Delaying non-urgent alert notifications can significantly improve the lives of ops people, which is our primary mission.
- When alert notifications are delayed, the alerts are still visible to users, so a team member who may be better suited to respond can still look into the alert and take care of it.
- Some problems are transient: a service may be restarted and recover, closing the alert automatically.
- Some problems can wait until business hours and do not require waking up an on-call engineer. For these types of non-urgent problems, alert notifications can be delayed until the morning. If the alerts are still open then, the appropriate people can be notified to ensure they don’t fall through the cracks.
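The two kinds of delay mentioned above, a fixed duration and "until a certain time", can be sketched as follows. The function names and parameters are hypothetical, the example uses naive datetimes with no time zone handling, and it is not how OpsGenie implements notification policies.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def notification_time(
    created: datetime,
    delay: Optional[timedelta] = None,
    until: Optional[time] = None,
) -> datetime:
    """Return the earliest moment a notification may be sent (hypothetical)."""
    if delay is not None:
        # Fixed delay: wait a set duration after the alert is created.
        return created + delay
    if until is not None:
        # Delay until the next occurrence of a wall-clock time, e.g. 09:00.
        candidate = datetime.combine(created.date(), until)
        if candidate <= created:
            candidate += timedelta(days=1)
        return candidate
    # No delay configured: notify immediately.
    return created


def should_notify(now: datetime, created: datetime, is_open: bool, **policy) -> bool:
    # Alerts closed during the delay never trigger a notification.
    return is_open and now >= notification_time(created, **policy)


created = datetime(2024, 1, 1, 2, 30)
print(notification_time(created, until=time(9, 0)))            # 2024-01-01 09:00:00
print(notification_time(created, delay=timedelta(minutes=10)))  # 2024-01-01 02:40:00
```

The `is_open` check is what captures the transient case: if the alert closes itself at 03:00, nobody is notified at 09:00.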
Set your own rules!
With OpsGenie, notification rules not only let users control how they are notified for different alerts; they can also specify a time delay for each notification method. For example, for non-critical, non-urgent alerts, a user can configure OpsGenie to notify them via email immediately and via push/SMS after 10 minutes, while for critical alerts the same user can configure push/SMS/phone notifications to be much more aggressive.
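As a rough illustration of per-method delays, the example above could be modeled like this. The `RULES` table, the priority names, and `due_methods` are hypothetical; the delays for non-urgent alerts follow the example in the text, while the values for critical alerts are assumptions chosen only to look "more aggressive".

```python
from datetime import timedelta

# Hypothetical per-user rules: (notification method, delay after alert creation).
RULES = {
    "non_urgent": [
        ("email", timedelta(0)),
        ("push", timedelta(minutes=10)),
        ("sms", timedelta(minutes=10)),
    ],
    # Assumed values; the text only says these should be more aggressive.
    "critical": [
        ("push", timedelta(0)),
        ("sms", timedelta(0)),
        ("phone", timedelta(minutes=1)),
    ],
}


def due_methods(priority, elapsed, already_sent=frozenset()):
    """Return the methods whose delay has passed and that have not fired yet."""
    return [
        method
        for method, delay in RULES[priority]
        if elapsed >= delay and method not in already_sent
    ]


print(due_methods("non_urgent", timedelta(0)))                     # ['email']
print(due_methods("non_urgent", timedelta(minutes=10), {"email"}))  # ['push', 'sms']
print(due_methods("critical", timedelta(0)))                       # ['push', 'sms']
```

Keeping the delay next to each method means the same lookup serves both gentle and aggressive rules; only the table changes.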
The simple fact is that not all alerts are created equal. It makes no sense to scream that the sky is falling when it’s not. If an alerting system cannot empower you to handle your non-urgent, non-critical alerts with low overhead, it cannot handle critical ones all that well either.