Email Integration: Alerting and Incident Management Solutions

Apr 20, 2016 by Berkay Mollamustafaoglu

In the coming weeks, OpsGenie will help buyers looking for a reliable, scalable, and customizable alerting and incident management solution by comparing the features, toolsets, and functionality of OpsGenie, PagerDuty, and VictorOps in a series of detailed blog posts. Our goal is to shed light on who does what and on the stark differences between the three popular technologies. We will concentrate on the areas of our platform that we believe matter most when choosing an alerting and incident management solution for the Dev & Ops and IT community in general. This week we focus on Email Integration.

PART 1: Email Integration -- A comparative assessment of the direct contrasts between OpsGenie, PagerDuty, and VictorOps. OpsGenie’s email integration enables customers to integrate OpsGenie with any system that can send alerts via email. It is the integration method our customers use most, since it is easy to set up and almost any system can send email.
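To make the idea concrete, here is a minimal sketch of how an incoming alert email could be turned into an alert record. It uses only Python's standard `email` package. The function name `email_to_alert` and the field names `message`, `source`, and `description` are assumptions made for illustration, not OpsGenie's actual parsing rules or data model.

```python
from email import message_from_string, policy


def email_to_alert(raw_email: str) -> dict:
    """Map a raw RFC 5322 email onto a simple alert dictionary.

    The field names below are hypothetical; a real integration would
    use whatever fields the alerting service defines.
    """
    msg = message_from_string(raw_email, policy=policy.default)

    # Prefer the plain-text body; fall back to an empty description.
    body_part = msg.get_body(preferencelist=("plain",))
    description = body_part.get_content().strip() if body_part else ""

    return {
        "message": str(msg["Subject"] or ""),  # subject becomes the alert title
        "source": str(msg["From"] or ""),      # sender identifies the monitoring tool
        "description": description,
    }


# Example email, as a monitoring tool might send it (made-up addresses).
RAW = (
    "From: monitor@example.com\n"
    "To: alerts@example.com\n"
    "Subject: Disk usage above 90% on db-01\n"
    "\n"
    "/var is 93% full.\n"
)

alert = email_to_alert(RAW)
```

Because the only requirement on the sending side is the ability to send an email, this kind of mapping works for almost any monitoring tool without a dedicated plugin.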

With on-call alerts, sometimes it can wait...

Apr 5, 2016 by Berkay Mollamustafaoglu

For most of us in ops, it is vital to be notified as soon as possible about problems that impact the services we provide. It’s often a race against time to restore the service or to prevent an outage. But not all alerts require an immediate response; some can wait. Enabling users to deal efficiently with alerts that don’t require an immediate response is just as important for preventing alert fatigue and keeping us fresh.

At OpsGenie, our mission is to empower our users to handle both critical and non-critical, non-urgent incidents efficiently.

Which superhero would be best at answering on-call alerts and real-time incidents?

Mar 25, 2016 by the OpsGenie Team


The OpsGenie team recently had a thorough and heated discussion (KAPOW!!) about who would be better with on-call alerts and incident management: Superman or Batman? Who would come out the winner when the two are pitted against each other in a war of on-call alerts and response time? So we thought we would hash it out here on our blog in a completely fictional format. We’ll examine each area of alerting and incident management to see who we would want on our on-call team.

Fighting Alert Fatigue - Alert Deduplication -- Part 1

Feb 25, 2016 by Berkay Mollamustafaoglu

The concept of “Alert Fatigue” is well known in industries such as healthcare, and awareness is increasing in IT operations as well. Fighting alert fatigue has been a key design objective for OpsGenie since our inception. An earlier post summarized some of the key OpsGenie capabilities that can be used to alleviate alert fatigue. In this two-part series, I go into more detail on how these features can improve the alert signal-to-noise ratio.
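As a rough illustration of what alert deduplication means in general, the sketch below folds repeated alerts that share a key into one open alert and counts the repeats instead of notifying again. The names `Alert`, `AlertStore`, and `dedup_key` are hypothetical and chosen for this example; this is a generic pattern, not a description of how OpsGenie implements it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Alert:
    dedup_key: str   # hypothetical key identifying "the same problem"
    message: str
    count: int = 1   # how many times this problem has been reported


class AlertStore:
    """Keeps one open alert per dedup key (illustrative only)."""

    def __init__(self) -> None:
        self._open: Dict[str, Alert] = {}

    def ingest(self, dedup_key: str, message: str) -> Tuple[Alert, bool]:
        """Return the alert and whether it is new (and so worth notifying)."""
        existing = self._open.get(dedup_key)
        if existing is not None:
            # Same problem reported again: bump the counter, do not notify.
            existing.count += 1
            return existing, False
        alert = Alert(dedup_key, message)
        self._open[dedup_key] = alert
        return alert, True


store = AlertStore()
first, first_is_new = store.ingest("db-01/disk", "Disk usage above 90%")
second, second_is_new = store.ingest("db-01/disk", "Disk usage above 90%")
other, other_is_new = store.ingest("web-02/cpu", "CPU above 95%")
```

In this sketch only two notifications would go out for three incoming events, while the counter still records that the disk problem was reported twice.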

Routing phone calls using on-call schedules - OpsGenie

Feb 8, 2016 by Berkay Mollamustafaoglu

Since we launched the OpsGenie phone call routing feature last year, we’ve had an enormously positive response from customers. So much so, in fact, that we’re dusting off last year’s blog post and updating it for everyone who is not yet familiar with the feature. Is it easy to use? Yes, it is! OpsGenie routes alerts to the appropriate on-call individual using policies, on-call schedules, and so on. Before the launch last year, we heard similar questions from a number of our customers, such as “Can we route phone calls to the right person like we route the alerts?” This turned out to be a great question, one that resonated with many of our customers. For a product team, customer feedback like this is priceless!
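The core of schedule-based routing, whether for alerts or phone calls, is answering "who is on call right now?". The sketch below does this for a simple fixed-length rotation. The function `on_call_at`, the participant names, and the one-day shift length are assumptions for illustration; real schedules also involve time zones, overrides, and restrictions that are not modeled here.

```python
from datetime import datetime, timedelta
from typing import List


def on_call_at(rotation: List[str], start: datetime,
               shift_length: timedelta, when: datetime) -> str:
    """Return who is on call at `when` in a simple round-robin rotation.

    `rotation` lists participants in order; the first shift begins at `start`.
    """
    if not rotation:
        raise ValueError("rotation must not be empty")
    if when < start:
        raise ValueError("`when` is before the schedule starts")
    # Integer number of whole shifts completed since the schedule started.
    shifts_elapsed = (when - start) // shift_length
    return rotation[shifts_elapsed % len(rotation)]


rotation = ["alice", "bob", "carol"]       # hypothetical team
start = datetime(2016, 2, 1, 0, 0)         # first shift starts here
shift = timedelta(days=1)                  # daily rotation

person = on_call_at(rotation, start, shift, datetime(2016, 2, 8, 10, 0))
```

An incoming call or alert would then be forwarded to whichever contact method is on file for the returned person.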

You woke me up. Now what?

Jan 29, 2016 by Berkay Mollamustafaoglu

As an alert notification solution, our first priority is to ensure that the right person is notified when there is a problem. OpsGenie sends multiple notifications through different channels, escalates when needed, and so on, to ensure that critical alerts don’t get missed. As crucial as that is, if an alert notification system stops at “waking you up”, it becomes part of the problem rather than a solution.
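The "multiple channels plus escalation" behavior described above can be sketched as a list of timed steps that keep firing until someone acknowledges the alert. The `Step` type, the channel names, and the delays are all hypothetical values for this example, not OpsGenie defaults.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    delay_minutes: int  # minutes after alert creation when this step fires
    channel: str        # e.g. "push", "sms", "voice" (illustrative names)
    target: str         # who gets notified at this step


def due_notifications(steps: List[Step], minutes_open: int,
                      acknowledged: bool) -> List[Step]:
    """Return every step whose delay has elapsed for an unacknowledged alert."""
    if acknowledged:
        # Once someone has acknowledged, stop notifying and escalating.
        return []
    return [s for s in steps if s.delay_minutes <= minutes_open]


policy_steps = [
    Step(0, "push", "primary"),
    Step(5, "sms", "primary"),
    Step(15, "voice", "secondary"),  # escalation to a second person
]

due_after_six = due_notifications(policy_steps, 6, acknowledged=False)
due_after_ack = due_notifications(policy_steps, 20, acknowledged=True)
```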

How to create a free status page using OpsGenie

Jan 4, 2016 by Tuba Öztürk

Every service provider wants their services to be available 24x7x365. But outages and planned maintenance are inevitable for online software services. Dealing with outages and communicating with users during them is as important as the availability of the services themselves. To keep users informed, many service providers use web-based “status pages” that contain up-to-date information about the health of the services, current incidents, and what the provider is doing to resolve the issues.

OpsGenie is an incident management system for Dev & Ops teams. Customers use OpsGenie to consolidate the alerts generated by their monitoring systems and route them to the right people using on-call schedules and escalations. Because OpsGenie is an essential tool during outages and holds vital information about the incidents, our customers have been asking whether we can create “status pages” programmatically based on the alerts generated in OpsGenie.

In response to this request, we’ve taken up the challenge of providing a solution for managing status pages for OpsGenie customers.

Being in the Driving Seat for Web Applications

Dec 18, 2015 by Kadir Türker Gülsoy

As long as our applications are in production, boosting uptime and avoiding outages are the highest priorities for us as developers and operations teams. Despite great care, achieving 100% uptime is a challenge for even the most stringent DevOps teams. Imagine that one of your data centers stops responding and, in turn, your email service is completely out, or that your payment service goes offline during Black Friday. Remember the AWS outage in April 2011 that lasted four days and affected countless cloud services? It is a good example that outages happen even in the most secure environments. Now what? Are you going to examine huge log files to find out what went wrong? Are you going to notify all of your operations teams and developers at the same time to investigate the cause? Unless you allocate large resources to chaos engineering as Netflix does, you will most likely have very limited time to overcome the issue. So those aren’t realistic options for most organizations.
