Learn how Alert History accelerates the alert triage process, and makes the on-call and post-mortem processes less painful for engineers.
On: Mar 18, 2022
Alerts are on the rise, and engineers are experiencing burnout and alert fatigue. When an alert comes in, engineers need to quickly know two things:
Unfortunately, these two basic questions are not always easy to answer. That’s why we are excited to announce Alert History for Monitors (Chronosphere’s alerting engine), which gives engineers the context they need to answer these questions quickly and efficiently. This post details how Alert History accelerates the triage process and makes the on-call and post-mortem processes less painful for engineers.
Every engineer who is debugging an issue, especially during an on-call incident, wants to know as much as possible about the alert in as little time as possible. If you are already deeply familiar with the metric that triggered the alert, you likely have the context needed to assess and triage it quickly in your head. But what if you are not?
If you are not familiar with the metric, you need to get the necessary information as quickly as possible to decide whether the notification that just woke you up at 3:00 AM needs immediate attention. And even once you’ve successfully resolved the alert, without its historical context in one place, you have to piece together data from old emails, Slack messages, and PagerDuty logs in order to conduct a post-mortem of the incident.
With all of this context, you don’t need to guess at the alert’s importance and urgency. You can decide more confidently whether to get others out of bed for an all-hands-on-deck incident response or to leave it until the morning.
Alert History is a powerful new addition to Monitors (Chronosphere’s alerting engine) that opens a new window into your data by looking back at an alert’s activity over time. For engineers currently relying on open source alerting solutions that lack historical context for alerts, such as Prometheus Alertmanager, Alert History adds value by:
Alert History can be found on the Monitors page, so customers can examine an alert’s context alongside other information related to the alert.
An alert is made up of different types of events. We allow you to view all of them together or to filter the Alert History information down to a specific event type.
To further refine your view, you can select a specific signal grouping, and Alert History narrows the alerting activity to match the selected signal.
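To make the two filters concrete, here is a minimal Python sketch of the idea. It is not Chronosphere’s API: the `AlertEvent` record, the `filter_history` helper, the event type names, and the signal labels are all hypothetical and exist only to illustrate narrowing a history by event type and by signal grouping.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class AlertEvent:
    """Hypothetical record of one entry in an alert's history."""
    timestamp: datetime
    event_type: str  # assumed names, e.g. "triggered", "notified", "resolved"
    signal: str      # assumed signal grouping label, e.g. "service=checkout"


def filter_history(
    events: List[AlertEvent],
    event_type: Optional[str] = None,
    signal: Optional[str] = None,
) -> List[AlertEvent]:
    """Return events matching the given filters, oldest first.

    A filter left as None is not applied, so calling this with no
    filters returns the full history in chronological order.
    """
    matches = [
        e for e in events
        if (event_type is None or e.event_type == event_type)
        and (signal is None or e.signal == signal)
    ]
    return sorted(matches, key=lambda e: e.timestamp)


# Made-up sample history for illustration only.
history = [
    AlertEvent(datetime(2022, 3, 18, 3, 0), "triggered", "service=checkout"),
    AlertEvent(datetime(2022, 3, 18, 3, 1), "notified", "service=checkout"),
    AlertEvent(datetime(2022, 3, 18, 3, 5), "triggered", "service=search"),
    AlertEvent(datetime(2022, 3, 18, 3, 20), "resolved", "service=checkout"),
]

# All activity for one signal grouping.
checkout_events = filter_history(history, signal="service=checkout")

# Only "triggered" events, across every signal.
triggered_events = filter_history(history, event_type="triggered")
```

Combining both arguments narrows the view further, which mirrors selecting an event type and then a signal grouping in the interface.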
Here are three practical examples of when knowing the history of an alert’s activity can help when trying to understand and triage an issue:
Regardless of their experience with an alert, engineers can now assess and triage it quickly and confidently with Alert History. Alert History’s core functionality is now available for all customers. Over the next several months, we’ll be adding enhancements to this functionality, including a graph view of Alert History that lets customers visually identify and understand patterns in an alert’s activity.
If you are interested in learning more about Chronosphere and Alert History, request a demo or get in touch.