The goal of observability is rapid remediation

  • How quickly are you notified when something is wrong or beginning to fail?
  • Can you rapidly triage the problem, understand its impact and effectively monitor any mitigation?
  • How do you find the underlying cause so you can fix the problem?
Ebook: The Three Phases of Observability

The goal: Rapid remediation

Understanding and embracing the three phases of observability is the best way to respond to these questions. During each phase, the focus is on alleviating the customer impact — or remediating the problem — as fast as possible.

Remediation is the act of alleviating customer pain and restoring the service to acceptable levels of availability and performance. At each phase, the engineer is looking for enough information to remediate the issue, even if they don’t yet understand the root cause.

Phase 1: Know about the problem

The first step to resolving an issue is knowing the issue exists — ideally before it impacts any customers. Sometimes, just knowing an issue is occurring is enough to trigger a remediation. For example, if you deploy a new version of a service and an alert triggers for that service, rolling back the deployment is the quickest path to remediating the issue without needing to understand the full impact or diagnose the root cause during the incident. Those can be examined after the issue is remediated, when there isn’t active customer impact.

Introducing changes to a system is the largest source of production issues, so knowing about problems and the scope of the impact as these changes are introduced is key.
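
The rollback rule described in this phase can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the Deployment and Alert types, the rollback_candidate function, and the 30-minute window are all assumptions made for the example, and a real system would read deployments and alerts from its own deploy tooling and alerting pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record of a single deploy event.
    service: str
    version: str
    deployed_at: datetime


@dataclass
class Alert:
    # Hypothetical record of a firing alert.
    service: str
    fired_at: datetime


def rollback_candidate(
    alert: Alert,
    deployments: List[Deployment],
    window: timedelta = timedelta(minutes=30),  # assumed window, tune per service
) -> Optional[Deployment]:
    """Return the latest deployment of the alerting service that landed
    within `window` before the alert fired, or None if there is none."""
    recent = [
        d
        for d in deployments
        if d.service == alert.service
        and timedelta(0) <= alert.fired_at - d.deployed_at <= window
    ]
    return max(recent, key=lambda d: d.deployed_at, default=None)
```

If the function returns a deployment, rolling it back is a reasonable first remediation to try. If it returns None, the alert moves on to triage.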

Phase 2: Triage the problem

The goal of this phase is to quickly understand the context and impact of an issue. Once an alert goes off, if it is not immediately obvious that a recent change to the system needs to be rolled back, the next step is to understand the business impact and the severity. Often, understanding the scope of the issue can lead to remediation.

To triage issues, you need to be able to quickly put an alert into context: how many customers or systems are impacted, and to what degree. Great observability lets you slice and pivot highly granular data so you can zero in on the telemetry that matters for the issue at hand.
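
As a rough sketch of what putting an alert into context can look like, the Python below groups request events by a chosen dimension and reports the error rate for each value, plus the set of customers who saw at least one error. The event shape (dictionaries with "customer", "region", and "error" keys) and the function names are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def error_rate_by(events: Iterable[dict], dimension: str) -> Dict[str, float]:
    """Return the fraction of failed events for each value of `dimension`."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for event in events:
        key = event[dimension]
        totals[key] += 1
        if event["error"]:
            errors[key] += 1
    # A Counter returns 0 for keys it has never seen, so healthy values get 0.0.
    return {key: errors[key] / totals[key] for key in totals}


def affected_customers(events: Iterable[dict]) -> Set[str]:
    """Return the customers that saw at least one failed event."""
    return {event["customer"] for event in events if event["error"]}
```

Pivoting the same events by region, customer, or service version shows whether the issue is global or confined to one slice, which is often enough to pick a mitigation.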

Phase 3: Understand the problem

This phase ideally occurs after remediation, when you can take the time to locate and understand the root cause of an issue without the pressure of a ticking clock and waiting customers. With an ever-increasing number of microservices, a postmortem on an incident is often an exercise in navigating a tangled web of dependencies and working out which service owner you need to talk to.

Great observability gives you a direct line of sight from your metrics and alerts to the potential culprits. It also provides insights that help you fix the underlying problems and prevent incidents from recurring.
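
To make the dependency problem concrete, here is a small Python sketch that walks a service dependency graph breadth-first from the alerting service and lists the owner of every service it depends on. The graph and owner mappings, and both function names, are hypothetical. In practice this information would come from tracing data or a service catalog.

```python
from collections import deque
from typing import Dict, List


def transitive_dependencies(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk: every service `start` depends on, directly or
    transitively, in the order discovered (closest dependencies first)."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        service = queue.popleft()
        for dep in graph.get(service, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def owners_to_contact(
    graph: Dict[str, List[str]], owners: Dict[str, str], start: str
) -> Dict[str, str]:
    """Map each dependency of `start` to its owning team ("unknown" if unlisted)."""
    return {
        dep: owners.get(dep, "unknown")
        for dep in transitive_dependencies(graph, start)
    }
```

Breadth-first order is a deliberate choice here: it surfaces the nearest dependencies first, which are usually the first places to look.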

Take back control.

Great observability can lead to competitive advantage, world-class customer experiences, faster innovation, and happier developers. But organizations can’t achieve great observability by focusing only on the input data (metrics, logs, and traces). In reality, having more observability data doesn’t necessarily help you navigate the three phases faster. Instead, it can slow you down and drive up costs unnecessarily. That’s why, in addition to focusing on the observability outcomes outlined in the three phases, you must also focus on taking back control of your observability.

Taking back control means taming rampant data growth and its associated costs, and also maintaining organizational control by assigning guardrails to teams based on business priority. Organizations that take control of their costs, data growth, and organizational complexity can achieve faster remediation, lower MTTR, and better customer experiences.
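
One way to picture guardrails based on business priority is a simple quota split. The Python sketch below divides a total data budget across teams in proportion to a priority weight and flags any team that exceeds its share. The budget unit, the weights, and the function names are all assumptions for illustration. They do not describe how any particular platform implements quotas.

```python
from typing import Dict, List


def allocate_quotas(total_budget: int, priority_weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total_budget` across teams in proportion to their priority weight.
    Integer division rounds each share down, so a few units may go unallocated."""
    total_weight = sum(priority_weights.values())
    return {
        team: total_budget * weight // total_weight
        for team, weight in priority_weights.items()
    }


def over_quota(usage: Dict[str, int], quotas: Dict[str, int]) -> List[str]:
    """Return, in alphabetical order, the teams whose usage exceeds their quota.
    Teams without a quota are treated as having a quota of zero."""
    return sorted(team for team, used in usage.items() if used > quotas.get(team, 0))
```

Higher-priority teams receive a larger share of the budget, and the over-quota list gives a starting point for deciding where to trim or aggregate data.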

Ready to see it in action?

Now that you’re caught up on the phases of observability, are you ready to get started? Discover an innovative observability platform that helps you incorporate observability practices and procedures.