Heard about the three phases of observability, but aren’t sure what they are? We break down the three phases into an explanation anyone can understand.
On: Aug 10, 2021
Rachel leads Product & Solution Marketing for Chronosphere. Previously, she built out product, technical, and channel marketing at CloudHealth (acquired by VMware). Prior to that, she led product marketing for AWS and cloud-integrated storage at NetApp and also spent time as an analyst at Forrester Research covering resiliency, backup, and cloud. Outside of work, she tries to keep up with her young son and hyperactive dog, and when she has time, enjoys crafting and eating out at local restaurants in Boston.
You might have heard the discussions lately about the “three phases of observability.” But what do they really mean? Taking a page out of the “explain it like I’m five” book, I’ll break down the three phases of observability into an explanation anyone can understand, including a five-year-old.
Imagine your house is on fire. Yes, ok, this is scary, but stay with me. Obviously, your primary goal is to put the fire out as quickly as possible. The order of events typically goes like this:
By this point you’re likely taking steps to put the fire out, or remediate it — either you’ve called the fire department, you’re spraying the fire extinguisher, or pouring baking soda on the flames.
After the fire is out and everyone is safe, you can take a breath, assess the damage, and work out the root cause of the fire. How did this happen? Was your five-year-old trying to cook something? How can we make sure this doesn’t happen again? These are all critical questions, but they are best answered after you’ve remediated the problem. While there is a fire burning in my kitchen, the last thing I’m going to do is lecture my kid on cooking safety. That will come later.
In short, the goal is to get from the fire starting to the fire being out as quickly as possible, while still running through the steps of know > triage > understand.
Although the personal stakes are much lower, this is the same process an on-call engineer goes through when something goes wrong with their app:
During this entire process, they are looking for a way to remediate the problem as fast as possible (i.e., put out the fire). Ideally, you’re doing the last stage, understand, after the fire is out! If you can go straight from knowing about a problem to remediating it, that’s ideal, because it takes the pressure off the next steps of triage and understanding. A lot of the time, the issue can only be remediated after the triage stage, and in rare cases, remediation happens during or after the understand phase.
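For readers who think in code, here is a minimal sketch of that flow. It is only an illustration of the idea described above, not how any particular tool works: the `Phase` enum and the `handle_incident` function are hypothetical names made up for this example. The phases always run in the same order, and the only thing that changes is how early remediation can slot in.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """The three phases, in the order an on-call engineer moves through them."""
    KNOW = 1
    TRIAGE = 2
    UNDERSTAND = 3


def handle_incident(remediable_after: Phase) -> List[str]:
    """Return the order of events for one incident (hypothetical helper).

    `remediable_after` is the earliest phase after which the problem
    can be fixed. The remaining phases still happen afterwards, just
    without the pressure of an ongoing fire.
    """
    timeline = []
    for phase in Phase:  # Enum members iterate in definition order
        timeline.append(phase.name.lower())
        if phase is remediable_after:
            timeline.append("remediate")
    return timeline


# Best case: you know about the problem and can fix it right away.
print(handle_incident(Phase.KNOW))
# Common case: you need to triage before you can fix anything.
print(handle_incident(Phase.TRIAGE))
```

In the best case, "remediate" lands right after "know", and triage and understanding happen once the fire is out. In the rare worst case, passing `Phase.UNDERSTAND` puts remediation at the very end.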
When discussing observability, many people immediately jump to metrics, logs, and traces. These are still incredibly important: they are the data inputs used throughout the phases. The three phases focus on outcomes and processes (vs. tools and data sets).
Think of metrics, logs, and traces as powering your smoke alarm, your fire extinguisher, or the emergency phone call. They are a means to an end, but not the end in and of themselves.
Chronosphere is a SaaS cloud monitoring tool that helps teams rapidly navigate the three phases of observability. We’re focused on giving DevOps teams, SREs, and central observability teams the tools they need to know about problems sooner, the context to triage them faster, and the insights to get to the root cause more efficiently. One of the major things that makes us different is that we were built from the ground up to support the scale and speed of cloud-native systems. We also embrace open source standards, so developers can use the language and formats they’re already familiar with and avoid lock-in.
Learn more about Chronosphere and the three phases of observability here.
Request a demo for an in-depth walkthrough of the platform!