InfoQ podcast: Why cloud-native companies need observability

on July 20th 2021

Our co-founder and CEO, Martin Mao, recently sat down with the InfoQ podcast’s Wes Reisz to share his thoughts on observability, a hot topic in the modern cloud-native landscape, where microservices running in containers have replaced monolithic applications running on VMs. Martin has a unique perspective on monitoring: he has spent much of his career running monitoring and observability teams at large born-in-the-cloud companies, including several years solving observability problems at Uber before co-founding his own observability company.

One of the first things Martin likes to point out when discussing the history of observability is that, while the terminology is new, the concept of monitoring (observing your systems) has been around for decades. “It’s been termed monitoring, or perhaps application performance monitoring [APM], or infrastructure monitoring in the past, and that’s largely the same now that the term is observability. Perhaps some of the data types have changed, but it’s largely the same.”

Still, he points to two changes that shape how we think about observability today:

  • How we ship software: “When you look at modern businesses and how we do development, that’s changed fairly fundamentally. We are in a mode now where we are shipping updates to our customers and to other internal teams a lot quicker than we were before. And a lot of design and infrastructure architecture has changed because of that so that we can respond much quicker to the business need.”
  • Ownership of the monitoring solution: “Historically it’s been an SRE team or an infrastructure team that’s responsible for monitoring, and you really depend on the tools like your APM tools to go in and collect and display all this data for you. But really what we’ve been seeing more recently with the DevOps movement is that developers themselves own this end-to-end.”

Three phases of observability 

Martin and Wes also discussed the “three phases of observability” and the outcome engineers are trying to achieve.

Wes: When you talk to customers about observability, how do you discuss getting their minds around all this different data that’s coming in and structuring it?

Martin: Our framework for thinking about this looks at it from the end user’s perspective, and the end users here are the developers themselves. The ultimate goal for developers is to be notified of an issue in their applications and remediate it as quickly as possible, ideally before a customer finds out. For us, that’s the ultimate goal, and we’re optimizing for it. It comes down to answering three questions:

  1. How quickly do I get notified when something is wrong? Is it BEFORE a user/customer has a bad experience?
  2. How easily and quickly can I triage the problem and understand its impact?
  3. How do I find the underlying cause so I can fix the problem?

If you think about why we want to observe our systems, it’s because we want to reduce the negative impact on the business and on end users. Reducing the time to remediation is central to this framework.

Give it a listen

Wes and Martin cover much more, from how Chronosphere solves the observability challenge to approaches to SLAs. The full conversation runs less than thirty minutes.
