Corey Quinn Podcast: Martin Mao weighs in on observability definition

Featuring

Martin Mao

Co-founder and CEO
Chronosphere

Martin is a technologist with a history of solving problems at the largest scale in the world and is passionate about helping enterprises use cloud native observability and open source technologies to succeed on their cloud native journey. He’s now the Co-Founder & CEO of Chronosphere, a Series C startup with $255M in funding, backed by Greylock, Lux Capital, General Atlantic, Addition, and Founders Fund. He was previously at Uber, where he led the development and SRE teams that created and operated M3. Previously, he worked at AWS, Microsoft, and Google. He and his family are based in the Seattle area, and he enjoys playing soccer and eating meat pies in his spare time.

Corey Quinn

Chief Cloud Economist
The Duckbill Group

Corey is the Chief Cloud Economist at The Duckbill Group. Corey’s unique brand of snark combines with a deep understanding of AWS’s offerings, unlocking a level of insight that’s both penetrating and hilarious. He lives in San Francisco with his spouse and daughters.

Overview

Life is never dull when you sit down for a chat with Corey Quinn, publisher of the Last Week in AWS newsletter, host of the Screaming in the Cloud podcast, and Chief Cloud Economist at The Duckbill Group. Our co-founder and CEO, Martin Mao, would know: he recently spent a few minutes on Corey’s podcast convincing him that observability is more than just “hipster monitoring.”

Listen to the full podcast to hear their lively discussion on the difference between observability and monitoring, Martin’s and Chronosphere’s origins, and the power of open source technology. In the meantime, here are some highlights:

On Martin’s most recent gig prior to being Chronosphere’s CEO

CQ: I’m always a big sucker for origin stories. Tell me a little bit about that. You’ve hit the big three cloud providers at this point. What was that like?

MM: I joined Uber in 2015 to lead a core part of their monitoring team [with Chronosphere co-founder and CTO Rob Skillington] and eventually a larger observability team. That team went on to build open-source projects such as M3 and other projects, such as Jaeger for distributed tracing and a logging backend system. I spent many years there building out their observability stack.

On launching Chronosphere

CQ: What made you decide that, all right, launching Chronosphere is something I’m going to pursue?

MM: I always got a lot of joy building large distributed systems, handling lots of load, and solving problems at a really grand scale. The reason for doing a startup was the situation that we were in [at Uber]. The trend right now is going from cloud to cloud-native, going from VMs to containers on the infrastructure tier, and going from monoliths to microservices. It’s not the growth of the company, necessarily, or the growth of the load that the system has to handle, but this shift to containers and microservices that heavily accelerates the growth of the amount of metrics data that gets produced, and that is causing a lot of these problems.

In leading the core part of the observability team at Uber with Rob, we were lucky to solve the observability problem—not just for Uber, but for the broader community, especially the community adopting cloud-native architecture. We were solving the problem for Uber in 2015, but the rest of the industry has similar problems today. It was the perfect opportunity to solve observability for a broader range of companies out there. And we already had a lot of the core technology built in open-source.

Martin goes deeper into Chronosphere’s origin story in his two-year anniversary blog, Happy second birthday Chronosphere! Also, our head of people writes about how Chronosphere became a “remote-first” company in her blog, What it takes to grow a remote-first startup during a pandemic.

On explaining the definition of observability to a cynical Corey Quinn

CQ: Talk to me a little bit more about what observability is. I hear people talking about it in a bunch of ways … What is it?

MM: Originally we thought that observability is a combination of metrics, logs, and traces, and that’s a very common view: the three pillars. It’s almost like three checkboxes: you tick them off, and you have “observability.” That’s actually how we looked at the problem at Uber, and we built solutions for each one of those and we checked all three boxes.

Since then we’ve realized just having all three boxes checked doesn’t help you with the ultimate goal of what you want from an observability platform. Our view on observability is from an end-user perspective, rather than a data-type or data-input perspective: You want to be notified of issues and remediate them as quickly as possible. That comes down to answering three questions:

  1. How quickly do I get notified when something is wrong? Is it BEFORE a user/customer has a bad experience?
  2. How easily and quickly can I triage the problem and understand its impact?
  3. How do I find the underlying cause so I can fix the problem?

Martin talks more about the definition of observability in his New Stack article, Beyond the 3 Pillars of Observability.

On how cloud-native has changed the monitoring and observability market

CQ: What was your perspective that made you look around the fairly crowded landscape of observability companies’ tools and say, “You know, no one’s quite gotten this right yet. I have a better idea.”

MM: In the previous environments that companies were operating in, there were a lot of different tools for different purposes. A company would purchase an infrastructure monitoring tool, or perhaps a network monitoring tool, and then they would have, perhaps, an APM (application performance monitoring) solution for applications, and then perhaps BI (business intelligence) tools for the business. There was always, historically, a collection of different tools to go and solve this problem.

With the shift to cloud-native, there is a need to have all metrics data and visibility in a single tool. Also, none of the existing monitoring tools today were built for a cloud-native environment. You can think about the time when these companies were created – back in the early 2010s, Kubernetes and containers weren’t really a thing. So, a lot of these tools weren’t built for the modern architecture that we see most companies shifting towards.

The opportunity was to build something for where we think the industry and everyone’s technology stack are going, as opposed to where the technology stack has been.

Tune in

In their half-hour sit-down, Corey and Martin cover many more topics, such as:

  • How Chronosphere helps companies before they get to the hyperscale stage
  • What sets Chronosphere apart from other observability solutions
  • What to do if you’re outgrowing Prometheus