Life is never dull when you sit down for a chat with Corey Quinn, publisher of the Last Week in AWS newsletter, host of the Screaming in the Cloud podcast, and Chief Cloud Economist at The Duckbill Group. Our co-founder and CEO, Martin Mao, would know, since he recently spent a few minutes on Corey’s podcast convincing him that observability is more than just “hipster monitoring.”
Listen to the full podcast to hear their lively discussion on the difference between observability and monitoring, Martin’s and Chronosphere’s origins, and the power of open source technology. In the meantime, here are some highlights:
On Martin’s most recent gig prior to being Chronosphere’s CEO
CQ: I’m always a big sucker for origin stories. Tell me a little bit about that. You’ve hit the big three cloud providers at this point. What was that like?
MM: I joined Uber in 2015 to lead a core part of their monitoring team [with Chronosphere co-founder and CTO Rob Skillington] and eventually a larger observability team. That team went on to build open-source projects such as M3, as well as other projects such as Jaeger for distributed tracing and a logging backend system. I spent many years there building out their observability stack.
On launching Chronosphere
CQ: What made you decide that, all right, launching Chronosphere is something I’m going to pursue?
MM: I always got a lot of joy building large distributed systems, handling lots of load, and solving problems at a really grand scale. The reason for doing a startup was the situation that we were in [at Uber]. The trend right now is going from cloud to cloud-native, going from VMs to containers on the infrastructure tier, and going from monoliths to microservices. It’s not the growth of the company, necessarily, or the growth of the load that the system has to handle, but this shift to containers and microservices that heavily accelerates the growth of the amount of metrics data that gets produced, and that is causing a lot of these problems.
In leading the core part of the observability team at Uber with Rob, we were lucky to solve the observability problem—not just for Uber, but for the broader community, especially the community adopting cloud-native architecture. We were solving the problem for Uber in 2015, but the rest of the industry has similar problems today. It was the perfect opportunity to solve observability for a broader range of companies out there. And we already had a lot of the core technology built in open-source.
Martin goes deeper into Chronosphere’s origin story in his two-year anniversary blog, Happy second birthday Chronosphere! Also, our head of people writes about how Chronosphere became a “remote-first” company in her blog, What it takes to grow a remote-first startup during a pandemic.
On explaining the definition of observability to a cynical Corey Quinn
CQ: Talk to me a little bit more about what observability is. I hear people talking about it in a bunch of ways … What is it?
MM: Originally we thought that observability is a combination of metrics, logs, and traces, and that’s a very common view: the three pillars. It’s almost like three checkboxes – you tick them off, and you have “observability.” That’s actually how we looked at the problem at Uber, and we built solutions for each one of those and we checked all three boxes.
Since then we’ve realized just having all three boxes checked doesn’t help you with the ultimate goal of what you want from an observability platform. Our view on observability is from an end-user perspective, rather than a data-type or data-input perspective: You want to be notified of issues and remediate them as quickly as possible. That comes down to answering three questions:
- How quickly do I get notified when something is wrong? Is it BEFORE a user/customer has a bad experience?
- How easily and quickly can I triage the problem and understand its impact?
- How do I find the underlying cause so I can fix the problem?
Martin talks more about the definition of observability in his New Stack article, Beyond the 3 Pillars of Observability.
On how cloud-native has changed the monitoring and observability market
CQ: What was your perspective that made you look around the fairly crowded landscape of observability companies’ tools and say, “You know, no one’s quite gotten this right yet. I have a better idea.”
MM: In the previous environments that companies were operating in, there were a lot of different tools for different purposes. A company would purchase an infrastructure monitoring tool, or perhaps a network monitoring tool, and then they would have, perhaps, an APM (application performance monitoring) solution for applications, and then perhaps BI (business intelligence) tools for the business. There was always, historically, a collection of different tools to go and solve this problem.
With the shift to cloud-native, there is a need to have all metrics data and visibility in a single tool. Also, none of the existing monitoring tools today were built for a cloud-native environment. You can think about the time when these companies were created – back in the early 2010s, Kubernetes and containers weren’t really a thing. So, a lot of these tools weren’t built for the modern architecture that we see most companies shifting towards.
The opportunity was to build something for where we think the industry and everyone’s technology stack are going, as opposed to where the technology stack has been.
In their half-hour sit-down, Corey and Martin cover many more topics, such as:
- How Chronosphere helps companies before they get to the hyperscale stage
- What sets Chronosphere apart from other observability solutions
- What to do if you’re outgrowing Prometheus
Tune in to hear their entire conversation.