Sharing ideas and experiences about cloud-native, open-source monitoring with the developer community is a favorite pastime of our CEO, Martin Mao. Before he co-founded Chronosphere with our CTO, Rob Skillington, Martin spent much of his career running monitoring and observability teams for large born-in-the-cloud companies. He spent several years solving observability problems for Uber while also working with the open source community.
As the opening keynote speaker at last week’s FluentCon at KubeCon + CloudNativeCon EU 2021, Martin was in his element when he sat down for a virtual fireside chat with Fluent maintainer, Anurag Gupta. This was the inaugural event for FluentCon – which was co-located with KubeCon EU and is described as a cloud-native logging day with Fluent Bit and Fluentd. (Chronosphere’s Chris Ward will round up highlights from KubeCon EU later this week, so there’s more to come on the latest DevOps innovations rolled out at that event.)
Chronosphere’s view on observability
During the fireside chat, Anurag and Martin talked about all things open source and logging, and they delved into the broader Fluent ecosystem (happy 10th birthday, Fluentd!). But the best part of the morning session, of course, was their discussion about how users can achieve great observability.
Martin started by sharing Chronosphere’s mission to provide a hosted monitoring solution to companies adopting cloud-native. We are passionate about helping these companies monitor their applications, which are generally microservices-oriented, and their infrastructure, which is primarily Kubernetes these days.
Optimizing for outcome: The three phases of observability
Once intros concluded, Anurag went to his first question about a topic that is near and dear to Chronosphere: “What should users be thinking about when solving for observability?”
Martin’s answer: He started out by discussing how Chronosphere sees observability as more than a practice. It’s a culture or mindset with roots in DevOps. The end users of observability are developers, and today they are trying to optimize for one outcome: to know when something is wrong and to remediate that issue as quickly as possible – ideally before end customers find out.
Martin noted that optimizing for an outcome means answering the following questions as you work through the three phases of observability:
- Phase 1: Know about the problem. Can I get notified when something is wrong?
  - If customers find out before you, that’s not a great place to be. You need to be the first to know when there’s an issue.
- Phase 2: Triage the problem. How can I easily find out the impact and scope of the issue?
  - Once you get notified something is wrong, it’s critical to know immediately what the impact is. Are all customers impacted or just a subset? Is it one cluster or multiple? Answering these questions is critical for remediation.
- Phase 3: Understand the problem. Can I figure out the underlying cause to fix the problem?
  - Knowing the root cause of a problem is key to fixing it and then preventing it from happening again. Ideally this occurs after you restore services, when engineers can take the time to locate and understand underlying issues without the pressure of a ticking clock of customer expectations.
Where do metrics, traces, and logs fit into observability?
They also talked about the difference between the three phases of observability that Chronosphere has identified vs. the legacy “three pillars” concept. Martin then answered Anurag’s next question, “Do the data types or three pillars not matter as much?”
Martin’s answer: At Chronosphere, we don’t think about the data types – logs, metrics, and traces – when we define observability and what end users should focus on. Yes, there are older observability definitions – such as the so-called three pillars – that are concentrated around those data types. And yes, those data types are important, but we use them to arrive at an outcome. The data types by themselves don’t give you observability.
Breaking down data’s role in observability, Martin explained:
- Data is a means to an end.
- More data isn’t better data.
- You don’t need to instrument all three data types just for the sake of checking all three boxes. Focus on the value that you will derive from each data set and determine if it makes sense to put in the effort to instrument it or not.
The future of observability
Martin clearly enjoyed Anurag’s final, future-looking question. Pointing out that observability has changed significantly in the past three years with new projects, new protocols, and new standards, Anurag asked, “As someone who is at the forefront of this, what do you think the next three years look like? What is the future of observability in Martin Mao’s mind?”
Martin’s answer: The future is always hard to predict, but let’s have a crack at it. We see four key trends emerging:
- Every developer will adopt an observability mindset and there will be a huge transfer of both knowledge and skill set from that core SRE team, from the experts in these practices today, to all developers everywhere.
- Increased focus on observability outcomes, which means remediation as quickly as possible. There will also be an easier flow between the phases, and of the underlying data across them.
- Better integration between the infrastructure and application tier. The end of separate and siloed tooling for application performance monitoring and infrastructure monitoring is fast approaching.
- The reliability of observability will become more important as the amount of observability data increases. Customers will demand that their SaaS vendors improve SLAs as observability becomes a mission-critical application in and of itself.
Martin’s fireside chat with Anurag was wide-ranging, and they covered a lot in less than thirty minutes. I encourage you to grab a cup of coffee and sit down to listen in on the entire fireside conversation by watching the video below.