Unintended Consequences of Adopting Cloud Native
There are a lot of questions surrounding observability. Three that immediately come to mind: What should I look for in an observability solution? What benefits will my organization get from observability? Heck, is observability even a noun or a verb?
I recently sat down with guest Carlos Casanova, Principal Analyst at Forrester, to answer these questions. In the webinar “Observability Today, Tomorrow, and the Future: Cutting Through the Noise,” we break down “What is cloud native observability?” and share reasons why observability can lead to better business outcomes.
As a quick preview of our conversation, I’ve summarized a bit of our discussion below:
We began by setting up the observability landscape and describing some of the unintended consequences of moving to cloud native. There’s a reason that so many companies are adopting cloud native – it’s become a requirement to keep pace in today’s high-speed and high-stakes competitive landscape.
While the advantages of cloud native, like agility and speed, are very real, they come with some unexpected challenges. For example, observability data (the telemetry you need to monitor and observe your system, such as logs, metrics, and distributed traces) is growing faster than business data. As we move to cloud and then to cloud native, with its microservices and containers, the volume of data starts to increase exponentially.
Need an example of how moving to a cloud native observability solution can help? Abnormal Security, a fast-growing SaaS email security startup, was able to improve the reliability and stability of its metrics system with Chronosphere observability. Abnormal has flattened large spikes in metric volume and improved overall stability to greater than 99.9% uptime.
This trend might be easier to rationalize if more telemetry led to better outcomes — most of us would be willing to endure the pain of having additional data if it meant faster mean time to resolution (MTTR) or fewer critical incidents.
Instead, companies are getting more observability data than they can effectively use. Their observability solutions are getting more expensive, but companies are not getting increased value. In many cases value is actively declining. A cloud native observability solution can help solve this data-growth challenge.
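For readers less familiar with the MTTR metric mentioned above, here is a minimal sketch of how it is commonly computed: the average time between detecting an incident and resolving it. The incident timestamps and the `mean_time_to_resolution` helper are hypothetical, made up purely for illustration; they are not drawn from the webinar or from any Chronosphere tooling.

```python
from datetime import datetime, timedelta

# Hypothetical incident records as (detected_at, resolved_at) pairs.
# These values are illustrative only, not real data.
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 45)),    # 45 minutes
    (datetime(2022, 3, 8, 14, 30), datetime(2022, 3, 8, 16, 0)),  # 90 minutes
    (datetime(2022, 3, 20, 2, 15), datetime(2022, 3, 20, 2, 30)), # 15 minutes
]


def mean_time_to_resolution(records):
    """Return the average of (resolved - detected) across all incidents."""
    if not records:
        raise ValueError("need at least one incident to compute MTTR")
    total = sum(
        (resolved - detected for detected, resolved in records),
        timedelta(),  # start value so the durations add up as timedeltas
    )
    return total / len(records)


mttr = mean_time_to_resolution(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")  # MTTR: 50 minutes
```

If more telemetry were paying off, a number like this would trend downward as data volume grows. The point of the discussion is that for many companies it does not.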
But What is Observability?
Sometimes the first step to fixing a problem is to step back and re-evaluate what we’re trying to achieve. What do we mean when we say “observability”? When I asked Casanova this elusive question, he had a long-ish response, which isn’t surprising given that the definition of observability has evolved with the complexity of today’s cloud native architectures. In particular, some legacy monitoring vendors are inserting themselves into the observability conversation, which creates confusion.
Following are snippets from Casanova’s answer, which he admits is tough to reduce to two sentences. You can watch the video for a more complete recap starting at around 14:30.
Casanova: During my inquiries and briefings with vendors and organizations that are trying to implement observability, it was obvious that there was some confusion around what the term was … But if we look at how the DevOps community has defined observability, it is based on exploring properties and patterns not defined in advance.
Casanova adds that “observability is defined by the inherent ability of an entity to share information, to allow it to be explored, to allow it to be analyzed.”
My answer: Plucking from my recent blog, “Are the three pillars of observability still relevant”: Observability is both a practice (or process) and describes the property (or state) of a service. Like DevOps, observability is a core competency of distributed systems engineering. It is the practice that cloud native developers do on a daily basis in increasingly complex systems. Observability is also a property of a system — whether or not it produces data that can be used to answer any question that a developer asks of it.
Rather than focus on observability in terms of the three pillars — logs, metrics, and traces — engineering and SRE leaders in cloud native environments should think about the three phases:
- Know: Recognize there’s a problem
- Triage: Stop the problem from creating additional negative outcomes
- Understand: Dive into the root cause of the problem
Why? Because the three phases of observability answer the critical questions engineers have about operating the code and systems they’ve built.
What Should You Look for in an Observability Solution?
After establishing the definition of observability, Casanova ticked off five questions to ask before adopting an observability solution. Below is his top-five list, but you can listen in around 34:00 to hear him delve into each point. An observability platform should offer:
- Ability to control data growth: Is the solution able to control observability data growth?
- Reliability: Has the solution proven it is reliable and delivered the performance for modern-day operations?
- No vendor lock-in: What risk of vendor lock-in does the solution carry?
- Ease of use: How easy is it for developers to use and program?
- Scalability: Can the solution scale with your organization’s growth plans?
Can Observability Drive Competitive Advantage?
How does cloud native observability tie directly to positive outcomes for modern businesses? Casanova handed this question back over to me since this is an area I’m so passionate about, and our company has deep roots in the observability space.
As a company, Chronosphere’s mission is to guide modern businesses to leverage observability as an essential competitive advantage. Tune in at around 39:00 to catch my quick recap of Chronosphere’s origin story and our co-founders’ history running observability at Uber.
Casanova agreed and added that cloud native initiatives require observability to be successful because of the many pitfalls that come with the increased complexity and speed of cloud native environments. He also noted that a recent Forrester survey found that businesses that instrument their infrastructure, applications, and business services to measure key performance indicators (KPIs) were more likely to have had positive revenue growth in 2021.
That’s my recap! There’s a lot more to hear, so tune in to the entire webinar here.