The Future of Observability: the rise of cloud native

on September 6th 2022

In this episode of the Future of Observability video series, Chronosphere co-founder and CEO Martin Mao shares his insights with Technical Writer Chris Ward about the impacts of cloud native adoption throughout an organization.

Chris: Cloud native as an architecture has a lot of advantages, but it also means a lot more infrastructure, service data, and telemetry data to manage. How do you see this new influx of information impacting those companies who are holding onto legacy solutions or are in the process of trying to move into a fully cloud native approach?

Martin: To answer, you really have to look at why companies are doing this at all.

At a technical level, cloud native enables enterprises to go multi-, public-, or hybrid-cloud. Taking a step back from that, it’s not just the technology stack. It’s like the whole DevOps mentality: why did companies adopt DevOps? There clearly are advantages to it. If you look at the companies that adopted DevOps early on, the tech giants of the world, they used the DevOps mentality to react at a much faster pace for their customers in the services and products they provide. Clearly there are big advantages behind DevOps being so popular; it’s not just the latest fad.

Parallels with DevOps 

From that perspective, before we even get to the telemetry side of things, you see companies making this shift and realizing that cloud native is actually quite a big movement, because it’s not just a change of the technology stack. It’s the adoption of a new approach to software development, which is the DevOps mentality. Often this means re-skilling your workforce, especially if you’re a traditional large enterprise: your workforce didn’t grow up in this DevOps/SRE model, but they have to adapt to it quickly.

The enterprises themselves need to organize in a slightly different way to take advantage of a cloud native architecture. And then of course the technology stack itself is fairly different. If you look at all of that, these are big changes required across the technology stack, the organization, the skill sets…yet companies are still willing to adopt cloud native because they see the significant advantage. 

On observability vs APM and monitoring tools

If you look at telemetry, in particular, and observability, this is the part of the system that tells you whether you are doing this migration so that your products and services can be more effective for your customers. Telemetry data is critical—not just for migration, but as part of the larger solution moving forward.

That changes the way companies think about observability compared with how they thought about APM and IT monitoring tools. You begin to look at observability as a critical piece of the business, and I think that importance is better understood by companies as they shift. Now I think what isn’t as clear when you do this shift is what new challenges observability brings compared with your legacy APM or IT monitoring tools.

As you look at these new technical architectures, a lot of the requirements have changed. A lot more data is produced because infrastructures are ephemeral. On the telemetry and the observability side, there are challenges that are hard to solve by simply repurposing an existing APM tool.
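To make the point about ephemeral infrastructure concrete, here is a minimal sketch of how instance churn alone can multiply the number of distinct time series. It assumes every instance emits the same number of metrics and every replacement instance gets a new identity label (for example, a new pod name). The function name and all the numbers are hypothetical and chosen only for illustration.

```python
def distinct_series(metrics_per_instance, instances_running, replacements_per_day, days):
    """Count distinct time series over a period.

    Simplifying assumption: each replacement instance carries a new
    identity label, so all of its metrics become brand-new series.
    """
    identities = instances_running + replacements_per_day * days
    return metrics_per_instance * identities


if __name__ == "__main__":
    # Hypothetical fleet: 50 instances, 200 metrics each, observed for 30 days.
    static_hosts = distinct_series(200, 50, replacements_per_day=0, days=30)
    # Same fleet, but each instance is replaced 4 times a day (200 replacements/day).
    ephemeral = distinct_series(200, 50, replacements_per_day=200, days=30)
    print(f"long-lived hosts: {static_hosts:,} series")
    print(f"ephemeral instances: {ephemeral:,} series")
```

The running fleet is the same size in both cases; only the churn differs, which is why a tool sized for long-lived hosts can be overwhelmed without any growth in traffic.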

We need to look at the problem holistically from a bottom-up perspective, and build out the ultimate solution for this new type of environment. This isn’t just true for telemetry data and observability. You can imagine how this new architecture forces enterprises to rethink their security spaces, not to mention CI/CD and deployment.

On the need for reliability

There are particular pain points you really have to keep an eye on and handle, such as the growing volume of data that gets produced. Another is how reliable you need these systems to be, because the requirements have changed. You used to use observability primarily to just look at your infrastructure.

Now, as you’re trying to improve the level of service you provide for your customers, you can imagine increasing your SLAs (service level agreements) for customers, which means you have to increase SLAs for your own internal infrastructure. The reliability of these systems has to be a lot higher than before. It’s a brand new world—a brand new tool at the very least.
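The link between customer-facing SLAs and internal ones can be shown with a little arithmetic. The sketch below assumes a request needs every internal component to be up and that components fail independently, which is a simplification. The function names and the figures are hypothetical.

```python
import math


def composite_availability(component_availabilities):
    """Availability of a path that needs every component up.

    Assumes independent failures, so the availabilities multiply.
    """
    return math.prod(component_availabilities)


def required_component_availability(target, n_components):
    """Per-component availability needed for n identical serial
    components to meet the end-to-end target."""
    return target ** (1.0 / n_components)


if __name__ == "__main__":
    # Hypothetical: five internal services, each available 99.9% of the time.
    end_to_end = composite_availability([0.999] * 5)
    print(f"5 services at 99.9% each -> {end_to_end:.4%} end to end")

    # Working backwards from a 99.9% customer-facing target.
    needed = required_component_availability(0.999, 5)
    print(f"each service needs about {needed:.4%} to offer 99.9% overall")
```

Under these assumptions the end-to-end figure is always lower than any single component's, so raising the customer-facing target pushes every internal target higher still.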

Are you getting value from your observability platform?

Chris: This new approach gives you much more information you have to handle, but then information brings you more insights that you can react to — even if those insights may seem overwhelming.

Martin: Exactly. The downside of all that observability is: how do I even store 100x the amount of insights when I’m not willing to pay 100x the cost?

Clearly there’s more value there, but how do companies weigh the return on that investment against the cost? I think a lot of companies are really struggling with that right now in this space.

In the upcoming weeks, you can expect videos on new topics ranging from high cardinality and the three phases of observability to the future of PromQL with Julius Volz. Make sure to subscribe to the Chronosphere YouTube channel so you don’t miss any future videos.
