The Observability Data Optimization Cycle can help organizations get better control of their data and simultaneously reduce overall costs. This video recap outlines the different stages and how organizations can benefit from the framework.
On: Jan 16, 2024
The Observability Data Optimization Cycle is a framework that teams can use to support the best possible observability outcomes at scale, while keeping costs under control.
The cycle includes steps for organizations to govern, analyze, refine, and operate their data so that development and engineering teams can get the data they need and use it to troubleshoot and configure cloud native environments.
If you don’t have time to sit down for the full video and learn about each step of the cycle, check out the transcript below.
Sophie: I think it’s time to talk about costs. For organizations looking to compete successfully, all eyes are on what’s racking up the big bills: data.
What’s changed that’s creating this need to control data? The move from VMs to containers has driven an explosion in data volume. Cloud native environments actually emit between 10 and 100 times more observability data than traditional, VM-based environments. This brings up two challenges:
Which means longer incidents…impacted customers…[and] higher costs, with worse outcomes. It even ends up affecting your return on investment.
Engineer: Thanks for telling us what we’re dealing with here, Sophie. Do you have any solutions?
Sophie: Okay, I know a few folks who do. And get this, it follows a three-step process. But before we dive into what those three steps are, let’s talk about what it all starts with. Step zero, if you will: centralized governance.
Observability teams need to lay one crucial foundation: the ability to understand their organization’s usage of observability data. This allows for chargebacks or showbacks. And once observability teams have visibility into consumption, they can take steps to make it more predictable, such as allocating portions of the observability system’s capacity to teams, with guardrails like quotas.
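To make the showback and quota idea concrete, here is a minimal Python sketch, assuming usage is measured as a count of persisted time series per team and priced at a flat cost per series. The `QuotaGuardrail` class, its methods, and the pricing model are hypothetical illustrations, not a Chronosphere API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QuotaGuardrail:
    """Hypothetical per-team quota on persisted time series (not a Chronosphere API)."""
    quotas: dict[str, int]  # team -> maximum number of series it may persist
    usage: dict[str, int] = field(default_factory=dict)  # team -> series persisted so far

    def admit(self, team: str, new_series: int) -> bool:
        """Accept new series only if the team stays within its quota."""
        current = self.usage.get(team, 0)
        limit = self.quotas.get(team, 0)  # teams without a quota get no capacity
        if current + new_series > limit:
            return False  # guardrail: reject writes that would exceed the quota
        self.usage[team] = current + new_series
        return True

    def showback(self, cost_per_series: float) -> dict[str, float]:
        """Report each team's cost based on the series it actually persisted."""
        return {team: count * cost_per_series for team, count in self.usage.items()}


guard = QuotaGuardrail(quotas={"payments": 1000, "search": 500})
guard.admit("payments", 800)   # within quota, accepted
guard.admit("payments", 300)   # would exceed 1000, rejected
guard.admit("search", 200)
print(guard.showback(0.25))
```

Rejecting writes outright is only one possible guardrail; a real system might instead sample or aggregate traffic that exceeds a quota.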
Sophie: Let’s start with step one of the process: Analyze. Actually, let’s go talk to Scott Kelly, to hear what he has to say about it. Scott, are you there?
Scott Kelly: So, before taking action on their data, engineering teams must be able to analyze the incoming data in real time to address issues, such as cardinality spikes, that could be impacting the observability system right now. They also need the ability to understand both the cost and the utility of the data. Just knowing whether the data is used or not really isn’t enough.
To understand the cost, Chronosphere analyzes all of the incoming data with our Traffic Analyzer, which shows teams the cardinality and volume of their various data sets, both in real time and historically. To understand the value, we provide a Metrics Usage Analyzer, which allows teams to see the cost and value of their metrics side by side.
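The kind of analysis Scott describes can be sketched in a few lines of Python, assuming incoming data arrives as (metric name, labels) pairs and that a metric's value can be approximated by how often it is queried. This is not how the Traffic Analyzer or Metrics Usage Analyzer work internally; `analyze_traffic`, `cost_vs_value`, and the use of cardinality as a cost proxy are illustrative assumptions.

```python
from collections import defaultdict


def analyze_traffic(samples):
    """Return per-metric cardinality (unique series) and volume (sample count).

    `samples` is an iterable of (metric_name, labels_dict) pairs, a hypothetical
    stand-in for an incoming write stream.
    """
    series = defaultdict(set)
    volume = defaultdict(int)
    for name, labels in samples:
        # A series is identified by its metric name plus its full label set.
        series[name].add(frozenset(labels.items()))
        volume[name] += 1
    return {name: {"cardinality": len(series[name]), "volume": volume[name]}
            for name in volume}


def cost_vs_value(traffic, query_counts):
    """Pair each metric's cost (assumed: cardinality) with its value (query count)."""
    report = [(name, stats["cardinality"], query_counts.get(name, 0))
              for name, stats in traffic.items()]
    # Least-queried metrics first and, among those, the highest-cardinality ones:
    # these are the likeliest candidates for refining.
    report.sort(key=lambda row: (row[2], -row[1]))
    return report


samples = [
    ("http_requests_total", {"pod": "a", "path": "/"}),
    ("http_requests_total", {"pod": "b", "path": "/"}),
    ("http_requests_total", {"pod": "a", "path": "/"}),
    ("debug_cache_hits", {"pod": "a"}),
]
traffic = analyze_traffic(samples)
print(cost_vs_value(traffic, {"http_requests_total": 42}))
```

Seeing cost and usage side by side is what distinguishes this from a plain "used or unused" check: a heavily queried metric can still be worth refining if its cardinality is out of proportion to its use.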
Sophie: Interesting, interesting. So, once you’ve identified the data, what can you do about it? You need strong refining capabilities. Julia, can you help us out?
Julia Blase: So, in order to refine your data, you might need a few different tools. You’ll want tools on both the write path, as data comes into your system, and the read path, as data comes out of your storage system.
On the write path, I think some of the classic tools are ones that let you drop metrics that aren’t being used at all today: things you just don’t need and don’t want to pay for.
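A write-path drop rule for unused metrics might look like the following minimal sketch, assuming you already have a set of metric names referenced by dashboards, alerts, or recent queries. `build_drop_filter` and `apply_write_path` are hypothetical names used only for illustration.

```python
def build_drop_filter(used_metrics):
    """Return a write-path predicate that keeps only metrics someone uses.

    `used_metrics` is a hypothetical set of metric names referenced by
    dashboards, alerts, or recent queries.
    """
    used = frozenset(used_metrics)

    def keep(metric_name):
        return metric_name in used

    return keep


def apply_write_path(samples, keep):
    """Split incoming (metric_name, value) samples into persisted and dropped."""
    persisted, dropped = [], []
    for sample in samples:
        if keep(sample[0]):
            persisted.append(sample)
        else:
            dropped.append(sample)  # never stored, so never paid for
    return persisted, dropped


keep = build_drop_filter({"http_requests_total"})
persisted, dropped = apply_write_path(
    [("http_requests_total", 1.0), ("debug_cache_hits", 5.0)], keep
)
```

Returning the dropped samples separately makes it easy to report what a rule would remove before enforcing it.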
Then, on the read path, you might have complex queries that give you different relevant insights into your system. And you may want to store those under something like a new metric name, which in Chronosphere we call a derived metric. It makes those insights really easy for people to find and use, even if they’re brand new to your system.
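One way to picture a derived metric is as a short, friendly name that expands to a longer query at read time. The sketch below assumes PromQL-style query strings and a simple in-memory registry; the `DERIVED_METRICS` mapping, the `checkout:error_ratio` name, and the `expand` function are hypothetical and do not reflect how Chronosphere stores derived metrics.

```python
import re

# Hypothetical registry: a friendly derived name -> the complex query it stands for.
DERIVED_METRICS = {
    "checkout:error_ratio": (
        'sum(rate(http_requests_total{service="checkout",code=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{service="checkout"}[5m]))'
    ),
}

# Identifiers in the Prometheus metric-name style: letters, digits, "_" and ":".
_IDENTIFIER = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def expand(query, registry=DERIVED_METRICS):
    """Replace every derived metric name in `query` with its underlying expression."""
    def substitute(match):
        name = match.group(0)
        # Parentheses preserve operator precedence around the expanded expression.
        return f"({registry[name]})" if name in registry else name

    return _IDENTIFIER.sub(substitute, query)


print(expand("checkout:error_ratio > 0.05"))
```

A newcomer only needs to know `checkout:error_ratio`; the label matchers and rate windows stay defined in one place.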
Sophie: Kim, are there rules that we still need to put in place?
Kim Oehmichen: That’s a great question, Sophie. For your large-scale cloud native environment, you and your teams need insights into how effective your refining policies are, so that you can make adjustments as necessary. Without this level of transparency, your teams cannot ensure that they’re using their assigned capacity or quota efficiently, which leaves your organization unable to maximize the return on investment of its observability practice.
As technical account managers (TAMs), we help you identify and put in place the policies and configurations that work best for your organization and scale. In other words, we help you continuously adjust for efficiency and troubleshoot faster, by ensuring that the data your engineers need to work with during debugging is quick to query and responsive.
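Measuring how effective refining policies are can start with two simple ratios per team: how much incoming data the rules removed, and how much of the assigned quota the remaining data consumes. The function names, the (ingested, persisted, quota) tuple layout, and the 10% threshold below are hypothetical choices for this sketch.

```python
def policy_effectiveness(ingested, persisted, quota):
    """Return the share of data a refining policy removed and the quota it uses.

    Inputs are hypothetical per-team counts of time series.
    """
    reduction = 1 - persisted / ingested if ingested else 0.0
    utilization = persisted / quota if quota else float("inf")
    return {"reduction": reduction, "utilization": utilization}


def flag_teams(teams, min_reduction=0.1):
    """Flag teams that exceed their quota or whose rules remove very little data.

    `teams` maps a team name to an (ingested, persisted, quota) tuple.
    """
    flags = {}
    for team, (ingested, persisted, quota) in teams.items():
        stats = policy_effectiveness(ingested, persisted, quota)
        if stats["utilization"] > 1.0:
            flags[team] = "over quota"
        elif stats["reduction"] < min_reduction:
            flags[team] = "low reduction"
    return flags


print(flag_teams({
    "payments": (1000, 250, 500),
    "search": (1000, 950, 800),
    "ads": (1000, 960, 2000),
}))
```

Tracking these ratios over time, rather than once, is what lets a team see whether a policy change actually paid off.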
Let’s have a closer look at the Query Accelerator. It continuously scans for slow queries and automatically creates a refined alternative. What I think is really cool is that your engineers can simply write a query that returns the data they need, and our Query Accelerator will ensure that it performs optimally wherever it is used.
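The general idea behind query acceleration (detect slow queries, then substitute a cheaper precomputed alternative) can be sketched as below. This is a hypothetical illustration, not Chronosphere's Query Accelerator: it assumes a query log of (query, latency in seconds) pairs and only tracks which queries would be served from a precomputed series, leaving out the precomputation itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueryAccelerator:
    """Hypothetical sketch: map slow queries to precomputed series names."""
    slow_threshold_s: float = 2.0
    accelerated: dict[str, str] = field(default_factory=dict)  # query -> series name

    def scan(self, query_log):
        """Register every query whose observed latency exceeds the threshold."""
        for query, latency_s in query_log:
            if latency_s > self.slow_threshold_s and query not in self.accelerated:
                # Hypothetical name of the series that would hold the precomputed result.
                self.accelerated[query] = f"accelerated:{len(self.accelerated)}"

    def rewrite(self, query):
        """Use the precomputed series when one exists; otherwise run the query as-is."""
        return self.accelerated.get(query, query)


acc = QueryAccelerator(slow_threshold_s=2.0)
acc.scan([("sum(rate(a[5m]))", 3.5), ("up", 0.1)])
print(acc.rewrite("sum(rate(a[5m]))"), acc.rewrite("up"))
```

Because the rewrite happens behind the query interface, engineers keep writing the query they care about and never have to reference the precomputed name directly.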
Sophie: That is very cool indeed. Actually, did you know that almost 69% of companies surveyed in a recent ESG study said they’re concerned about their observability data growth? Now, with observability costs and data growth finally under control, your organization can do more with the time and savings, and make observability a competitive advantage rather than a cost center.
Analyze, refine, operate: the three steps to help your organization fight the mighty data battle and actually win. Thanks for listening, and catch you next time, when we can talk more about how to set your team up for success in a cloud native world.