Sophie Kohler: I think it’s time to talk about costs. I know it’s on everyone’s minds right now, and in an economic downturn, it’s impossible not to feel the effects of high costs and squeezed budgets. For organizations looking to successfully compete, all eyes are on what’s racking up the big bills: data. What’s changed that’s creating this need to control data?
The move from VMs to containers has created an explosion of growth. Cloud native environments actually emit between 10 times and 100 times more observability data than traditional, VM-based environments. This brings up two challenges. One, observability data costs skyrocket and become a big budget item. Two, engineers are drowning in a flood of noise when trying to perform simple tasks, like troubleshooting.
Which means longer incidents, which means impacted customers, which means higher costs with worse outcomes. It even ends up affecting your return on investment. “Thanks for telling us what we’re dealing with here, Sophie. Do you have any solutions?” Okay, okay. I know a few folks who do. And get this, it follows a three-step process. But before we dive into what those three steps are, let’s talk about what it all starts with. Step zero, if you will: centralized governance.
Setting a foundation with centralized governance
Since observability data is becoming both costly and unpredictable, observability teams need to lay one crucial foundation: the ability to understand their organization’s usage – by team or service or whatever dimension is important. This allows for chargebacks or showbacks. And once observability teams have visibility into consumption, they can take steps to make it more predictable, such as allocating portions of the observability system’s capacity with guardrails like quotas.
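The showback-and-quota idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample series, the `team` and `service` labels, the quota numbers, and the function names `usage_by` and `showback` are all hypothetical and do not describe any Chronosphere API.

```python
from collections import Counter

# Hypothetical sample: one record per active time series, tagged with its owners.
series = [
    {"metric": "http_requests_total", "team": "checkout", "service": "cart"},
    {"metric": "http_request_duration_seconds", "team": "checkout", "service": "cart"},
    {"metric": "payment_attempts_total", "team": "checkout", "service": "payments"},
    {"metric": "payment_latency_seconds", "team": "checkout", "service": "payments"},
    {"metric": "search_queries_total", "team": "search", "service": "search-api"},
    {"metric": "index_lag_seconds", "team": "search", "service": "indexer"},
]

# Hypothetical quotas, expressed as a maximum number of active series per team.
quotas = {"checkout": 3, "search": 5}


def usage_by(records, dimension):
    """Count active series grouped by any label, such as team or service."""
    return Counter(r.get(dimension, "unknown") for r in records)


def showback(records, quotas, dimension="team"):
    """Build a per-owner report of usage against quota."""
    report = {}
    for owner, used in usage_by(records, dimension).items():
        quota = quotas.get(owner)  # None means no quota has been assigned
        report[owner] = {
            "used": used,
            "quota": quota,
            "over_quota": quota is not None and used > quota,
        }
    return report


report = showback(series, quotas)
for owner, row in sorted(report.items()):
    print(owner, row)
```

Grouping by a different dimension only requires changing the label name, for example `usage_by(series, "service")`.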
And now that we’ve actually cultivated an environment for allocating priorities and driving accountability, let’s start with step one of the process: Analyze. Actually, let’s go talk to Scott Kelly to hear what he has to say about it. Scott, are you there?
Analyzing incoming data in real time
Scott Kelly: Yeah, happy to help. So before taking action on their data, engineering teams must be able to analyze the incoming data in real time to address issues such as cardinality spikes that could currently be impacting the observability system. And they also need the ability to understand both the cost and the utility of the data. Just knowing if the data is used or not really isn’t enough.
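To make "cardinality spike" concrete, here is a minimal Python sketch that counts the distinct label combinations per metric in two time windows and flags metrics that grew sharply. The sample data, the `factor` threshold, and the function names are assumptions made for illustration; this is not how any particular traffic analyzer is implemented.

```python
from collections import defaultdict


def cardinality(samples):
    """Return the number of distinct label combinations seen per metric name."""
    seen = defaultdict(set)
    for name, labels in samples:
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def spikes(baseline, current, factor=2.0):
    """Metrics whose cardinality is at least `factor` times the baseline.

    A metric missing from the baseline is treated as having cardinality 1.
    """
    return sorted(
        name
        for name, count in current.items()
        if count >= factor * baseline.get(name, 1)
    )


# Hypothetical previous window: two series for the request counter.
previous_window = [
    ("http_requests_total", {"status": "200"}),
    ("http_requests_total", {"status": "500"}),
    ("cpu_seconds_total", {"host": "a"}),
]

# Hypothetical current window: someone added a per-user label.
current_window = [
    ("http_requests_total", {"status": "200", "user_id": str(i)}) for i in range(5)
] + [("cpu_seconds_total", {"host": "a"})]

before = cardinality(previous_window)
after = cardinality(current_window)
print(spikes(before, after))
```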
To understand the cost, Chronosphere analyzes all of the incoming data with our traffic analyzer, to show teams information about the cardinality and volume of the various data sets in real time, and historically. To understand the value, we provide a metrics usage analyzer that allows teams to see the cost and value of their metrics side by side. The metrics usage analyzer lets teams slice and dice usage by different dimensions – such as least valuable, most valuable, most used, et cetera – for fine-grained cost and usage analysis. It provides detailed information about where each metric is used, how it is used, how often it is used, and who’s using it. We even generate a utility score from this analysis for each metric to make it easy to understand how useful it is.
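One way to picture a utility score placed next to cost is the toy Python sketch below. The weights, field names, and sample metrics are invented for illustration; the actual scoring used by the metrics usage analyzer is not described in this conversation and may differ entirely.

```python
from dataclasses import dataclass


@dataclass
class MetricUsage:
    name: str
    series: int       # cost proxy: number of active series
    dashboards: int   # where it is used
    alerts: int       # how it is used
    queries_30d: int  # how often it is used
    users: int        # who is using it


def utility_score(m):
    """Hypothetical weighting: alerts count most, query volume is capped."""
    return 3 * m.alerts + 2 * m.dashboards + m.users + min(m.queries_30d, 100) / 100


def least_valuable(metrics):
    """Sort by lowest utility first, breaking ties by highest cost."""
    return sorted(metrics, key=lambda m: (utility_score(m), -m.series))


metrics = [
    MetricUsage("http_requests_total", series=1200, dashboards=3, alerts=2, queries_30d=250, users=5),
    MetricUsage("debug_trace_bytes", series=9000, dashboards=0, alerts=0, queries_30d=0, users=0),
    MetricUsage("cache_evictions_total", series=300, dashboards=1, alerts=0, queries_30d=10, users=1),
]

for m in least_valuable(metrics):
    print(f"{m.name}: series={m.series} utility={utility_score(m):.1f}")
```

Sorting cost and utility together surfaces metrics that are expensive but unused, which are the natural first candidates for refinement.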
Sophie Kohler: Interesting, interesting. So, once you’ve identified the data, what can you do about it? You need strong refining capabilities. Julia, can you help us out?
Refining data with various tools on the write and read path
Julia Blase: Hi, Sophie. Sure. In order to refine your data, you might need a few different tools. You’ll want a few tools on both the write path as data comes into your system, and also on the read path as data comes out of your storage system. On the write path, I think some of those classic tools are tools that let you drop metrics that are not being used at all today, things that you just don’t need and don’t want to pay for.
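A write-path drop rule can be sketched as a filter applied before samples reach storage. The rule format, the metric names, and the helper functions below are hypothetical and only illustrate the idea in Python; they are not a real configuration schema.

```python
import fnmatch

# Hypothetical drop rules: a metric name pattern plus optional exact label matches.
DROP_RULES = [
    {"name": "debug_*"},
    {"name": "http_requests_total", "labels": {"env": "dev"}},
]


def should_drop(name, labels, rules=DROP_RULES):
    """Return True if any rule matches the metric name and all of its labels."""
    for rule in rules:
        if not fnmatch.fnmatchcase(name, rule["name"]):
            continue
        wanted = rule.get("labels", {})
        if all(labels.get(key) == value for key, value in wanted.items()):
            return True
    return False


def ingest(samples):
    """Keep only the samples that no drop rule matches."""
    return [s for s in samples if not should_drop(s[0], s[1])]


incoming = [
    ("debug_heap_bytes", {"env": "prod"}, 1.0),
    ("http_requests_total", {"env": "dev"}, 7.0),
    ("http_requests_total", {"env": "prod"}, 42.0),
]

kept = ingest(incoming)
print(kept)
```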
Then, on the read side, you might have complex ways to query data that give you different relevant insights into your system. And you may want to store those under something like a new metric name. Something that, in Chronosphere, we called a derived metric name. [It] makes it really easy for people to find and use even if they’re brand new to your system.
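The idea of storing a complex query under a friendly name can be illustrated with a simple lookup table in Python. The derived name `checkout:error_ratio`, the underlying metric, and the `resolve` helper are all assumptions for this sketch; the expression uses standard PromQL syntax, but this is not a description of how derived metrics are implemented in any product.

```python
# Hypothetical registry that maps an easy-to-find name to the full query behind it.
DERIVED_METRICS = {
    "checkout:error_ratio": (
        'sum(rate(http_requests_total{service="checkout",status=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{service="checkout"}[5m]))'
    ),
}


def resolve(name):
    """Return the underlying expression for a derived name, or the name unchanged."""
    return DERIVED_METRICS.get(name, name)


# A newcomer only needs the short name; the full expression is filled in for them.
print(resolve("checkout:error_ratio"))
print(resolve("cpu_seconds_total"))
```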
Sophie Kohler: Because of the dynamic nature of [the] cloud and its agility, the value of that data may change. Kim, are there rules that we still need to put in place?
Adjust for efficiency and maximize your return on investment
Kim Oehmichen: That’s a great question, Sophie. So for your large scale cloud native environment, you and your teams need insights into how effective your refining policies are so that you can make adjustments as necessary. Without this level of transparency, your teams cannot ensure that they’re using their assigned capacity or quota efficiently – which results in your organization not being able to maximize the return on investment of your observability practices. As TAMs (Technical Account Managers), we also help you to identify and put the right policies and configurations in place that work best for your organization and scale.
In other words, we help you to continuously adjust for efficiency and troubleshoot faster, by ensuring that the data your engineers need to work with during debugging is quick and responsive to query.
Let’s have a closer look at the query accelerator. It continuously scans for slower queries and automatically creates a refined alternative. What I think is really cool is that your engineers can simply create a query that returns the data they need, and our query accelerator will ensure that it performs optimally wherever it is used.
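The "scan for slower queries" step can be pictured with the short Python sketch below, which picks out queries that run repeatedly and are slow at the median. The query log, the thresholds, and the function name are hypothetical; this does not describe the query accelerator's actual behavior, and it stops at identifying candidates rather than creating the refined alternative.

```python
from collections import defaultdict
from statistics import median


def slow_queries(log, threshold_s=1.0, min_runs=3):
    """Return queries run at least `min_runs` times with a median duration above the threshold."""
    durations = defaultdict(list)
    for query, seconds in log:
        durations[query].append(seconds)
    return sorted(
        query
        for query, runs in durations.items()
        if len(runs) >= min_runs and median(runs) > threshold_s
    )


# Hypothetical query log of (query text, duration in seconds).
query_log = [
    ("sum(rate(http_requests_total[5m])) by (service)", 2.4),
    ("sum(rate(http_requests_total[5m])) by (service)", 3.1),
    ("sum(rate(http_requests_total[5m])) by (service)", 2.8),
    ("up", 0.01),
    ("up", 0.02),
    ("up", 0.01),
    ("histogram_quantile(0.99, sum(rate(latency_bucket[5m])) by (le))", 4.0),
]

candidates = slow_queries(query_log)
print(candidates)
```

The last query in the log is slow but ran only once, so it is not flagged under the assumed `min_runs` setting.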
Sophie Kohler: That is very cool indeed. Now, with observability costs and data growth finally under control, your organization can actually do more with the time and savings, and make its observability a competitive advantage rather than a cost center.
Analyze, refine, operate. Thanks for listening and catch you next time, where we can talk more about how you can set your team up for success in a cloud native world.