TL;DR: View cost and usage information side by side to help determine which metrics you want to keep, drop, or add to your system.
A central observability team (COT) manager wants their system to be valuable to developers and SREs, easy to navigate, and efficient even as their data grows and changes. They often end up with a long list of high-volume metrics running through their system and wonder which ones are useful and which ones are clutter.
To tackle this problem, the COT manager looks at a few dashboards and asks around different teams. Some metrics appear in a lot of dashboards, but it’s hard to get a consistent answer from different teams about their value: what is useful to one team is not useful to another.
In another case, individual developers across many teams mention the same specific metrics, but the COT notices that those metrics aren’t used in any dashboards or monitors. Should they be? If so, why aren’t they in those dashboards today? Who last opened that one related dashboard? Maybe they will know.
Everyone tells the COT that they can’t live without all of their kube-state-metrics (KSM), but those metrics represent a huge amount of volume in the system! How can the COT simplify the data their teams use? And do their teams really need all of those instance labels on the KSM output?
Additionally, how much should the COT trust someone who says, “Oh, we never use that one metric, you can let it go”? Too often, that same person comes back the next week asking where it went, because it turns out to be the one metric they need to solve a high-priority issue!
The Metrics Usage Analyzer is a new way for COT managers and team leads to easily view cost and usage information side by side and make smarter metric traffic-shaping decisions that reduce noise and boost signal.
Who is using this data and what do they use it for?
Observability system managers need information about data volume (data points per second) and data usage (who uses the data, where, and for what purpose). This context lets them make confident decisions about what high-signal data should be kept, publicized, and invested in, and what data can be rolled away, dropped, or ignored.
The Usage Analyzer allows users to view all of their metrics in Chronosphere ranked from least-used to most-used, alongside information about each metric’s volume (as measured in data points per second) and metric cardinality.
Users can quickly sort the list to find unused or rarely used metrics that they can drop or roll away. They can also identify which metrics users look for with Chronosphere’s Metric Explorer (a tool that helps users investigate query requests, responses, and query statistics), even if those metrics aren’t in any dashboards or monitors. They can then decide whether to add those metrics to dashboards and monitors so that more people use them.
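To make the ranking concrete, here is a minimal Python sketch. It is not Chronosphere’s implementation or API: it assumes hypothetical per-metric records (`MetricStats` with `usage_count`, `dps`, and `cardinality` fields) and sorts them least-used first, breaking ties by putting the highest-volume metric first, since an unused metric that emits many data points per second is usually the most worthwhile one to review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricStats:
    # Hypothetical per-metric summary; the field names are illustrative only.
    name: str
    usage_count: int   # uses in the selected window (dashboards, monitors, queries, rules)
    dps: float         # volume in data points per second
    cardinality: int   # number of unique time series


def rank_by_usage(metrics: List[MetricStats]) -> List[MetricStats]:
    """Least-used first; among equally used metrics, highest volume first."""
    return sorted(metrics, key=lambda m: (m.usage_count, -m.dps))


def drop_candidates(metrics: List[MetricStats], max_usage: int = 0) -> List[MetricStats]:
    """Metrics used at most `max_usage` times, in ranked order."""
    return [m for m in rank_by_usage(metrics) if m.usage_count <= max_usage]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        MetricStats("http_requests_total", 420, 1500.0, 3000),
        MetricStats("legacy_cache_hits", 0, 50.0, 200),
        MetricStats("debug_span_events", 0, 900.0, 12000),
        MetricStats("queue_depth", 3, 10.0, 40),
    ]
    for m in rank_by_usage(sample):
        print(f"{m.name:20} uses={m.usage_count:<4} dps={m.dps:<7} series={m.cardinality}")
```

Sorting on the tuple `(usage_count, -dps)` keeps the list ordered by usage, as the Usage Analyzer presents it, while surfacing the costliest unused metrics at the very top.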
Defining data usage
A metric can be in use for several reasons. It may appear in dashboard or monitor configurations, and users may open those dashboards and monitors frequently or infrequently. It may not be in any dashboard but still be something users actively search for as part of a debugging workflow. It may be an important component of a shaping rule, such as an input to a rollup rule or a recording rule. Finally, a large number of developers, team leads, or SREs may rely on it.
The Metrics Usage Analyzer captures all of this information about every metric. It tells you where the metric is being used, how frequently, and within what service or by which users.
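One way to picture this is as a stream of usage events grouped per metric. The sketch below is hypothetical: `UsageEvent`, the source names in `SOURCES`, and `summarize` are illustrative and do not reflect how Chronosphere stores usage data. It counts uses per source and collects the distinct users and services behind them, so a metric referenced only by a shaping rule still shows up as used.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical usage sources mirroring the ones described above.
SOURCES = {"dashboard", "monitor", "explorer_query", "shaping_rule"}


@dataclass(frozen=True)
class UsageEvent:
    metric: str
    source: str                     # one of SOURCES
    user: Optional[str] = None      # who viewed or queried, if known
    service: Optional[str] = None   # owning service, if known


def summarize(events):
    """Group usage events per metric: uses by source, plus distinct users and services."""
    summary = defaultdict(lambda: {"by_source": Counter(), "users": set(), "services": set()})
    for e in events:
        if e.source not in SOURCES:
            raise ValueError(f"unknown usage source: {e.source}")
        s = summary[e.metric]
        s["by_source"][e.source] += 1
        if e.user is not None:
            s["users"].add(e.user)
        if e.service is not None:
            s["services"].add(e.service)
    return dict(summary)
```

For example, a metric that only feeds a rollup rule ends up with `by_source == Counter({"shaping_rule": 1})` and no users, which is exactly the kind of metric a dashboard-only audit would wrongly mark as unused.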
Because metrics usage is cyclical, the analyzer also captures this data over longer time frames. You can select either a 14-day (one sprint) or a 30-day (one month) usage window to capture metrics that are used only during release cycles, so you don’t underestimate the utility of metrics that are only occasionally useful.
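A quick way to see why the window length matters: the hypothetical helper `used_within` checks whether any recorded use falls inside a trailing window. A metric last used during a release 20 days ago looks unused in a 14-day window but used in a 30-day one. The dates are made up for illustration, and this is not how Chronosphere computes its windows.

```python
from datetime import datetime, timedelta
from typing import Iterable


def used_within(use_times: Iterable[datetime], now: datetime, window_days: int) -> bool:
    """True if any recorded use falls inside the trailing `window_days` window ending at `now`."""
    cutoff = now - timedelta(days=window_days)
    return any(cutoff <= t <= now for t in use_times)


if __name__ == "__main__":
    # Made-up example: a metric that is only queried around a release.
    now = datetime(2024, 3, 31)
    release_uses = [datetime(2024, 3, 11)]  # 20 days before `now`
    print("14-day window:", used_within(release_uses, now, 14))
    print("30-day window:", used_within(release_uses, now, 30))
```

Running it prints `False` for the 14-day window and `True` for the 30-day window, which is the underestimate the longer window guards against.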
Questions that Metrics Usage Analyzer can answer
The Usage Analyzer can help COT managers and team leads with a variety of different workflows. It can help address the following questions:
What don’t my developers need?
In the Usage Analyzer, you can sort metrics to identify ones that are unused or infrequently used, see the volume they represent in your system, and decide whether you want to let them go.
What aren’t my developers using that they should be?
As you browse through the Usage Analyzer, you may see metrics with unusual usage patterns. For instance, a metric may see frequent queries, but they’re all from a single user. Why? Perhaps that metric is new and valuable and other teams don’t know that they can use it yet. Here, you can quickly reach out to all of your team leads and let them know they can take advantage of this new metric!
Or perhaps a user doesn’t know that another dashboard already shows an aggregated version of a metric, which is more performant than the query they currently use. You can direct them to that dashboard and then, if no one else is using the original metric, let it go so that you don’t have to store the less-relevant data.
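The single-user pattern described above can be flagged mechanically. This is a minimal sketch assuming a hypothetical query log of `(metric, user)` pairs; `single_user_hotspots` is an illustrative name, and its `min_queries` threshold of 10 is an arbitrary default, not a Chronosphere setting.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def single_user_hotspots(query_log: Iterable[Tuple[str, str]], min_queries: int = 10) -> List[str]:
    """Metrics queried at least `min_queries` times, all by exactly one user."""
    counts = defaultdict(int)   # metric -> number of queries
    users = defaultdict(set)    # metric -> distinct users who queried it
    for metric, user in query_log:
        counts[metric] += 1
        users[metric].add(user)
    return sorted(m for m in counts if counts[m] >= min_queries and len(users[m]) == 1)


if __name__ == "__main__":
    # Made-up query log for illustration.
    log = [("new_latency_p99", "alice")] * 12 + [
        ("http_requests_total", "bob"),
        ("http_requests_total", "carol"),
    ] * 8
    print(single_user_hotspots(log))
```

Each flagged metric is a prompt for a conversation: either other teams should hear about it, or its one user should be pointed to an existing aggregated alternative.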
What do people need that I’m not providing?
The Usage Analyzer’s Metrics Explorer information also shows metrics that your teams search for but that don’t exist in your system. These may be metrics that users would love to have! You can use this data to reach out to those users, offer alternative metrics, or let them know what is still in the system.
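Finding these gaps amounts to a set difference between what people search for and what the system actually stores. Here is a minimal sketch under that assumption; `missing_metrics` and its inputs (a list of searched metric names and a list of ingested metric names) are hypothetical, and the sample names are made up.

```python
from collections import Counter
from typing import Iterable, List


def missing_metrics(searched_names: Iterable[str], ingested_names: Iterable[str]) -> List[str]:
    """Metric names users searched for that have no data in the system, most-searched first."""
    ingested = set(ingested_names)
    misses = Counter(name for name in searched_names if name not in ingested)
    return [name for name, _ in misses.most_common()]


if __name__ == "__main__":
    # Made-up search history and ingested metric names.
    searched = ["http_requests_total", "db_pool_wait_seconds", "db_pool_wait_seconds", "gc_pause_ms"]
    ingested = ["http_requests_total", "queue_depth"]
    print(missing_metrics(searched, ingested))
```

Ordering by search count puts the most-requested missing metrics first, which is a reasonable place to start the conversation with users.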
The Metrics Usage Analyzer gives Chronosphere users full insight into metrics data usage, whether by dashboard, monitor, shaping rule, or query execution. With this insight, users can more confidently identify unused data, make decisions about the shape of their metrics data, and communicate the impact of a proposed shaping rule to other users.