Discover what high cardinality in observability is, why high cardinality is a problem, and 3 ways to tame data growth and cardinality.
Rob Skillington is the Co-Founder and CTO of Chronosphere. He was previously at Uber, where he was the technical lead of the observability team and creator of M3DB, the time-series database at the core of M3. He has worked on very large (Microsoft), medium (Uber), and small teams. As of 2023, Rob is based in Melbourne, Australia with his family of four after 10 years spent living in Seattle, San Francisco, and New York City.
On: Feb 24, 2024
With the transition from monolith to cloud-native environments, we are seeing an ongoing explosion of metrics data in terms of both volume and cardinality. This is because microservices and containerized applications generate an order of magnitude more metrics data than legacy environments. To achieve good observability in a cloud-native system, you will need to deal with large-scale data and take steps to understand and control cardinality.
This blog explains what high cardinality in observability is, why it is a problem, and three ways to tame data growth and cardinality.
Cardinality is the number of possible groupings given the dimensions a metric has. Dimensions are the different properties of your data.
When we talk about metric cardinality, we mean the number of unique time series produced by the combinations of a metric name and its associated labels (dimensions). Each combination that actually has data counts as one series, so the more combinations there are, the greater a metric’s cardinality is.
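As a minimal sketch of that definition, the snippet below counts unique time series per metric by treating each distinct combination of metric name and label values as one series. The metric name, label names, and sample data are hypothetical and chosen only for illustration; this is not how any particular metrics backend implements it.

```python
from typing import Dict, Iterable, Tuple

# A sample is (metric name, labels). Names and labels here are hypothetical.
Sample = Tuple[str, Dict[str, str]]


def series_key(name: str, labels: Dict[str, str]) -> tuple:
    """Build an order-independent identity for one time series."""
    return (name, tuple(sorted(labels.items())))


def metric_cardinality(samples: Iterable[Sample]) -> Dict[str, int]:
    """Count the unique label combinations observed for each metric name."""
    seen: Dict[str, set] = {}
    for name, labels in samples:
        seen.setdefault(name, set()).add(series_key(name, labels))
    return {name: len(keys) for name, keys in seen.items()}


samples = [
    ("http_requests_total", {"endpoint": "/login", "status": "200"}),
    ("http_requests_total", {"endpoint": "/login", "status": "500"}),
    # Same labels in a different order: still the same series as the first sample.
    ("http_requests_total", {"status": "200", "endpoint": "/login"}),
    ("http_requests_total", {"endpoint": "/cart", "status": "200"}),
]

cardinality = metric_cardinality(samples)
print(cardinality)
```

Only combinations that appear in the data are counted, which matches the idea that cardinality is about combinations that actually exist rather than every combination that could.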
To get a sense of how quickly high cardinality explodes the scale of telemetry data, compare a legacy environment to a cloud-native environment and watch how quickly you end up going from 150,000 possible unique time series to 150 million!
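The arithmetic behind that jump is multiplicative: the upper bound on unique series for a metric is the product of the number of values each dimension can take. The sketch below assumes hypothetical dimension names and sizes, picked only so that the products match the 150,000 and 150 million totals mentioned above; they are not taken from a real system.

```python
from math import prod
from typing import Dict


def max_series(dimension_sizes: Dict[str, int]) -> int:
    """Upper bound on unique series: product of per-dimension value counts."""
    return prod(dimension_sizes.values())


# Hypothetical legacy environment: a few hundred long-lived VMs.
legacy = {
    "endpoint": 20,
    "status_code": 5,
    "histogram_bucket": 5,
    "vm": 300,
}

# Hypothetical cloud-native environment: more endpoints and buckets,
# and thousands of short-lived pods instead of a few hundred VMs.
cloud_native = {
    "endpoint": 100,
    "status_code": 5,
    "histogram_bucket": 30,
    "pod": 10_000,
}

legacy_total = max_series(legacy)
cloud_native_total = max_series(cloud_native)
print(f"legacy: {legacy_total:,}")
print(f"cloud-native: {cloud_native_total:,}")
```

Because the factors multiply, growing a single dimension by 10x grows the whole metric by 10x, which is why one unbounded label can dominate the total.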
At this point you might naturally ask: what constitutes high cardinality for metrics? It turns out the answer is somewhat relative. In the legacy environment mentioned above, we could easily create a metric with 150,000 unique series, with individual dimensions having no more than a few hundred values. By comparison, in a cloud-native environment it’s not unreasonable to see individual dimensions that have thousands of unique values (or more). As we saw, we were able to easily generate over 100 million unique series for a single metric!
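One way to see which dimensions drive a metric's cardinality is to count the distinct values observed for each label. The following is a rough sketch under assumed, synthetic data; the label names (`pod`, `region`, `status`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unique_values_per_label(
    label_sets: Iterable[Dict[str, str]],
) -> List[Tuple[str, int]]:
    """Return (label, distinct value count) pairs, largest first."""
    values = defaultdict(set)
    for labels in label_sets:
        for key, value in labels.items():
            values[key].add(value)
    counts = [(key, len(vals)) for key, vals in values.items()]
    return sorted(counts, key=lambda kv: kv[1], reverse=True)


# Synthetic label sets: one label per pod, a handful of regions and statuses.
regions = ["us", "eu", "ap"]
label_sets = [
    {
        "pod": f"pod-{i}",
        "region": regions[i % 3],
        "status": "200" if i % 2 else "500",
    }
    for i in range(1000)
]

ranking = unique_values_per_label(label_sets)
print(ranking)
```

In this synthetic data the `pod` label has far more distinct values than the others, which is the kind of dimension worth weighing against the value it provides.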
There is a tradeoff that takes place when more dimensions are added to metrics and cardinality goes up. The more important question to ask ourselves becomes: Is there an acceptable ROI for the dimensions we add to our metrics and the value provided by the additional cardinality?
To strike a balance between value and cardinality, we can classify our metrics and the dimensions they have into categories to help us think about the inherent tradeoffs.
It’s understandable how metrics data growth gets out of hand. There’s a lot of power in the extra level of granularity you get with the expansion of unique time series you’re storing.
The key is understanding how high cardinality impacts the scale of the telemetry data you need to collect and finding ways to control metrics data. Here are three ways to solve the high cardinality problem:
Monitoring and observability enable your company to operate without massive outages, and to limit the impact when an outage does happen. A high level of reliability is what keeps your system dependable, and it’s the reason people keep using your service. To achieve great observability, keep these three approaches to controlling high cardinality data in mind.
Chronosphere’s observability platform is the only purpose-built SaaS solution for scaling cloud-native environments. Chronosphere puts you back in control by taming rampant metric data growth and the high cardinality problem. Chronosphere allows customers to keep pace with the massive amounts of monitoring data generated by microservices, and it does so with more cost efficiency than legacy solutions, with tools like the Control Plane.
John Potocny contributed to this article.