We talk a lot about high cardinality data in the observability world, and about the importance of having access to it (and also the dangers of too much cardinality). But what do we mean when we talk about data cardinality? To start, we can look at the definition: cardinality is *the number of elements in a set or other grouping*. To make it a little clearer what that means, let’s walk through an example.

## An example of cardinality growth

Say we want to keep track of how many cars pass by on a street; to do so, we can simply keep a running count with each new one that goes by. If we express our count of passing cars as a metric, it might look something like this:

passed_cars_total{}: 10

We’re only tracking a single number at this point, the number of cars that have passed us – that means our metric has a cardinality of 1. What if we wanted a little more information about the cars going by though? We could break it down by the type of vehicle, such as sedan, van, SUV, or truck. Then our metric might look like this:

passed_cars_total{vehicle_type="sedan"}: 4

passed_cars_total{vehicle_type="van"}: 1

passed_cars_total{vehicle_type="suv"}: 3

passed_cars_total{vehicle_type="truck"}: 2

By splitting our count out by **vehicle_type**, we’ve added another dimension to our data, and increased the overall cardinality – now we’re tracking 4 values, so the cardinality of our metric is 4. The dimension we just added has 4 possible values, so the cardinality of **vehicle_type** is also 4.

We can continue to add more dimensions to our count of passing cars. Let’s also include whether we see a dog in the car! That would make our metric look something like this:

| has_dog="true" | has_dog="false" |
| --- | --- |
| passed_cars_total{vehicle_type="sedan", has_dog="true"}: 2 | passed_cars_total{vehicle_type="sedan", has_dog="false"}: 2 |
| passed_cars_total{vehicle_type="van", has_dog="true"}: 1 | passed_cars_total{vehicle_type="van", has_dog="false"}: 0 |
| passed_cars_total{vehicle_type="suv", has_dog="true"}: 2 | passed_cars_total{vehicle_type="suv", has_dog="false"}: 1 |
| passed_cars_total{vehicle_type="truck", has_dog="true"}: 1 | passed_cars_total{vehicle_type="truck", has_dog="false"}: 1 |

We can see above that now we have to track 2 values instead of 1 value for each car type. That brings our total to 8, since we have 4 different values for **vehicle_type**, and 2 possible values for **has_dog**. So our metric has a cardinality of 8, **vehicle_type** still has a cardinality of 4, and **has_dog** has a cardinality of 2. Easy so far, right?
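As a rough sketch of that arithmetic: when every dimension applies to every series, the series count is simply the size of the Cartesian product of the label values. The Python below enumerates those combinations. The label values come from the example above; the variable names are only illustrative.

```python
from itertools import product

vehicle_types = ["sedan", "van", "suv", "truck"]
has_dog_values = ["true", "false"]

# One series per (vehicle_type, has_dog) combination.
series = [
    f'passed_cars_total{{vehicle_type="{v}", has_dog="{d}"}}'
    for v, d in product(vehicle_types, has_dog_values)
]

metric_cardinality = len(series)
print(metric_cardinality)  # 8
```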

Let’s add one more dimension to our count of passing cars – we want to know how many of the cars are convertibles too. That’s another dimension that has two possible values, so updating our table should be similar to when we added the **has_dog** dimension, right? Not quite! This time our metric looks a little different when we add the new dimension:

|  |  |  |  |
| --- | --- | --- | --- |
| passed_cars_total{vehicle_type="sedan", has_dog="true", is_convertible="true"}: 1 | passed_cars_total{vehicle_type="sedan", has_dog="false", is_convertible="true"}: 1 | passed_cars_total{vehicle_type="sedan", has_dog="true", is_convertible="false"}: 1 | passed_cars_total{vehicle_type="sedan", has_dog="false", is_convertible="false"}: 1 |
| passed_cars_total{vehicle_type="van", has_dog="true"}: 1 | passed_cars_total{vehicle_type="van", has_dog="false"}: 0 |  |  |
| passed_cars_total{vehicle_type="suv", has_dog="true"}: 2 | passed_cars_total{vehicle_type="suv", has_dog="false"}: 1 |  |  |
| passed_cars_total{vehicle_type="truck", has_dog="true"}: 1 | passed_cars_total{vehicle_type="truck", has_dog="false"}: 1 |  |  |

What’s different from our last two dimensions? In this case, our new dimension **is_convertible** isn’t applicable to all of the values we have for **vehicle_type** (in our example, only sedans come as convertibles), so our metric’s cardinality has gone up to 10, instead of the 16 (4 × 2 × 2) we might have initially expected. This is an important point to remember when you are looking at the cardinality of data – we’ll frequently have dimensions that only apply to a subset of our data. Because of this, you can’t just multiply together the cardinalities of your individual dimensions to know what the overall cardinality will be; it’s better to measure the overall cardinality directly, to make sure you get an accurate answer.
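To make the difference concrete, here is a minimal sketch that measures cardinality directly by counting distinct label sets, and compares it with the naive product of the per-dimension cardinalities. The label sets are copied from the table above; the helper names `dimension_cardinalities` and `metric_cardinality` are made up for this illustration and are not part of any particular metrics library.

```python
from math import prod

# Label sets observed for passed_cars_total, copied from the table above.
observed_series = [
    {"vehicle_type": "sedan", "has_dog": "true", "is_convertible": "true"},
    {"vehicle_type": "sedan", "has_dog": "false", "is_convertible": "true"},
    {"vehicle_type": "sedan", "has_dog": "true", "is_convertible": "false"},
    {"vehicle_type": "sedan", "has_dog": "false", "is_convertible": "false"},
    {"vehicle_type": "van", "has_dog": "true"},
    {"vehicle_type": "van", "has_dog": "false"},
    {"vehicle_type": "suv", "has_dog": "true"},
    {"vehicle_type": "suv", "has_dog": "false"},
    {"vehicle_type": "truck", "has_dog": "true"},
    {"vehicle_type": "truck", "has_dog": "false"},
]


def dimension_cardinalities(series):
    """Count the distinct values seen for each label."""
    values = {}
    for labels in series:
        for name, value in labels.items():
            values.setdefault(name, set()).add(value)
    return {name: len(seen) for name, seen in values.items()}


def metric_cardinality(series):
    """Count the distinct label sets, i.e. the unique series."""
    return len({frozenset(labels.items()) for labels in series})


per_dimension = dimension_cardinalities(observed_series)
naive_estimate = prod(per_dimension.values())
actual = metric_cardinality(observed_series)

print(per_dimension)   # {'vehicle_type': 4, 'has_dog': 2, 'is_convertible': 2}
print(naive_estimate)  # 16
print(actual)          # 10
```

The naive product is only an upper bound; counting the series that actually exist gives the real number.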

## The tradeoffs of data cardinality

Now we’ve shown what happens to our data’s cardinality as we add dimensions – we were able to answer questions that we couldn’t when we started, such as how many convertibles with a dog have passed by. There are a couple of trade-offs here though; as we’ve seen, adding dimensions to our data has the potential to significantly increase the number of values we need to track. What if we also tracked what color each car was? Our table would get a lot bigger!

There’s another problem that comes with high cardinality as well; originally we had a single number that gave us the number of cars that had passed by, but now if we want to know how many cars have gone by, we have to add 10 values together, so it’s a little more work. That’s a perfectly acceptable exchange for us here, but it’s something we have to be conscious of as we add dimensions to our data – the more cardinality we have, the more work we have to do to get answers to less specific questions. Not every dimension is worth that cost. For example, a business measuring service KPIs might add a dimension that lets them break down key metrics by customer, but breaking those same metrics down by individual user would not give enough additional benefit to justify the explosion in cardinality it would cause.
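The extra work looks roughly like this in code. This is a sketch that assumes the ten series from the example are held in memory as (labels, count) pairs; `sum_by` is a hypothetical helper written for this post, not a function from any metrics library.

```python
from collections import defaultdict

# (labels, count) pairs for passed_cars_total, taken from the example above.
samples = [
    ({"vehicle_type": "sedan", "has_dog": "true", "is_convertible": "true"}, 1),
    ({"vehicle_type": "sedan", "has_dog": "false", "is_convertible": "true"}, 1),
    ({"vehicle_type": "sedan", "has_dog": "true", "is_convertible": "false"}, 1),
    ({"vehicle_type": "sedan", "has_dog": "false", "is_convertible": "false"}, 1),
    ({"vehicle_type": "van", "has_dog": "true"}, 1),
    ({"vehicle_type": "van", "has_dog": "false"}, 0),
    ({"vehicle_type": "suv", "has_dog": "true"}, 2),
    ({"vehicle_type": "suv", "has_dog": "false"}, 1),
    ({"vehicle_type": "truck", "has_dog": "true"}, 1),
    ({"vehicle_type": "truck", "has_dog": "false"}, 1),
]


def sum_by(samples, keep=()):
    """Add up counts, grouping only by the labels named in `keep`."""
    totals = defaultdict(int)
    for labels, count in samples:
        key = tuple((name, labels[name]) for name in keep if name in labels)
        totals[key] += count
    return dict(totals)


# The least specific question: how many cars passed in total?
total_cars = sum_by(samples)[()]
print(total_cars)  # 10

# A slightly more specific question: how many cars of each type?
by_type = sum_by(samples, keep=("vehicle_type",))
print(by_type[(("vehicle_type", "sedan"),)])  # 4
```

Every series has to be read to answer the broad question, so the cost of the rollup grows with the cardinality of the metric.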

## Cardinality FAQs

Cardinality is defined as the number of elements in a set or other grouping. Basically, the more dimensions, or groups, you have in a data set, the more ways you can mix and match them, and that number grows exponentially. It’s important to remember that when we talk about metric cardinality, we mean the number of unique time series that are produced by a combination of metric names and their associated labels (or dimensions). The total number of combinations that exist is the metric’s cardinality.*

\*Excerpt from “What is high cardinality”

Since cardinality is the number of possible groups given the dimensions the metrics have, the more combinations there are, the greater a metric’s cardinality is. This means that in fast-moving, modern cloud native environments, where engineers are changing things quickly and introducing potentially dozens of variables in a day, the number of combinations multiplied onto the basic set of telemetry data increases dramatically, causing high cardinality. What constitutes “high” vs. “low” cardinality is somewhat relative; the better question to ask is at what point the ability to monitor and understand the environment gets out of hand for the engineering team.

According to the authors of a new O’Reilly Report on Cloud Native Monitoring, metric data is growing in scale due to how many different things teams are measuring and how much data each of those things produces. The shift of systems from monoliths to the cloud has resulted in an ongoing “explosion” of metric data in both volume and cardinality.

It is not uncommon for the addition of a new metric or dimension to cause a cardinality explosion if the cardinality of that new dimension is unexpectedly high. These types of events can threaten the stability of a metrics platform, and for businesses using vendors that charge based on metrics, can cause significant unexpected costs.
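One way to reason about this before shipping a new dimension is to estimate the worst case up front. The sketch below assumes the new label applies to every existing series, so the result is an upper bound rather than a measurement. The function names, the budget of 1,000 series, and the example value counts (12 and 50,000) are all hypothetical, chosen only for illustration.

```python
def worst_case_series(current_series: int, new_dimension_values: int) -> int:
    """Upper bound on series count if the new label applies to every series."""
    return current_series * new_dimension_values


def exceeds_budget(current_series: int, new_dimension_values: int,
                   series_budget: int) -> bool:
    """True if the worst case would go past the allowed number of series."""
    return worst_case_series(current_series, new_dimension_values) > series_budget


current = 10           # series for passed_cars_total in the example above
series_budget = 1_000  # hypothetical limit, for illustration only

# A dimension with a handful of values (say, 12 car colors) stays manageable.
print(worst_case_series(current, 12))              # 120
print(exceeds_budget(current, 12, series_budget))  # False

# A dimension with tens of thousands of values (say, one per user) does not.
print(worst_case_series(current, 50_000))              # 500000
print(exceeds_budget(current, 50_000, series_budget))  # True
```

Because dimensions often apply to only a subset of the data, the real number can be lower, which is why measuring the resulting series count directly is still the more reliable check.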