Frequently Asked Questions

Cloud native technology is produced at record-breaking speed, making it hard to keep up with the many components that define observability. What is observability anyway?! To help answer that question and more, we have compiled quick-and-easy answers about cloud native technologies, open source tooling, achieving outcome-based observability, and more. Bookmark this page!

Cardinality

Cardinality is defined as the number of elements in a set or other grouping. Basically, the more dimensions, or groups, you have in a data set, the more ways you can mix and match them, and the number of combinations grows exponentially. This blog, “Explain it like I’m 5: What is data cardinality?” offers some simple, real-world examples to explain this concept. For now, it’s important to remember that when we talk about metric cardinality, we mean the number of unique time series produced by a combination of metric names and their associated labels (or, dimensions). The total number of possible combinations is the cardinality.*

*Excerpt from “What is high cardinality”

Since cardinality is the number of possible groups given the dimensions the metrics have, the more combinations there are, the greater a metric’s cardinality. This means that in fast-moving, modern cloud native environments, where engineers are changing things quickly and introducing potentially dozens of variables in a day, the number of combinations multiplying the basic set of telemetry data increases dramatically, causing high cardinality. What constitutes “high” vs. “low” cardinality is somewhat relative; the better question to ask is at what point the ability to monitor and understand the environment gets out of hand for the engineering team.
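As a quick illustration of how label combinations multiply, the sketch below computes the worst-case series count for a single hypothetical metric (the metric and label names are made up for this example):

```python
from itertools import product

# Hypothetical label sets for a single metric, e.g. http_requests_total.
# Worst-case cardinality is the product of the distinct values per label.
labels = {
    "method": ["GET", "POST", "PUT", "DELETE"],      # 4 values
    "status": ["200", "301", "404", "500"],          # 4 values
    "region": ["us-east", "us-west", "eu-central"],  # 3 values
}

# Each unique combination of label values is one time series.
combinations = list(product(*labels.values()))
print(len(combinations))  # 4 * 4 * 3 = 48 potential time series

# Adding one more label, say a pod name across 100 pods, multiplies again:
print(len(combinations) * 100)  # 4800 series from a single metric
```

This is why one unexpectedly high-cardinality label, such as a user ID or request ID, can blow up the series count overnight.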

According to the authors of a new O’Reilly Report on Cloud Native Monitoring, metric data is growing in scale due to how many different things teams are measuring and how much data each of those things produces. The shift of systems from monoliths to the cloud has resulted in an ongoing “explosion” of metric data in both volume and cardinality. See how to harness the data explosion here.

It is not uncommon for the addition of a new metric or dimension to cause a cardinality explosion if the cardinality of that new dimension is unexpectedly high. These types of events can threaten the stability of a metrics platform, and for businesses using vendors that charge based on metrics, can cause significant unexpected costs.

Cloud Native

Defined as “a set of architectural principles that allow applications to be managed efficiently at scale*”, cloud native means more than merely removing on-site computing infrastructure. Instead, it’s a series of scalable applications that are both built in and deployed on distributed cloud computing platforms. Cloud native technologies can also include techniques such as containers, service mesh, and microservices that run in dynamic environments from public, private or hybrid clouds.

* From the 451 Research primer “Raising a toast to cloud native: A primer on the cloud-native paradigm.” Download here.

Cloud native applications are designed for scalability, reliability and flexibility. Modern engineering teams can utilize small components in these applications – such as containers or microservices – to quickly “scale up” or “scale down” their applications. In shifting to cloud native, teams can achieve efficient IT operations, improve developer productivity, and, ultimately, deliver features and functions that delight end users.

Companies embracing cloud native are seeking to improve reliability and user experience by spreading their workloads not just over multiple availability zones within a public cloud region, but over multiple public cloud regions and providers using technologies like containers. Developers and engineering teams looking to build applications that are highly agile and able to scale fast should implement cloud native practices.

Begin with metrics and harnessing large-scale data – even though your focus should ultimately depend on your desired outcomes. Prometheus, a widely adopted open source technology, is a great way to get started monitoring your cloud native environment. It does have limits at scale, however, so be mindful of how you structure your Prometheus monitoring environment (or employ a hosted SaaS solution). Regardless, having visibility into your cloud native systems is important for rapid remediation when something goes wrong.

Cloud native systems are complex and have many moving parts. All these interconnected smaller parts can drive up data cardinality. It can be easy to lose visibility into such a distributed system, and open source monitoring techniques can be easily overloaded, leading to issues with scaling and reliability.

Distributed Tracing

Distributed tracing is the process of mapping and analyzing requests as they flow through various services. This helps you understand where errors or performance issues occur in a complex and distributed architecture. Distributed traces are often underutilized, but when captured, stored and analyzed distributed traces can enable root cause analysis, improve the accuracy of your data decisions and provide a deeper understanding of upstream and downstream dependencies.

Traces, also known as distributed tracing, map and analyze requests as they flow through various services, helping you understand where errors or performance issues occur in a complex and distributed architecture. Events let you know an action has happened; they can be used to validate that something occurred.

Exemplars are references to data outside of the metrics published by an application. In addition to the metrics an application pushes, it also publishes a reference to some other data that relates to what we are measuring. While exemplars give us an easy way to jump from metrics to a relevant distributed trace, there are limitations. See why our platform doesn’t rely on exemplars to link metric data to traces here.

A span is the work done by an individual service or component; spans may also serve as reference points within a trace.
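To make the span and trace relationship concrete, here is a minimal, illustrative sketch (the class and field names are simplified assumptions, not any particular tracing SDK’s API):

```python
from dataclasses import dataclass, field
from typing import Optional
import uuid

# A minimal model of a span in distributed tracing: every span in one
# request shares a trace_id, and parent_id encodes the call graph.
@dataclass
class Span:
    name: str
    trace_id: str                    # shared by every span in one request
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None  # links a span to its caller

# One request flowing through three services produces one trace:
trace_id = uuid.uuid4().hex
frontend = Span("GET /checkout", trace_id)
payments = Span("charge_card", trace_id, parent_id=frontend.span_id)
database = Span("INSERT order", trace_id, parent_id=payments.span_id)

# All three spans belong to the same trace.
assert frontend.trace_id == payments.trace_id == database.trace_id
```

Walking the parent_id chain from the database span back to the frontend span is exactly the root cause analysis path that tracing enables.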


Kubernetes

Containers bundle applications with system tools, libraries and configuration files to create a modular package that runs the same anywhere, and Kubernetes is a way to help orchestrate containerized applications. Kubernetes is a “portable, extensible, open source platform for managing containerized workloads and services.” It is backed by the CNCF.

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers into logical units for easy management and discovery. Prometheus is the modern standard monitoring technique for monitoring Kubernetes clusters.

K8s is a numeronym, or number-based abbreviation, for Kubernetes. There are 8 letters between the “K” and the “s” of Kubernetes (yes, count them!)… so, K8s.

Kubernetes pods are the smallest deployable units of computing that you can create and manage in Kubernetes.

A cluster is a grouping of nodes that run containerized applications, or a collection of linked node machines.


Microservices

Like Lego bricks, microservices are modular, interoperable, and independent pieces of software contained within modern applications that run in the cloud.*

*Excerpt from “An architectural view of cloud observability”

In using microservices architectures, organizations can build new microservices not only to meet their unique business needs, but also orchestrate those microservices to work seamlessly together.* This allows for greater flexibility and scalability, even allowing reuse of more generic services across multiple architectures.

*Excerpt from “Achieving good observability in a cloud native system.”

Many SaaS and cloud providers execute their tasks with microservices. These microservices run on lightweight, containerized and reusable modules,* all of which need to communicate with one another to synchronize tasks. When migrating from a monolithic architecture, implementing microservices involves identifying chunks of functionality that you can isolate and split from your main application to run as an independent application, with the monolith making requests to the new service as appropriate.

If you’re starting from scratch, this can be done by implementing each independent chunk of functionality you build as its own service from the start. With microservices, it’s important to think about the boundaries and interdependence between individual services – it’s easy to end up with a tangled mess of services, instead of the composable set of functions that are desired.

*Excerpt from “Why enterprises need cloud observability and what that looks like.”


MTTR and MTTD

MTTR stands for “Mean time to repair” and is a measurement of how long it takes from the first alert that something has gone wrong to remediating the issue.

MTTD stands for “Mean time to detection,” or a measurement of how long it takes to know something is wrong.

MTTR can be a simplistic way to look at observability and system health: it can tell you how long it takes to solve an issue, but it does not dive into the root causes of the issue, nor how to prevent it from happening again. That said, mature central observability teams consider business outcomes and efficiency as they evaluate how to make applications more agile, scalable and reliable. For more, watch this lightning talk on demand, “Is MTTR still relevant in a modern, cloud native world?”
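Since both metrics are simple averages over incident timestamps, they are easy to compute; the sketch below uses a hypothetical two-incident log:

```python
from datetime import datetime

# Illustrative incident log: when each fault started, was detected, and was repaired.
incidents = [
    {"start": datetime(2024, 1, 5, 9, 0),
     "detected": datetime(2024, 1, 5, 9, 12),
     "repaired": datetime(2024, 1, 5, 10, 0)},   # detected in 12 min, repaired 48 min later
    {"start": datetime(2024, 1, 9, 14, 0),
     "detected": datetime(2024, 1, 9, 14, 4),
     "repaired": datetime(2024, 1, 9, 14, 34)},  # detected in 4 min, repaired 30 min later
]

# MTTD: average time from fault start to detection.
mttd = sum((i["detected"] - i["start"]).total_seconds() for i in incidents) / len(incidents)
# MTTR: average time from detection (the first alert) to remediation.
mttr = sum((i["repaired"] - i["detected"]).total_seconds() for i in incidents) / len(incidents)

print(mttd / 60)  # 8.0 minutes
print(mttr / 60)  # 39.0 minutes
```

The averages are only as meaningful as the timestamps behind them, which is one reason these metrics tell you little about root causes on their own.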


Observability

For years, observability has been defined as a collection of distinct data types – namely logs, metrics, and distributed traces. As cloud-native technology has evolved, the definition of observability has transitioned away from merely collecting data and toward a larger, more holistic practice (or process). This process is driven by being able to derive maximum value from data to rapidly remediate issues that may arise. Modern technology environments are complex, and asking key operational questions about how easily and quickly an engineer can know, triage, and find the underlying cause of a problem is the reason observability is so important.

Read more about the history, definitions and outcomes of observability here.

O11y is a numeronym, or number-based abbreviation, for observability. There are 11 letters between the “o” and the “y” of Observability (yes, count them!)… so, o11y.

Observability platforms (such as Chronosphere) put the engineering and DevOps teams back in control of their data. Mature observability practices can provide an advantage to businesses even if they do not adopt cloud native architectures, but for cloud native companies the increased complexity of their systems makes it a must-have to ensure they can provide reliable service to their customers.

Monitoring is the regular observation, recording and alerting of the state of an organization’s technology stack. Whereas traditional monitoring alerts the team to a potential issue and can provide a reasonably good view of a system’s health, observability continuously works to identify root causes of issues as well as “improve the reliability, performance and security of a platform by analyzing trends.”* The goal of great observability is rapid remediation to improve performance and evolve with business and technical priorities.

*Excerpt from “Keeping your observability platform in shape.”

Application Performance Monitoring (APM) tools collect predetermined data from applications and infrastructure components to provide performance and availability analysis. APM tools were born out of a need to monitor cloud applications in a world that was moving away from on-prem technologies. But a dramatic shift in cloud architecture in the late 2010s – specifically containers and microservices architecture – demanded flexibility for scale, speed and complexity. This is where observability comes in. Read more about the evolution of observability beyond APM here.

The central observability team, composed loosely of Site Reliability Engineers and other DevOps roles, supports the engineers and developers involved with delivering your service to end users. This team defines monitoring standards and practices, delivers that data to engineering teams, measures the reliability and stability of monitoring solutions, and manages the tooling and storage of observability data. Learn how to build a high impact observability team.


Prometheus

Prometheus is the de facto standard for monitoring cloud native environments. As companies shifted to microservices-oriented architectures on container-based infrastructure, they needed a new way to monitor their data. Backed by the CNCF, Prometheus offers a label-based approach to monitoring data that allows users to pivot, group and explore their data along many dimensions. Prometheus uses a pull-based data collection model, as opposed to applications pushing metrics to a central store, and it has built-in discovery mechanisms to identify which targets to pull metrics from.


Prometheus provides a functional query language called PromQL (the Prometheus Query Language) that lets users select and aggregate time series data in real time. For metrics stored within Prometheus, PromQL is the way to query and retrieve the results you are looking for. It is specifically designed for working with time series metrics and makes it easy to transform or compose different metrics together.
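As a rough illustration of what a PromQL query like rate(http_requests_total[1m]) computes, here is a simplified Python sketch of the per-second rate over counter samples (real Prometheus also handles counter resets and window extrapolation, which this sketch ignores):

```python
# Counter samples scraped every 15 seconds: (timestamp_seconds, counter_value).
# Counters only ever increase, so the rate is the slope over the window.
samples = [
    (0, 100),
    (15, 130),
    (30, 160),
    (45, 190),
    (60, 220),
]

def simple_rate(samples):
    """Per-second average increase between the first and last sample."""
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

print(simple_rate(samples))  # (220 - 100) / 60 = 2.0 requests per second
```

Querying the raw counter value is rarely useful on its own; it is this derived rate that answers questions like “how much traffic is this service handling right now?”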

The four primary types of Prometheus metrics are: 

    • Counters
    • Gauges
    • Histograms
    • Summaries

These metric types are found in Prometheus’ official client libraries: Go, Java, Ruby and Python.
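To illustrate how the first three types behave, here is a minimal, self-contained sketch (these are illustrative classes, not the official client library API; summaries, which track client-side quantiles, are omitted for brevity):

```python
class Counter:
    """Only goes up; e.g. total requests served."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

class Gauge:
    """Can go up or down; e.g. current in-flight requests."""
    def __init__(self):
        self.value = 0.0
    def set(self, value):
        self.value = value

class Histogram:
    """Counts observations into cumulative buckets; e.g. request latency."""
    def __init__(self, buckets):
        self.buckets = sorted(buckets)
        self.counts = {b: 0 for b in self.buckets}
        self.total = 0.0
        self.count = 0
    def observe(self, value):
        self.total += value
        self.count += 1
        for b in self.buckets:
            if value <= b:  # cumulative, like Prometheus "le" buckets
                self.counts[b] += 1

requests = Counter(); requests.inc()
in_flight = Gauge(); in_flight.set(7)
latency = Histogram(buckets=[0.1, 0.5, 1.0])
latency.observe(0.3)
print(requests.value, in_flight.value, latency.counts)
```

Choosing the right type matters: querying a counter usually means computing a rate, while a gauge is meaningful as-is, and histograms enable server-side quantile estimates.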

Ready to see it in action?

Now you’re caught up on the multi-faceted world of observability. Ready to get started? Discover an innovative observability platform that helps you incorporate observability practices and procedures.