What is cloud native monitoring and observability?

Learn about the differences between application performance monitoring, cloud native monitoring, and observability.

On: Oct 14, 2023

What’s the difference between legacy, monolithic applications and cloud native, containerized applications?

A lot. Cloud native applications are not simply legacy applications that have been lifted and shifted into the public cloud, and the tools that monitor them can’t be either.

Cloud native applications have a completely different architecture and are designed to be more scalable, reliable and flexible than legacy apps. The infrastructure they run on has also evolved, so cloud native applications are built on a fundamentally different technology stack.

Cloud native apps need a new generation of monitoring and observability solutions

For all the advantages of cloud native architecture, there are a few foundational pieces that need to be redesigned from the ground up for this new model. Legacy security and networking, for example, were never designed for the dynamic nature of cloud native applications.

Monitoring and observability also require a new approach. The success of a cloud native deployment hinges on a team’s ability to have real-time visibility across the technology stack and the business it serves.

Today’s dominant players in cloud-hosted monitoring and application performance monitoring (APM) were born in a pre-cloud native world — one that had very different underlying assumptions.

They were designed for monolithic applications running on VMs, not microservices running on ephemeral containers. It’s no wonder they struggle with cloud native architectures. The three main issues these legacy solutions run into are scalability, flexibility and reliability.

Cloud native scale is 10-100x larger

Cloud native environments emit a massive amount of monitoring data — somewhere between 10x and 100x more than traditional VM-based environments. This isn’t because more infrastructure is consumed, but because cloud native environments consist of many more smaller components — many containers for each VM and many microservices for each monolith.

Each of these components produces the same amount of metrics-based monitoring data as its legacy counterpart. For example, each container emits the same CPU, memory and network metric data as a VM, but since there are many containers for each VM, the result is far more monitoring and observability data than legacy tools ever anticipated.
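To make the multiplication concrete, here is a minimal sketch of the arithmetic. The fleet size, the 20 containers per VM, and the three metrics per component are hypothetical numbers chosen for illustration, not figures from any real environment.

```python
def series_count(hosts, units_per_host, metrics_per_unit):
    """Number of distinct metric streams emitted by a fleet."""
    return hosts * units_per_host * metrics_per_unit

# Hypothetical fleet: 100 VMs, each unit emitting CPU, memory and network metrics.
vm_series = series_count(hosts=100, units_per_host=1, metrics_per_unit=3)

# Same 100 VMs, but each now runs an assumed 20 containers.
container_series = series_count(hosts=100, units_per_host=20, metrics_per_unit=3)

growth = container_series / vm_series
print(vm_series, container_series, growth)  # 300 6000 20.0
```

The infrastructure footprint is unchanged in this example, yet the number of metric streams grows by the container density alone, before any per-microservice application metrics are added.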

For most organizations, as monitoring data volumes grow, so do costs. However, not all monitoring data should be treated equally — some data is inherently more critical or requires greater or lesser granularity and retention.

True cloud native monitoring and observability is built for microservices’ unique monitoring challenges, addressing both the high cardinality of metrics in containerized cloud native apps and the need to automate as much of the management overhead as possible.

Cloud native monitoring and observability gives organizations visibility into overall metrics usage and the ability to set quotas and limits on fast-growing services. That provides the flexibility and control over monitoring costs that tooling built for traditional VM-based environments lacks. Organizations shouldn’t have to make trade-offs and stop monitoring some systems in order to reduce cost.

71% of organizations agree that observability data is out of control

Cloud native apps are more flexible and ephemeral

Cloud native applications and the container-based infrastructure they run on are ephemeral. They live only for the lifetime of a deployment, and with the modern best practice of deploying multiple times a day, that lifetime is short. These applications are also often scaled up and down dynamically in order to serve the real-time needs of the business.

From a monitoring and observability perspective, this means both the usage patterns and retention requirements are vastly different from what they were pre-cloud native. For example, it was quite common to store infrastructure metrics for a year because VMs and physical machines were long lived.

This doesn’t make sense for container-based infrastructure metrics. It’s far more effective to store container-level metrics for a few days and then aggregate them across container identifiers, dropping the per-container dimension, for long-term analysis at an application or microservice level. APM and existing cloud-hosted monitoring solutions do not provide such capabilities, and using them to monitor cloud native environments is inefficient.
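The aggregation step described above can be sketched as a simple roll-up. This is an illustrative sketch only: the `Sample` type, the field names, and the container IDs are hypothetical, and a real system would typically do this with its own aggregation or recording rules rather than application code.

```python
from collections import defaultdict
from typing import NamedTuple

class Sample(NamedTuple):
    timestamp: int      # seconds since some epoch (hypothetical)
    service: str        # long-lived identity worth keeping
    container_id: str   # short-lived identity, dropped by the roll-up
    cpu_cores: float

def rollup_by_service(samples):
    """Sum CPU across containers, keyed by (timestamp, service)."""
    totals = defaultdict(float)
    for s in samples:
        totals[(s.timestamp, s.service)] += s.cpu_cores
    return dict(totals)

samples = [
    Sample(60, "checkout", "c-1a", 0.5),
    Sample(60, "checkout", "c-2b", 0.25),
    Sample(60, "search", "c-3c", 1.0),
    # A new deploy replaced both checkout containers with c-9z.
    Sample(120, "checkout", "c-9z", 0.75),
]

rolled = rollup_by_service(samples)
print(rolled)
```

After the roll-up, the `checkout` series is continuous across the deploy even though no individual container survived it, which is why the service-level series is the one worth retaining long term.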

Cloud native requires more reliability and availability

Companies embracing cloud native are increasingly spreading their workloads not just over multiple availability zones within a public cloud region, but over multiple public cloud regions and providers, often using Kubernetes to orchestrate the containers. This improves reliability and user experience.

As companies building on cloud native services strive to meet ever-higher SLAs, they need a monitoring and observability solution that is even more reliable than the product or service it’s monitoring. It’s impossible to guarantee an SLA of 99.9% uptime if the observability tool you are using to measure that SLA doesn’t guarantee at least 99.9% uptime itself. Most existing cloud-hosted APMs and monitoring systems do not provide three nines (99.9%) of reliability.
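A short worked example shows how little room a three-nines target leaves. This is a sketch under stated assumptions: a 30-day measurement window, and service and monitor failures treated as independent, which is a simplification.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes in the assumed window

def downtime_budget_minutes(availability, period_minutes=MINUTES_PER_30_DAYS):
    """Minutes of allowed downtime in the period for an availability target."""
    return (1.0 - availability) * period_minutes

def jointly_available(service, monitor):
    """Fraction of time the service is up AND the monitor is up to observe it.

    Assumes the two fail independently (an illustrative simplification).
    """
    return service * monitor

for target in (0.99, 0.999, 0.9999):
    budget = downtime_budget_minutes(target)
    print(f"{target:.2%} uptime allows about {budget:.1f} minutes down per 30 days")

# A 99.9% service watched by a 99% monitor: observed coverage falls below the SLA.
print(jointly_available(0.999, 0.99))
```

At 99.9%, the budget is roughly 43 minutes per 30 days. A monitor that is itself only 99% available can be blind for several hours in the same window, so it cannot confirm whether that budget was kept.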

Cloud native needs open source compatibility

Over the past several years, Prometheus has emerged as the industry-accepted metrics standard for cloud native applications. Unlike legacy monitoring systems, Prometheus uses tag-based metrics (it calls the tags “labels”) instead of hierarchy-based metrics.
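The difference between the two naming models can be illustrated without any monitoring library. In this sketch the metric name, label names, and values are made up, and the rendering helper only approximates the Prometheus text format (it ignores label-value escaping).

```python
def hierarchical_name(parts):
    """Legacy style: every dimension is baked into one dotted path."""
    return ".".join(parts)

def labeled_sample(name, labels, value):
    """Label style: one metric name plus key/value pairs (escaping omitted)."""
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"

def matches(labels, selector):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())

legacy = hierarchical_name(["us-east", "checkout", "http", "GET", "requests"])
print(legacy)  # us-east.checkout.http.GET.requests

series = [
    {"service": "checkout", "region": "us-east", "method": "GET"},
    {"service": "checkout", "region": "us-east", "method": "POST"},
    {"service": "search", "region": "eu-west", "method": "GET"},
]
print(labeled_sample("http_requests_total", series[0], 1027))
# http_requests_total{method="GET",region="us-east",service="checkout"} 1027

# Slice by any dimension without knowing its position in a path.
get_series = [s for s in series if matches(s, {"method": "GET"})]
print(len(get_series))  # 2
```

With the dotted path, selecting all GET requests requires knowing that the method is the fourth segment of every name. With labels, any dimension can be selected or aggregated away independently.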

As Prometheus and the PromQL query language have become standard, everything in the cloud native technology stack, from infrastructure metrics to application metrics and business metrics, is in the same format and can be combined and accessed through the same Grafana dashboards. A modern software user no longer needs to depend on the black box magic of the observability vendor’s agent for instrumentation or be locked in to vendor-specific dashboarding and alerting.

Unfortunately, the leading providers of cloud-hosted monitoring and application performance monitoring (APM) have been slow to fully embrace this trend. While many claim to support Prometheus and PromQL, their compliance with the standards is spotty at best when put to the test.

In addition, complete control over the monitoring system, including what kinds of data to collect, how much granularity you need and how long metrics should be stored, is essential to successfully managing the sheer scale of data required to get visibility into cloud native applications.

APM vendors don’t provide this kind of control for any of their monitoring solutions. As a result, organizations often run up against hard budget constraints that force engineers to reduce data granularity across-the-board, leading to a lack of visibility.

Is it time to upgrade to cloud native monitoring and observability?

Let’s go back to the core reason most organizations adopt cloud native technology: To improve their organization’s ability to deliver features that will delight customers. The advantage of adopting cloud native architecture is to have scalable, reliable, and flexible applications and infrastructure, and to achieve that, you also need observability that is as scalable, reliable, and flexible. Observability is a critical part of being able to meet customer needs around performance and availability, but no customer is interested in how you manage to get visibility into your microservices.

Legacy monitoring and observability solutions weren’t built for either the technical realities of cloud native applications or for the changing engineering norms around cross-functionality and open source. Open source Prometheus can theoretically scale to meet the demands of a large-scale cloud native deployment, but in practice it can’t do so reliably.

Learn more about Chronosphere’s approach to cloud native observability that is built for scale, reliability, and control, without lock-in to proprietary formats.

Cloud native FAQs

What is cloud native?

Defined as “a set of architectural principles that allow applications to be managed efficiently at scale*”, cloud native means more than merely removing on-site computing infrastructure.

Instead, it describes scalable applications that are both built in and deployed on distributed cloud computing platforms. Cloud native technologies can also include containers, service mesh, and microservices that run in dynamic environments across public, private or hybrid clouds.

* From the 451 Research primer “Raising a toast to cloud native: A primer on the cloud-native paradigm.”

Why cloud native?

Cloud native applications are designed for scalability, reliability and flexibility. Modern engineering teams can utilize small components in these applications – such as containers or microservices – to quickly “scale up” or “scale down” their applications.

In shifting to cloud native, teams can achieve efficient IT operations, improve developer productivity, and, ultimately, deliver features and functions that delight end users.

Who is using cloud native?

Companies embracing cloud native are seeking to improve reliability and user experience by spreading their workloads not just over multiple availability zones within a public cloud region, but over multiple public cloud regions and providers using technologies like containers.

Developers and engineering teams that want to build highly agile applications that can scale quickly also adopt cloud native practices.

How do you monitor your cloud native environment?

Begin with metrics and with harnessing large-scale data, although your focus should ultimately depend on your desired outcomes. Prometheus is a widely adopted technology and a great way to get started monitoring your cloud native environment.

It does have limits at scale, however, so be mindful of how you structure your Prometheus monitoring environment (or employ a hosted SaaS solution). Regardless, having visibility into your cloud native systems is important for rapid remediation when something goes wrong.

What are the negatives to cloud native?

Cloud native systems are complex and have many moving parts. All these interconnected smaller parts produce metrics with high “data cardinality,” meaning a very large number of unique time series. It can be easy to lose visibility into such a distributed system, and open source monitoring tools can be easily overloaded, leading to issues with scaling and reliability.
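Cardinality here refers to the number of unique time series a metric can produce. A rough upper bound is the product of the distinct values each label can take, as in this sketch. The label names and counts are hypothetical, and real systems usually see fewer series than the bound because not every combination occurs.

```python
from math import prod
from typing import Mapping

def max_series(distinct_values_per_label: Mapping[str, int]) -> int:
    """Upper bound on unique series: product of distinct values per label."""
    return prod(distinct_values_per_label.values())

# Hypothetical request-count metric labeled by service, endpoint and status.
base = {"service": 50, "endpoint": 20, "status_code": 5}
print(max_series(base))  # 5000

# Adding one per-pod label with an assumed 100 live pods multiplies the bound.
with_pod = {**base, "pod": 100}
print(max_series(with_pod))  # 500000
```

One extra label tied to a short-lived component raises the bound a hundredfold here, which is how a distributed system overloads a monitoring backend that was sized for the base metric.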

Observability FAQs

What is observability?

For years, observability has been defined as a collection of distinct data types – namely logs, metrics, and distributed traces. As cloud native technology has evolved, the definition of observability has transitioned away from merely collecting data and toward a larger, more holistic practice (or process).

This process is driven by being able to derive maximum value from data to rapidly remediate issues that may arise. Modern technology environments are complex, and observability matters because it determines how easily and quickly an engineer can know about, triage, and understand the underlying cause of a problem.

Why do people call it “o11y?”

O11y is a numeronym, or number-based abbreviation, for observability. There are 11 letters between the “o” and the “y” of Observability (yes, count them!)… so, o11y.
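The counting rule can be written as a one-line function. This is a small illustrative sketch; the `numeronym` helper is a made-up name, and it assumes the convention of keeping the first and last letters and counting everything in between.

```python
def numeronym(word: str) -> str:
    """Keep the first and last letters and replace the middle with its length."""
    if len(word) <= 3:
        return word  # too short to abbreviate usefully
    return f"{word[0]}{len(word) - 2}{word[-1]}"

print(numeronym("observability"))  # o11y
print(numeronym("kubernetes"))     # k8s
```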

Why do I need observability?

Observability platforms (such as Chronosphere) put the engineering and DevOps teams back in control of their data. Mature observability practices can provide an advantage to businesses even if they do not adopt cloud native architectures, but for cloud native companies the increased complexity of their systems makes it a must-have to ensure they can provide reliable service to their customers.

Observability vs. monitoring

Monitoring is the regular observation, recording and alerting of the state of an organization’s technology stack. Whereas traditional monitoring alerts the team to a potential issue and can provide a reasonably good view of a system’s health, observability continuously works to identify root causes of issues as well as “improve the reliability, performance and security of a platform by analyzing trends.”*

The goal of great observability is rapid remediation to improve performance and evolve with business and technical priorities.

*Excerpt from “Keeping your observability platform in shape.”

Observability vs. APM

Application Performance Monitoring (APM) tools collect predetermined data from applications and infrastructure components to provide performance and availability analysis. APM tools were born out of a need to monitor cloud applications in a world that was moving away from on-premises technologies.

But a dramatic shift in cloud architecture in the late 2010s, specifically toward containers and microservices, requires flexibility to handle scale, speed and complexity. This is where observability comes in. Read more about the evolution of observability beyond APM here.

Who owns observability in an organization?

The central observability team owns it. Loosely made up of Site Reliability Engineers and other DevOps roles, this team supports the engineers and developers involved with delivering your service to end users.

This team defines monitoring standards and practices, delivers that data to engineering teams, measures the reliability and stability of monitoring solutions and manages the tooling and storage of observability data. Learn how to build a high impact observability team.