What is cloud-native monitoring?

What’s the difference between legacy, monolithic applications and cloud-native, containerized applications? A lot. Cloud-native applications are not simply legacy applications that have been lifted and shifted into the public cloud. They have a completely different architecture and are designed to be more scalable, reliable and flexible than legacy apps. The infrastructure they run on has also changed, so cloud-native applications are built on a fundamentally different technology stack.

Cloud-native apps need a new generation of monitoring and observability solutions

For all the advantages of cloud-native architecture, there are a few foundational pieces that need to be redesigned from the ground up for this new model. Legacy security and networking, for example, were never designed for the dynamic nature of cloud-native applications. Monitoring and observability also require a new approach. The success of a cloud-native deployment hinges on a team’s ability to have real-time visibility across the technology stack and the business it serves.

Today’s dominant players in cloud-hosted monitoring and application performance monitoring (APM) were born in a pre-cloud-native world, one that had very different underlying assumptions. They were designed for monolithic applications running on VMs, not microservices running on ephemeral containers. It’s no wonder they struggle with cloud-native architectures. The three main issues these legacy solutions run into are scalability, flexibility and reliability.

Cloud-native scale is 10-100x larger

Cloud-native environments emit a massive amount of monitoring data, somewhere between 10x and 100x more than traditional VM-based environments. This isn’t because more infrastructure is consumed, but because cloud-native environments consist of many more, smaller components: many containers for each VM and many microservices for each monolith. Each of these components produces the same amount of metrics-based monitoring data as its legacy counterpart. For example, each container emits the same CPU, memory and network metric data as a VM, but since there are many containers for each VM, the result is far more monitoring and observability data than was ever anticipated.
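To make the multiplication concrete, here is a minimal sketch of the arithmetic. The fleet size and the containers-per-VM ratio are hypothetical values chosen for illustration, not measurements.

```python
# Hypothetical numbers, chosen only to illustrate the multiplication effect.
METRICS_PER_COMPONENT = 3  # e.g. CPU, memory and network


def series_count(components: int, metrics_per_component: int = METRICS_PER_COMPONENT) -> int:
    """Number of distinct time series if every component emits every metric."""
    return components * metrics_per_component


vms = 100                # assumed size of the legacy fleet
containers_per_vm = 20   # assumed packing density after containerization

vm_series = series_count(vms)
container_series = series_count(vms * containers_per_vm)
growth = container_series / vm_series

print(f"VM-level series:        {vm_series}")
print(f"Container-level series: {container_series}")
print(f"Growth factor:          {growth:.0f}x")
```

Under these assumptions the same hardware footprint produces 20 times as many time series, which falls inside the 10x to 100x range described above.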

True cloud-native monitoring and observability is built for microservices’ unique monitoring challenges, addressing both the high cardinality of metrics in containerized cloud-native apps and the need to automate as much of the management overhead as possible.

Cloud-native apps are more flexible and ephemeral

Cloud-native applications and the container-based infrastructure they run on are ephemeral. They live only for the lifetime of a deployment and, with the modern best practice of deploying multiple times a day, that lifetime is short. These applications are also often scaled up and down dynamically in order to serve the real-time needs of the business.

From a monitoring and observability perspective, this means both the usage patterns and retention requirements are vastly different from what they were pre-cloud-native. For example, it was quite common to store infrastructure metrics for a year, as VMs and physical machines were long-lived. This doesn’t make sense for container-based infrastructure metrics. It’s far more effective to store container-level metrics for a few days and then aggregate them across the container identifier for long-term analysis at an application or microservice level. APM and existing cloud-hosted monitoring solutions do not provide such capabilities, and using them to monitor cloud-native environments is inefficient.
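The aggregation step can be sketched as follows. This is an illustration only: the label names (`service`, `container_id`), the sample values and the `aggregate_away` helper are all hypothetical, and a real system would apply this per time window, not to a single snapshot.

```python
from collections import defaultdict

# Each sample is (labels, cpu_millicores). All names and values are hypothetical.
samples = [
    ({"service": "checkout", "container_id": "a1"}, 400),
    ({"service": "checkout", "container_id": "b2"}, 350),
    ({"service": "search", "container_id": "c3"}, 200),
]


def aggregate_away(samples, drop_label):
    """Sum values across all series that differ only in `drop_label`."""
    totals = defaultdict(int)
    for labels, value in samples:
        # The remaining labels, in a stable order, identify the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop_label))
        totals[key] += value
    return dict(totals)


# Long-term view: one series per service instead of one per container.
per_service = aggregate_away(samples, "container_id")
for key, total in per_service.items():
    print(dict(key), total)
```

Three short-lived container series collapse into two service-level series, which is the form worth keeping for long-term analysis.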

Cloud-native requires more reliability and availability

Companies embracing cloud-native are increasingly spreading their workloads not just over multiple availability zones within a public cloud region, but over multiple public cloud regions and providers, often using Kubernetes to orchestrate the containers. This improves reliability and user experience. 

As companies building on cloud-native services strive to meet ever-higher SLAs, they need a monitoring and observability solution that is even more reliable than the product or service it’s monitoring. It’s impossible to guarantee an SLA of 99.9% uptime if the observability tool you are using to measure that SLA doesn’t guarantee at least 99.9% uptime itself. Most existing cloud-hosted APMs and monitoring systems do not provide three 9s of reliability (99.9%).
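A quick calculation shows why the monitoring tool’s own availability matters. The sketch below assumes a 30-day month, and the 99.5% figure for the monitoring tool is a hypothetical value used for comparison.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of allowed downtime in the period at the given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


service_budget = downtime_budget_minutes(99.9)  # the SLA you promise customers
monitor_budget = downtime_budget_minutes(99.5)  # hypothetical monitoring tool

print(f"Service downtime budget:    {service_budget:.1f} min/month")
print(f"Monitoring downtime budget: {monitor_budget:.1f} min/month")
```

With these inputs the service may be down for roughly 43 minutes a month, while the monitoring tool may be unavailable for roughly 216 minutes. The tool could be blind for longer than the entire downtime budget it is supposed to measure.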

Cloud-native needs open-source compatibility

Over the past several years, Prometheus has emerged as the industry-accepted metric protocol for cloud-native applications. Unlike legacy monitoring systems, Prometheus uses tag-based metrics instead of hierarchy-based metrics. As Prometheus and the PromQL query language have become standard, everything in the cloud-native technology stack, from infrastructure metrics to application metrics and business metrics, is in the same format and can be combined and accessed through the same Grafana dashboards. A modern software user no longer needs to depend on the black-box magic of the observability vendor’s agent for instrumentation or be locked in to vendor-specific dashboarding and alerting.
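The difference between the two naming models can be illustrated with a small sketch. It is plain Python, not PromQL or any Prometheus client API; the metric names, label names, values and the `select` helper are hypothetical.

```python
# Hierarchy-based: dimensions are encoded by position in a dotted name,
# so a query has to know which position holds which dimension.
hierarchical = {
    "prod.us-east.checkout.http_requests": 120,
    "prod.us-west.checkout.http_requests": 80,
    "prod.us-east.search.http_requests": 50,
}

# Tag-based: one metric name plus key/value labels.
tagged = [
    ("http_requests", {"env": "prod", "region": "us-east", "service": "checkout"}, 120),
    ("http_requests", {"env": "prod", "region": "us-west", "service": "checkout"}, 80),
    ("http_requests", {"env": "prod", "region": "us-east", "service": "search"}, 50),
]


def select(series, name, **matchers):
    """Return values of series with the given name whose labels match every matcher."""
    return [
        value
        for metric, labels, value in series
        if metric == name and all(labels.get(k) == v for k, v in matchers.items())
    ]


# Any label can be used to slice the data without knowing a fixed name layout.
checkout_total = sum(select(tagged, "http_requests", service="checkout"))
east_total = sum(select(tagged, "http_requests", region="us-east"))
print(checkout_total, east_total)
```

With labels, the same data can be grouped by service, by region or by any other dimension, which is what allows infrastructure, application and business metrics to be combined in one dashboard.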

Unfortunately, the leading providers of cloud-hosted monitoring and application performance monitoring (APM) have been slow to fully embrace this trend. While many claim to support Prometheus and PromQL, their compliance with the standards is spotty at best when put to the test.

In addition, complete control over the monitoring system, including what kinds of data to collect, how much granularity you need and how long metrics should be stored, is essential to successfully managing the sheer scale of data required to get visibility into cloud-native applications. APM vendors don’t provide this kind of control for any of their monitoring solutions. As a result, organizations often run up against hard budget constraints that force engineers to reduce data granularity across the board, leading to a lack of visibility.

Is it time to upgrade to cloud-native monitoring and observability?

Let’s go back to the core reason most organizations adopt cloud-native technology: to improve their ability to deliver features that will delight customers. The advantage of adopting cloud-native architecture is scalable, reliable and flexible applications and infrastructure. To achieve that, you also need observability that is just as scalable, reliable and flexible. Observability is a critical part of being able to meet customer needs around performance and availability, but no customer is interested in how you manage to get visibility into your microservices.

Legacy monitoring and observability solutions weren’t built for either the technical realities of cloud-native applications or the changing engineering norms around cross-functionality and open source. Open-source Prometheus can theoretically scale to meet the demands of a large-scale cloud-native deployment, but in practice it can’t do so reliably. Learn more about Chronosphere’s approach to cloud-native observability that is built for scale, reliability and control, without lock-in to proprietary formats.
