Cloud-native scale is 10-100x larger
Cloud-native environments emit a massive amount of monitoring data, somewhere between 10x and 100x more than traditional VM-based environments. This isn’t because more infrastructure is consumed, but because cloud-native environments consist of many more, smaller components: many containers for each VM and many microservices for each monolith. Each of these components produces the same amount of metrics-based monitoring data as its legacy counterpart. For example, each container emits the same CPU, memory and network metric data as a VM, but since there are many containers for each VM, the result is far more monitoring and observability data than ever anticipated.
True cloud-native monitoring and observability is built for microservices’ unique monitoring challenges, addressing both the high cardinality of metrics in containerized cloud-native apps and the need to automate as much of the management overhead as possible.
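The multiplication behind that growth can be sketched in a few lines. This is a back-of-the-envelope illustration only: the function name and every figure in it (100 hosts, 20 containers per host, 50 metrics per component) are assumptions chosen for the example, not measurements.

```python
def series_count(hosts: int, units_per_host: int, metrics_per_unit: int) -> int:
    """Distinct time series if every unit emits every metric."""
    return hosts * units_per_host * metrics_per_unit


# Hypothetical figures, chosen only for illustration.
vm_series = series_count(hosts=100, units_per_host=1, metrics_per_unit=50)
container_series = series_count(hosts=100, units_per_host=20, metrics_per_unit=50)

# Same hosts, same metrics per component, but 20 components per host.
growth = container_series / vm_series
print(vm_series, container_series, growth)
```

The infrastructure footprint is identical in both cases. Only the number of components reporting metrics changes, and the series count grows with it.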
Cloud-native apps are more flexible and ephemeral
Cloud-native applications and the container-based infrastructure they run on are ephemeral. They live only for the lifetime of a deployment, and with the modern best practice of deploying multiple times a day, that lifetime is short. These applications are also often scaled up and down dynamically to serve the real-time needs of the business.
From a monitoring and observability perspective, this means both the usage patterns and retention requirements are vastly different from what they were pre-cloud-native. For example, it was quite common to store infrastructure metrics for a year, because VMs and physical machines were long-lived. This doesn’t make sense for container-based infrastructure metrics. It’s far more effective to store container-level metrics for a few days and then aggregate them across container identifiers for long-term analysis at an application or microservice level. APM and existing cloud-hosted monitoring solutions do not provide such capabilities, and using them to monitor cloud-native environments is inefficient.
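As a minimal sketch of that kind of roll-up, the snippet below sums samples after dropping the per-container label, leaving one series per service. The `roll_up` helper, the `container_id` and `service` label names, and the sample values are all hypothetical; a real system would do this inside its storage or query layer.

```python
from collections import defaultdict


def roll_up(samples, drop_label="container_id"):
    """Sum samples that share every label except `drop_label`."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels, in a stable order, identify the rolled-up series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop_label))
        totals[key] += value
    return dict(totals)


# Hypothetical CPU samples from three short-lived containers.
samples = [
    ({"service": "checkout", "container_id": "a1"}, 0.25),
    ({"service": "checkout", "container_id": "b2"}, 0.5),
    ({"service": "search", "container_id": "c3"}, 1.0),
]

rolled = roll_up(samples)
for key, total in rolled.items():
    print(dict(key), total)
```

Three container-level series collapse into two service-level series, and the long-term data no longer depends on identifiers that stopped existing days ago.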
Cloud-native requires more reliability and availability
Companies embracing cloud-native are increasingly spreading their workloads not just over multiple availability zones within a public cloud region, but over multiple public cloud regions and providers, often using Kubernetes to orchestrate the containers. This improves reliability and user experience.
As companies building on cloud-native services strive to meet ever-higher SLAs, they need a monitoring and observability solution that is at least as reliable as the product or service it’s monitoring. It’s impossible to guarantee an SLA of 99.9% uptime if the observability tool you are using to measure that SLA doesn’t guarantee at least 99.9% uptime itself. Most existing cloud-hosted APMs and monitoring systems do not provide three 9s of reliability (99.9%).
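The arithmetic behind this point is easy to check. The sketch below converts an uptime percentage into minutes of permitted downtime over a 30-day window. The function name, the 30-day window, and the 99.5% figure for the monitoring tool are assumptions made for illustration.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime target over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# A 99.9% target leaves about 43.2 minutes of downtime in 30 days.
service_budget = allowed_downtime_minutes(99.9)

# A hypothetical monitoring tool at 99.5% can be unavailable for 216 minutes.
monitor_blind_time = allowed_downtime_minutes(99.5)

print(round(service_budget, 1), round(monitor_blind_time, 1))
print(monitor_blind_time > service_budget)
```

In this example the monitoring tool can be blind for several times longer than the service’s entire downtime budget, so an outage that breaches the SLA could pass unmeasured.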
Cloud-native needs open-source compatibility
Over the past several years, Prometheus has emerged as the industry-accepted metric protocol for cloud-native applications. Unlike legacy monitoring systems, Prometheus uses tag-based metrics instead of hierarchy-based metrics. As Prometheus and the PromQL query language have become standard, everything in the cloud-native technology stack, from infrastructure metrics to application metrics and business metrics, is in the same format and can be combined and accessed through the same Grafana dashboards. A modern software user no longer needs to depend on the black-box magic of the observability vendor’s agent for instrumentation or be locked in to vendor-specific dashboarding and alerting.
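To make the difference between the two models concrete, here is a small sketch that renders one measurement both ways. The dotted path stands in for a hierarchy-based name, and the labeled form follows the `name{label="value"}` shape used by Prometheus. The helper functions, metric name, and label values are hypothetical.

```python
def hierarchical_name(parts):
    """Dotted path: each dimension is fixed by its position in the name."""
    return ".".join(parts)


def labeled_series(name, labels):
    """Prometheus-style series: dimensions are named key/value labels."""
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{body}}}"


def matches(labels, selector):
    """True if every label in `selector` has the same value in `labels`."""
    return all(labels.get(key) == value for key, value in selector.items())


labels = {"region": "us-east", "service": "checkout"}

old_style = hierarchical_name(["us-east", "checkout", "http_requests_total"])
new_style = labeled_series("http_requests_total", labels)
print(old_style)
print(new_style)

# Selecting by any single dimension needs no knowledge of label order.
print(matches(labels, {"service": "checkout"}))
```

With labels, a query can filter or group on any dimension by name, whereas a dotted path requires knowing which position holds which dimension.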
Unfortunately, the leading providers of cloud-hosted monitoring and application performance monitoring (APM) have been slow to fully embrace this trend. While many claim to support Prometheus and PromQL, their compliance with the standards is spotty at best when put to the test.
In addition, complete control over the monitoring system, including what kinds of data to collect, how much granularity you need and how long metrics should be stored, is essential to successfully managing the sheer scale of data required to get visibility into cloud-native applications. APM vendors don’t provide this kind of control for any of their monitoring solutions. As a result, organizations often run up against hard budget constraints that force engineers to reduce data granularity across the board, leading to a lack of visibility.
Is it time to upgrade to cloud-native monitoring and observability?
Let’s go back to the core reason most organizations adopt cloud-native technology: to improve their ability to deliver features that will delight customers. The advantage of adopting cloud-native architecture is scalable, reliable and flexible applications and infrastructure. To achieve that, you also need observability that is just as scalable, reliable and flexible. Observability is a critical part of being able to meet customer needs around performance and availability, but no customer is interested in how you manage to get visibility into your microservices.
Legacy monitoring and observability solutions weren’t built either for the technical realities of cloud-native applications or for the changing engineering norms around cross-functionality and open source. Open-source Prometheus can theoretically scale to meet the demands of a large-scale cloud-native deployment, but in practice it can’t do so reliably. Learn more about Chronosphere’s approach to cloud-native observability that is built for scale, reliability, and control, without lock-in to proprietary formats.