Cloud native architectures generate far more observability data, driving up the cost of monitoring tools like Datadog. In this blog, we lay out better ways to manage these expenses.
Published: May 15, 2024
Rachel leads Product & Solution Marketing for Chronosphere. Previously, she built out product, technical, and channel marketing at CloudHealth (acquired by VMware). Prior to that she led product marketing for AWS and cloud-integrated storage at NetApp and also spent time as an analyst at Forrester Research covering resiliency, backup, and cloud. Outside of work, she tries to keep up with her young son and hyper-active dog, and when she has time, enjoys crafting and eating out at local restaurants in Boston.
Lately, I’ve come across numerous discussions on X (formerly Twitter), Reddit, and HackerNews about the steep costs associated with Datadog. This topic has become so prevalent that engineers are sharing their strategies online for aggressively reducing metrics.
But what led us to this point? What is making these costs skyrocket? Why do some companies spend more on observability tools than on their actual production infrastructure? Many point fingers at issues like vendor lock-in and corporate greed, which undeniably play a role.
Yet, there’s a deeper problem stemming from the shift towards containerized infrastructure and microservices applications. Without addressing this foundational issue, we are doomed to repeat these mistakes.
Full disclosure: I work for Chronosphere, a Datadog competitor. Rest assured, this article isn't a sales pitch. Datadog is a formidable company that has built a strong business over the years.
I previously worked at a company closely partnered with Datadog from 2015 to 2018, during which we witnessed its impressive growth and aspired to emulate it. However, I also observed growing frustration among Datadog’s customers over their escalating, unpredictable costs, feeling trapped with no exit.
This observation influenced my decision to join Chronosphere in 2021, anticipating that this trend was coming to a head. Before entering this field, I analyzed the market and discovered that for every dollar spent on public cloud services, about 25-35 cents goes to observability — a market primed for disruption.
The primary driver behind soaring Datadog costs is the sheer volume of observability data — metrics, logs, traces, and events — far exceeding initial predictions. Datadog’s pricing model and architecture were not designed to handle such volumes. Several factors contributed to this data proliferation:
What’s the result of all this data growth? Increased observability costs that are less predictable and don’t deliver any additional value.
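To make the data-growth problem concrete, here is a back-of-envelope sketch of how metric cardinality multiplies when a workload moves from long-lived VMs to short-lived containers. All figures (host counts, metrics per host, churn rate) are illustrative assumptions, not measurements from any real environment or from Datadog's billing.

```python
# Back-of-envelope sketch of metric cardinality growth when moving
# from VMs to containers. All numbers below are illustrative
# assumptions, not measurements from any particular environment.

def series_count(instances: int, metrics_per_instance: int, churn_factor: float = 1.0) -> int:
    """Unique time series accumulated over a billing period.

    churn_factor models short-lived containers: each replacement pod
    carries fresh label values (pod name, IP), so it creates brand-new
    series rather than continuing the old ones.
    """
    return int(instances * metrics_per_instance * churn_factor)

# A 100-VM fleet, each VM emitting ~100 metrics, hosts rarely replaced.
vm_series = series_count(instances=100, metrics_per_instance=100)

# The same workload split across 1,000 pods, each emitting the same
# ~100 metrics, with pods recycled ~5x per period by deploys,
# autoscaling, and evictions.
container_series = series_count(instances=1000, metrics_per_instance=100, churn_factor=5.0)

print(vm_series)                       # 10000
print(container_series)                # 500000
print(container_series // vm_series)   # 50x more series to store and bill
```

Under these assumed numbers, the same application produces fifty times as many billable time series after containerization, which is the dynamic behind bills that outgrow the infrastructure they monitor.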
I suspect there are two reasons for this:
If you’re considering options beyond Datadog, there are a few paths you can explore:
Managing your observability in-house using open-source tools is an appealing option. For metrics and traces, open-source solutions like Prometheus and OpenTelemetry, along with time series databases such as Mimir, Thanos, and M3, have evolved into widely recognized standards and present a feasible alternative to Datadog.
However, it's important to understand that this route might not yield actual cost savings; it largely shifts spend from vendor invoices to your own infrastructure and engineering payroll. The human and infrastructure costs required to maintain these systems are substantial, and under-investing in them invites reliability problems later.
For instance, a former colleague recently transitioned his company from a costly commercial SaaS product to an open-source framework. He found that while the move appeared cost-effective on paper, about 8% of the development staff was now committed full-time to managing this system, offsetting any real savings.
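The anecdote above can be sketched as simple arithmetic. Every figure here is a made-up assumption for illustration (SaaS bill, infrastructure spend, team size, loaded salary); the point is only that the comparison must include headcount, not that these numbers describe any real company.

```python
# Hypothetical total-cost comparison: commercial SaaS observability
# vs. a self-hosted open-source stack. All figures are assumptions
# chosen for illustration, not data from any real deployment.

def self_hosted_annual_cost(infra: float, engineers: float, loaded_salary: float) -> float:
    """Infrastructure spend plus the fully loaded cost of the engineers
    who run the stack full time."""
    return infra + engineers * loaded_salary

saas_annual = 1_200_000.0  # assumed annual commercial SaaS bill

# Mirroring the anecdote: ~8% of a 50-person dev team (4 engineers)
# now runs the open-source stack full time.
oss_annual = self_hosted_annual_cost(infra=300_000.0, engineers=4, loaded_salary=200_000.0)

print(saas_annual - oss_annual)  # 100000.0: the "savings" nearly vanish into headcount
```

Under these assumptions the self-hosted option is only marginally cheaper on paper, and that margin disappears entirely if the stack needs one more engineer, which is the trap the paragraph above describes.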
Here, I’m not promoting Chronosphere, but highlighting that modern tools are being developed with the anticipation of data growth right from the start. These tools put the control of costs back into the hands of the users, ensuring there are no unexpected charges.
Similar to how Datadog and New Relic took over from older systems like SolarWinds, BMC, and CA Technologies, this newer generation of observability tools is beginning to emerge prominently. Engage with these providers to learn how they address the challenges of managing large volumes of observability data effectively, rather than merely applying short-term patches to pricing pain.
Many have reluctantly accepted Datadog's steep costs and vendor lock-in as a necessary part of observability, unsure of the full range of available options. With its established presence, Datadog continues to look like a safe choice, even in light of its pricing strategies and proprietary software. However, it doesn't have to stay this way.
As observability evolves, new players are introducing solutions designed from the outset to handle the complexities of large-scale data. These alternatives offer more flexibility with your infrastructure, greater control over your data, and increased transparency regarding your expenses. This evolution in the market is paving the way for observability teams to adopt tooling with pricing models that still make sense, even as they scale.
Curious to learn more about Chronosphere and next generation observability? Check out the following resources: