This is a guest article written by Prometheus co-founder Julius Volz of PromLabs in partnership with Chronosphere. Prometheus is an open source project hosted by the Cloud Native Computing Foundation under open governance. PromLabs is an independent company created by Julius Volz with a focus on Prometheus training and other Prometheus-related services.
With Prometheus emerging as the de facto standard for open source metrics-based monitoring, it has attracted many vendors in the observability space. These vendors love to claim compatibility with major Prometheus interfaces such as the PromQL query language, the metrics transfer protocols, or the alerting engine. But if you think that all vendors advertising Prometheus compatibility are equivalent and interoperable, you may be in for a surprise: actual vendor implementations range from fully compatible to wildly different from Prometheus itself.
Incompatible systems can:
In this article, I will explain why Prometheus compatibility matters when choosing a vendor, how PromLabs and the Prometheus team are proactively approaching the issue of compatibility testing and conformance certification, and how Chronosphere puts compatibility first.
Within the Prometheus team, we generally welcome vendors and third-party open source projects implementing Prometheus-compatible interfaces. In the best case, a marketplace of compatible systems leads to increased user choice between implementations with different tradeoffs, a healthy amount of competition that drives innovation, and thus a larger, better, and more interoperable Prometheus ecosystem. However, all of these benefits depend on players keeping their compatibility promise.
Compatibility is important for many reasons:
Many vendors claim compatibility with Prometheus interfaces such as PromQL, the metrics transfer protocols, or the alerting rule evaluation engine. Unfortunately, these claims are not always accurate, with some vendors deviating substantially from the supported features and behaviors in Prometheus itself. Worse, the level of compatibility is not always immediately obvious to users, which can cause costly surprises later on. Thus it becomes increasingly important for the Prometheus user community to understand which vendors are compatible and which ones are not.
With compatibility becoming a growing concern, both my own company PromLabs and the open source Prometheus team have created initiatives to test and certify compatibility in vendors and third-party open source projects. With PromLabs, I initially kicked off this effort by building a software tool for testing PromQL, Prometheus’ most impactful and complex interface.
In later years, PromLabs donated this tool to a larger compliance effort within the Prometheus project that aims to offer tests for a wider range of Prometheus interfaces than just PromQL, with the ultimate goal of allowing vendors to self-certify as “Prometheus Compatible”.
Let’s have a look at both of these efforts in more detail.
In 2020 I saw an emergence of third-party open source projects and commercial vendors claiming PromQL compatibility and became curious about their actual level of compatibility. PromQL is the largest and most important interface within Prometheus: as a single unified query language for many purposes, PromQL allows users to select, aggregate, correlate, and otherwise process time series data in complex ways.
PromQL plays a major part in almost all Prometheus use cases, such as ad-hoc debugging, dashboarding, alerting, or automation based on the collected data. But the complexity of the language and its implementation means that there are many subtle behavioral details that are important to get right for any implementor.
Since there was (and still is) no full specification of the PromQL query language that covers all behavioral subtleties, I decided to use the Prometheus server’s own querying behavior as a reference implementation to compare against. I built a PromQL compliance testing tool that would run a set of test queries against both a standard Prometheus server and a vendor implementation.
With equivalent data sets loaded into both systems, the test queries covered everything from basic data selection to more complex operations like rate computations, binary operators, or dimensional aggregations. I also aimed to probe as many special cases as possible. The testing framework would then compare the query results from both systems and generate a detailed report on the observed differences for each test case.
Using this framework, I ran multiple rounds of tests for a large number of vendors and open source projects and reported on them in detail in a series of PromLabs blog posts over the years. My personal aim was not to pass a final judgment on specific vendors, but both to raise awareness of compatibility issues in the community and to create more transparency for users when choosing a vendor.
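The comparison step can be sketched roughly as follows. This is a minimal illustration and not the actual PromLabs tool: the function names, the sample data, and the float tolerance are all assumptions made for this example. It assumes both systems return instant-query results in the JSON shape used by the Prometheus HTTP API, where each sample carries a `metric` label map and a `[timestamp, "value"]` pair.

```python
import math


def index_by_labels(result):
    """Map each sample's label set to its float value."""
    return {
        frozenset(sample["metric"].items()): float(sample["value"][1])
        for sample in result
    }


def values_match(a, b, rel_tol):
    """Treat NaN as equal to NaN; otherwise compare within a relative tolerance."""
    if math.isnan(a) and math.isnan(b):
        return True
    return math.isclose(a, b, rel_tol=rel_tol)


def compare_results(reference, candidate, rel_tol=1e-9):
    """List the differences between two instant-vector results."""
    ref = index_by_labels(reference)
    cand = index_by_labels(candidate)
    diffs = []
    # Series the reference returned but the candidate did not.
    for labels in sorted(ref.keys() - cand.keys(), key=sorted):
        diffs.append(f"missing series: {dict(sorted(labels))}")
    # Series only the candidate returned.
    for labels in sorted(cand.keys() - ref.keys(), key=sorted):
        diffs.append(f"unexpected series: {dict(sorted(labels))}")
    # Series present in both, but with different sample values.
    for labels in sorted(ref.keys() & cand.keys(), key=sorted):
        if not values_match(ref[labels], cand[labels], rel_tol):
            diffs.append(
                f"value mismatch for {dict(sorted(labels))}: "
                f"expected {ref[labels]}, got {cand[labels]}"
            )
    return diffs


# Made-up results for one query, as if returned by the two systems.
reference = [
    {"metric": {"instance": "a"}, "value": [1700000000, "1.5"]},
    {"metric": {"instance": "b"}, "value": [1700000000, "2.0"]},
]
candidate = [
    {"metric": {"instance": "a"}, "value": [1700000000, "1.5"]},
    {"metric": {"instance": "b"}, "value": [1700000000, "2.5"]},
    {"metric": {"instance": "c"}, "value": [1700000000, "0"]},
]

for diff in compare_results(reference, candidate):
    print(diff)
```

A test case would count as passed only when the list of differences is empty. Comparing with a small tolerance rather than exact equality is a design choice in this sketch, meant to avoid flagging harmless floating-point noise.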
That said, here are a few examples of vendors and open source projects that performed exceptionally well:
Unfortunately there were also vendors that received low or middling scores due to a larger number of compatibility issues. Scores are spelled out in great detail in my blog series.
While the testing tool generates a final numeric score indicating the percentage of tests passed, this score should always be taken with a grain of salt: for some vendors, even minor and potentially negligible differences in behavior could cause a large number of interrelated tests to fail, while other vendors may have failed fewer test cases in more significant ways.
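To see why the percentage alone can mislead, consider a made-up example. The test names, counts, and outcomes below are invented purely for illustration and do not describe any real vendor or the actual test suite.

```python
def pass_percentage(results):
    """Percentage of passed test cases, given a mapping of test name -> passed."""
    return 100.0 * sum(results.values()) / len(results)


# Hypothetical vendor A: a single small deviation causes all 40
# interrelated tests that depend on it to fail.
vendor_a = {f"related_test_{i}": False for i in range(40)}
vendor_a.update({f"other_test_{i}": True for i in range(60)})

# Hypothetical vendor B: only 5 failures, but each one could stem from
# a separate and more significant behavioral difference.
vendor_b = {f"test_{i}": i >= 5 for i in range(100)}

print(pass_percentage(vendor_a))  # 60.0
print(pass_percentage(vendor_b))  # 95.0
```

In this sketch, vendor A scores far lower even though a single fix might resolve all of its failures, which is why the detailed per-test results matter more than the headline number.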
In the end, a given user should always study the detailed results for a specific vendor and judge them according to their own needs and expectations. PromLabs publishes the full details for all historical test runs on its website.
Inspired by the PromQL compatibility testing done by PromLabs, the Prometheus team recognized the need for a broader compatibility testing and certification initiative within the openly governed Prometheus project itself. Thus Richard Hartmann, I, and others from the Prometheus team launched the Prometheus Conformance Program (PCP) in May of 2021. The goal of the PCP is to enable vendors to test their implementations for compatibility with one or more Prometheus interfaces and then self-certify their compliance in a clearly defined manner.
The PCP consists of two major parts:
Depending on the type of service that a Prometheus vendor offers, a different set of interfaces will be relevant for compatibility testing. The PCP has established four initial categories for vendor components and services, along with their compatibility requirements:
For now, vendors can already use the code in the compliance testing repository to informally ensure that their implementations are compatible with the Prometheus interfaces relevant to their product category. While the legal framework around an official compatibility mark is still a work in progress, testing compatibility in this way already provides an important community service and can help vendors uncover issues with their own products and services that need to be addressed.
We hope that the future finalization and adoption of the Prometheus Conformance Program will create a landscape of increased clarity and compatibility for everyone in the Prometheus ecosystem.
Chronosphere is a cloud native observability platform committed to providing a fully Prometheus-compatible solution to its customers. The PromQL compatibility tests by PromLabs have already shown Chronosphere to be 100% compatible with the Prometheus query language, making it a great choice for users looking for a hosted Prometheus monitoring platform that is faithful to the upstream Prometheus behavior.
Prometheus and compatible services like Chronosphere are of great mutual benefit to each other: While Prometheus as an open source project creates a global standard and growing open market for cloud native monitoring, services like Chronosphere can benefit from this market by providing compatible and competitive solutions. This in turn benefits the Prometheus project and its users by driving innovation and increasing user choice.
Chronosphere has helped customers like Robinhood and Abnormal Security in scaling their Prometheus usage.
Additionally, to show its dedication to the overall health of the Prometheus ecosystem, Chronosphere has also worked with PromLabs to donate the formerly proprietary PromLens query builder and analyzer for PromQL to the open source Prometheus project. Through this donation, all users of Prometheus-compatible systems can now use PromLens for free to build and visualize PromQL queries.
With the immense adoption of Prometheus for metrics-based monitoring, it has become paramount to test and ensure the compatibility of vendors with Prometheus’ interfaces in order to safeguard the health and longevity of the ecosystem. Compatibility is important for users to be able to trust the behavior of their observability solution so that they can avoid frustration, broken monitoring and alerting, and other costly surprises.
PromLabs’ PromQL tests helped kickstart these efforts to gain visibility into vendor compatibility, while the Prometheus Conformance Program is now expanding testing and certification to a broader set of Prometheus interfaces. The Prometheus team hopes to finalize the legal details around the PCP soon, so that vendors can signal their compatibility level in a clearly defined and informative way to their prospective users.
Meanwhile, with Chronosphere having reached a 100% compatibility score in PromLabs’ own PromQL tests and with its commitment to future compatibility, Chronosphere is an ideal option for anyone looking for a hosted Prometheus observability solution today.
For more information on Prometheus, check out the following articles from Julius:
Request a demo for an in-depth walkthrough of the platform!