The importance of compatibility when choosing a Prometheus vendor

This is a guest article written by Prometheus co-founder Julius Volz of PromLabs in partnership with Chronosphere. Prometheus is an open source project hosted by the Cloud Native Computing Foundation under an open governance. PromLabs is an independent company created by Julius Volz with a focus on Prometheus training and other Prometheus-related services.

Prometheus compatibility: Not all vendors are equal

With Prometheus emerging as the de facto standard for open source metrics-based monitoring, it has attracted many vendors in the observability space. These vendors love to claim compatibility with major Prometheus interfaces such as the PromQL query language, the metrics transfer protocols, or the alerting engine. But if you think that all vendors advertising Prometheus compatibility are equivalent and interoperable, you may be in for a surprise: the actual vendor implementations range from fully compatible to behaving wildly differently from Prometheus itself.

Incompatible systems can:

  • Store the wrong data
  • Give you inaccurate query results
  • Cause user confusion
  • And, in the worst case, make you miss an expensive outage because your monitoring and alerting are broken

In this article, I will explain why Prometheus compatibility matters when choosing a vendor, how PromLabs and the Prometheus team are proactively approaching the issue of compatibility testing and conformance certification, and how Chronosphere puts compatibility first.

Why compatibility matters

Within the Prometheus team, we generally welcome vendors and third-party open source projects implementing Prometheus-compatible interfaces. In the best case, a marketplace of compatible systems leads to increased user choice between implementations with different tradeoffs, a healthy amount of competition that drives innovation, and thus a larger, better, and more interoperable Prometheus ecosystem. However, all of these benefits depend on players keeping their compatibility promise.

Compatibility is important for many reasons:

  • Avoiding user confusion and breakage: Users will be surprised and confused by a system that does not behave as advertised. For example, a user will expect a PromQL query to return a specific result, but an incompatible system may yield a slightly (or wildly) different output. In the best case, this will only cost you some engineering time and effort to figure out the discrepancy. In the worst case, an unnoticed incompatibility can cause your alerting rules to silently break and not alert you about a critical incident.
  • Avoiding ecosystem fragmentation: In a compatible ecosystem, all vendors can come together to work on shared software tooling, documentation, and other resources related to the implemented Prometheus interfaces. For example, an open source PromQL query builder and analyzer like PromLens will work well with all PromQL-compatible systems, but will either misbehave or show incorrect query explanations for systems with incompatible or extended PromQL features. Incompatible systems make it harder to re-use such shared tooling and knowledge, and they will eventually fragment the user base and development community.
  • Avoiding vendor lock-in: Compatible systems allow users to migrate between alternatives in the marketplace, while incompatible systems make this progressively harder. In the worst case, a user ends up unwittingly depending on specific incompatible behaviors of one vendor and then finds themselves locked into that vendor’s product.
  • Enabling fair comparisons: Performance or cost comparisons between vendors are only fair and informative when you compare equal and compatible feature sets. For example, if one vendor’s system stores sample values with a lower precision than Prometheus does, any resulting storage size and memory usage comparisons are no longer like-for-like.
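
To make the precision example in the last point concrete, here is a minimal Python sketch. Prometheus stores sample values as 64-bit floats; the 32-bit storage format below is a hypothetical vendor choice used purely for illustration, and `to_float32` is a helper name invented for this sketch.

```python
import struct


def to_float32(x):
    """Round-trip a Python float (a 64-bit float) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A counter value of 2**24 + 1 is exactly representable as a 64-bit float.
counter = 16_777_217.0

# Hypothetical lower-precision storage: the value silently changes.
stored = to_float32(counter)
print(stored)  # 16777216.0

# The lower-precision value needs half the bytes per sample, so a raw
# storage-size comparison between the two systems is not like-for-like.
print(struct.calcsize("<d"), struct.calcsize("<f"))  # 8 4
```

The smaller footprint comes at the cost of different stored data, which is exactly why such a comparison says little about two systems that are supposed to be equivalent.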

Many vendors claim compatibility with Prometheus interfaces such as PromQL, the metrics transfer protocols, or the alerting rule evaluation engine. But these claims are not always accurate, with some vendors deviating substantially from the supported features and behaviors in Prometheus itself. Worse, the level of compatibility is not always immediately obvious to users, which can cause costly surprises later on. It is therefore increasingly important for the Prometheus user community to understand which vendors are compatible and which ones are not.

Existing efforts to test compatibility

With compatibility becoming a growing concern, both my own company PromLabs and the open source Prometheus team have created initiatives to test and certify compatibility in vendors and third-party open source projects. With PromLabs, I initially kicked off this effort by building a software tool for testing PromQL, Prometheus’ most impactful and complex interface.

In later years, PromLabs donated this tool to a larger compliance effort within the Prometheus project that aims to offer tests for a wider range of Prometheus interfaces than just PromQL, with the ultimate goal of allowing vendors to self-certify as “Prometheus Compatible”.

Let’s have a look at both of these efforts in more detail.

PromQL compatibility testing

In 2020 I saw an emergence of third-party open source projects and commercial vendors claiming PromQL compatibility and became curious about their actual level of compatibility. PromQL is the largest and most important interface within Prometheus: as a single unified query language for many purposes, PromQL allows users to select, aggregate, correlate, and otherwise process time series data in complex ways.

PromQL plays a major part in almost all Prometheus use cases, such as ad-hoc debugging, dashboarding, alerting, or automation based on the collected data. But the complexity of the language and its implementation means that there are many subtle behavioral details that are important to get right for any implementor.

Since there was (and still is) no full specification of the PromQL query language that covers all behavioral subtleties, I decided to use the Prometheus server’s own querying behavior as a reference implementation to compare against. I built a PromQL compliance testing tool that would run a set of test queries against both a standard Prometheus server and a vendor implementation.

After loading equivalent data sets into both systems, the test queries covered everything from basic data selection to more complex operations like rate computations, binary operators, or dimensional aggregations. I also aimed to probe as many special cases as possible. The testing framework would then compare the query results from both systems and generate a detailed report on the observed differences for each test case.
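
The comparison step can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual compliance tester: the function names and the tolerance are assumptions made for this sketch, and the inputs are assumed to be the `result` lists of two instant-vector responses in the shape returned by the Prometheus HTTP API (a `metric` label map plus a `[timestamp, "value"]` pair).

```python
import math


def series_key(sample):
    """Identify a series by its full label set, independent of label order."""
    return tuple(sorted(sample["metric"].items()))


def compare_vectors(reference, candidate, rel_tol=1e-9):
    """Return human-readable differences between two instant vectors."""
    ref = {series_key(s): float(s["value"][1]) for s in reference}
    cand = {series_key(s): float(s["value"][1]) for s in candidate}
    diffs = []
    for key in sorted(ref.keys() | cand.keys()):
        labels = dict(key)
        if key not in cand:
            diffs.append(f"missing series {labels}")
        elif key not in ref:
            diffs.append(f"unexpected series {labels}")
        else:
            a, b = ref[key], cand[key]
            # Treat NaN on both sides as equal; otherwise allow tiny float noise.
            both_nan = math.isnan(a) and math.isnan(b)
            if not both_nan and not math.isclose(a, b, rel_tol=rel_tol):
                diffs.append(f"value mismatch for {labels}: {a} != {b}")
    return diffs


# Made-up results for the query `up` from a reference server and a vendor.
reference = [
    {"metric": {"__name__": "up", "job": "api"}, "value": [1700000000, "1"]},
    {"metric": {"__name__": "up", "job": "db"}, "value": [1700000000, "0"]},
]
candidate = [
    {"metric": {"job": "api", "__name__": "up"}, "value": [1700000000, "1"]},
]

diffs = compare_vectors(reference, candidate)
print(diffs)  # ["missing series {'__name__': 'up', 'job': 'db'}"]
```

Keying each series by its sorted label set means that label ordering alone never counts as a difference, while a dropped series or a changed value does.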

Using this framework, I ran multiple rounds of tests for a large number of vendors and open source projects and reported on the results in detail in a series of PromLabs blog posts over the years. My personal aim was not to pass a final judgment on specific vendors, but both to raise awareness of compatibility issues in the community and to create more transparency for users when choosing a vendor.

That said, here are a few examples of vendors and open source projects that performed exceptionally well:

  • Chronosphere achieved a 100% test score, passing all PromQL compatibility tests.
  • M3DB, the open source metrics database that Chronosphere is based on, also achieved a 100% test score.
  • Thanos, an open source reimplementation of Prometheus with different tradeoffs, achieved a 100% test score.

Unfortunately there were also vendors that received low or middling scores due to a larger number of compatibility issues. Scores are spelled out in great detail in my blog series.

While the testing tool generates a final numeric score indicating the percentage of tests passed, this score should always be taken with a grain of salt: for some vendors, even minor and potentially negligible differences in behavior could cause a large number of interrelated tests to fail, while other vendors may have failed fewer test cases in more significant ways.
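
A small Python sketch shows why the percentage alone can mislead. The two vendors and their test outcomes below are entirely hypothetical, and `score` is a name chosen for this illustration.

```python
def score(results):
    """Percentage of passed tests; results maps test name -> passed (bool)."""
    return 100.0 * sum(results.values()) / len(results)


# Hypothetical vendor A: one minor quirk makes 40 interrelated tests fail.
vendor_a = {f"test_{i}": i >= 40 for i in range(100)}

# Hypothetical vendor B: only 2 tests fail, but each for a significant reason.
vendor_b = {f"test_{i}": i >= 2 for i in range(100)}

print(score(vendor_a))  # 60.0
print(score(vendor_b))  # 98.0
```

Judged by the number alone, vendor B looks far better, even though vendor A's failures might all trace back to a single negligible difference.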

In the end, a given user should always study the detailed results for a specific vendor and judge them according to their own needs and expectations. PromLabs publishes the full details for all historical test runs on its website.

Prometheus conformance program

Inspired by the PromQL compatibility testing done by PromLabs, the Prometheus team recognized the need for a broader compatibility testing and certification initiative within the openly-governed Prometheus project itself. Thus Richard Hartmann, I, and others from the Prometheus team launched the Prometheus Conformance Program (PCP) in May of 2021. The goal of the PCP is to enable vendors to test their implementations for compatibility with one or more Prometheus interfaces and then self-certify their compliance in a clearly defined manner.

The PCP consists of two major parts:

  • Technical compliance tests: A compliance testing code repository contains work-in-progress frameworks for automatically testing an implementation’s compatibility with Prometheus interfaces such as PromQL, the alert evaluation engine, OpenMetrics, and remote write receivers and senders. PromLabs donated the PromQL compliance tester described above to this repository, so that it could live on as part of the openly-governed Prometheus project.
  • Legal framework: While still a work in progress, the PCP will eventually allow vendors to enter into a legal agreement with the Linux Foundation that defines clear terms around a vendor’s self-certification, renewed certification over time, as well as terms and conditions for using a “Prometheus Compatible” logo and mark.

Depending on the type of service that a Prometheus vendor offers, a different set of interfaces will be relevant for compatibility testing. The PCP has established four initial categories for vendor components and services, along with their compatibility requirements:

  • Metrics Exposers need to be compatible with Prometheus’ OpenMetrics metrics exposition format.
  • Agents/Collectors must be able to scrape and forward metrics data in a compatible way, while potentially applying target labels to the scraped metrics.
  • Prometheus Storage Backends must be able to receive metrics via the Prometheus remote-write protocol, expose the data for querying via PromQL, and pass to-be-defined storage layer tests.
  • Full Prometheus Compatibility is the highest level of compatibility and requires implementing Prometheus’ alert generation facilities, PromQL querying, OpenMetrics, remote write receiving/sending, and storage layer verification tests.
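
The four categories above can be read as a mapping from category to required test areas. The Python sketch below encodes that reading; the short interface labels such as `remote_write_receive` are made up for this illustration and are not identifiers used by the PCP itself.

```python
# Category -> test areas it requires, following the list above.
# The label strings are illustrative, not official PCP identifiers.
REQUIREMENTS = {
    "Metrics Exposer": {"openmetrics"},
    "Agent/Collector": {"scrape_and_forward"},
    "Prometheus Storage Backend": {"remote_write_receive", "promql", "storage"},
    "Full Prometheus Compatibility": {
        "alert_generation",
        "promql",
        "openmetrics",
        "remote_write_receive",
        "remote_write_send",
        "storage",
    },
}


def eligible_categories(passed):
    """List the categories whose required test areas were all passed."""
    passed = set(passed)
    return [name for name, needed in REQUIREMENTS.items() if needed <= passed]


# A hypothetical product that passes four of the test areas.
print(eligible_categories(
    {"openmetrics", "remote_write_receive", "promql", "storage"}
))  # ['Metrics Exposer', 'Prometheus Storage Backend']
```

A product only reaches the highest level once every area is covered, so a strong PromQL result on its own does not imply full compatibility.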

For now, vendors can already use the code in the compliance testing repository to informally ensure that their implementations are compatible with the Prometheus interfaces relevant to their product category. While the legal framework around an official compatibility mark is still a work in progress, testing compatibility in this way already provides an important community service and can help vendors uncover issues with their own products and services that need to be addressed.

We hope that the future finalization and adoption of the Prometheus Conformance Program will create a landscape of increased clarity and compatibility for everyone in the Prometheus ecosystem.

Chronosphere and Prometheus compatibility

Chronosphere is a cloud native observability platform committed to providing a fully Prometheus-compatible solution to its customers. The PromQL compatibility tests by PromLabs have already shown Chronosphere to be 100% compatible with the Prometheus query language, making it a great choice for users looking for a hosted Prometheus monitoring platform that is faithful to the upstream Prometheus behavior.

Prometheus and compatible services like Chronosphere are of great mutual benefit to each other: While Prometheus as an open source project creates a global standard and growing open market for cloud native monitoring, services like Chronosphere can benefit from this market by providing compatible and competitive solutions. This in turn benefits the Prometheus project and its users by driving innovation and increasing user choice.

Chronosphere has helped customers like Robinhood and Abnormal Security in scaling their Prometheus usage.

Additionally, to show its dedication to the overall health of the Prometheus ecosystem, Chronosphere has also worked with PromLabs to donate the formerly proprietary PromLens query builder and analyzer for PromQL to the open source Prometheus project. Through this donation, all users of Prometheus-compatible systems can now use PromLens for free to build and visualize PromQL queries.

Companies need to trust observability solution behavior

With the immense adoption of Prometheus for metrics-based monitoring, it has become paramount to test and ensure the compatibility of vendors with Prometheus’ interfaces in order to safeguard the health and longevity of the ecosystem. Compatibility is important for users to be able to trust the behavior of their observability solution so that they can avoid frustration, broken monitoring and alerting, and other costly surprises.

PromLabs’ PromQL tests helped kickstart these efforts to gain visibility into vendor compatibility, while the Prometheus Conformance Program is now expanding testing and certification to a broader set of Prometheus interfaces. The Prometheus team hopes to finalize the legal details around the PCP soon, so that vendors can signal their compatibility level in a clearly defined and informative way to their prospective users.

Meanwhile, with Chronosphere having reached a 100% compatibility score in PromLabs’ own PromQL tests and with its commitment to future compatibility, Chronosphere is an ideal option for anyone looking for a hosted Prometheus observability solution today.
