What end-users want out of Prometheus remote storage: A comparison of M3 and Thanos

June 15, 2021

Over the past few years, Prometheus has become the de facto open-source metrics monitoring solution. Its ease of use (a single binary with an out-of-the-box setup), text exposition format, tag-based query language (PromQL), and large ecosystem of exporters (custom integrations) have led to its quick and widespread adoption. Many companies have implemented Prometheus to help solve their metrics monitoring use cases. However, those with large amounts of data and highly demanding use cases quickly push Prometheus to (and often beyond) its limits. To remedy this, the Prometheus project introduced the concept of remote storage: remote write and remote read HTTP endpoints through which Prometheus integrates with external systems. This allows companies to store and query Prometheus metrics at larger scale and over longer retention periods.
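As a rough illustration, remote storage is wired up in `prometheus.yml` via the `remote_write` and `remote_read` sections; the URL below is a placeholder for whatever endpoint your chosen backend exposes:

```yaml
# prometheus.yml excerpt -- remote storage endpoints (placeholder URLs)
remote_write:
  - url: "http://remote-storage.example.com/api/v1/write"
remote_read:
  - url: "http://remote-storage.example.com/api/v1/read"
```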

Optimizing your remote storage solution    

There are multiple remote storage solutions, the most popular being Cortex, M3, and Thanos. All open source, these solutions offer long-term remote storage, a global query view across Prometheus instances, and horizontal scalability.

Through discussions with various end-users, we realized they’re all seeking the same functionality for Prometheus remote storage. They want a solution that is reliable and highly available, scalable, efficient, and cost effective. In the remainder of this blog, we’ll discuss these four key value areas as they pertain to M3 and Thanos — focusing on some of the limitations of each solution to watch out for. 

Note: M3 and Thanos are both highly effective remote storage solutions, and the below analysis is by no means comprehensive of their features or functionality. These learnings and insights are based on conversations with various cloud-native end-users considering M3 and Thanos as their Prometheus remote storage. 

Four ways to evaluate M3 and Thanos

  1. Reliability and availability

Reliability is critical for large-scale operations. With standard Prometheus, many users deploy a high availability (HA) model to ensure a more reliable system. However, when used at a large scale, the typical HA setups will start to cause issues, particularly from a query and data management perspective (you can read all about outgrowing Prometheus in this analysis). With M3 and Thanos, users are able to achieve a global query view across their Prometheus instances while remaining highly reliable. However, based on their different approaches, M3 and Thanos each have a unique set of limitations when it comes to query availability.

  • Thanos has a native query service that can apply rules (such as alerting rules and Prometheus recording rules) on top of data obtained by the querier. These rules sometimes experience elevated failure rates due to Thanos’ distributed read (query) path, which involves every participating Prometheus instance. In general, distributed query paths can be more susceptible to failures and/or increased latency (especially for use cases with large query requests). Note: Thanos recently introduced a Query Frontend to improve its read path with query retries and splitting, but it is currently limited to range queries.
  • M3 was built and designed to have an optimized query engine with centralized storage. Using quorum read logic, M3’s query service ensures at least two of three copies of each data point are fetched from M3DB to consistently compute the results returned to the query client (e.g. Grafana). However, because M3’s query service does not prioritize query requests, an influx of requests (especially large-scale requests) can create a bottleneck and slow down overall query processing. This includes query requests for alerts, which can leave potential issues unresolved without intervention.
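M3’s quorum read can be illustrated loosely: with a replication factor of three, a result is returned only once a majority of replicas agree. The sketch below is a simplified stand-in, not M3’s actual implementation (which reconciles individual data points across replicas, not whole values):

```python
from collections import Counter

def quorum_read(replica_values, replication_factor=3):
    """Return a value only if a quorum (majority) of replicas agree on it.

    replica_values: the values fetched from the replicas that responded.
    Raises RuntimeError if no value reaches quorum.
    """
    quorum = replication_factor // 2 + 1  # 2 of 3 for replication factor 3
    counts = Counter(replica_values)
    value, count = counts.most_common(1)[0]
    if count >= quorum:
        return value
    raise RuntimeError("no quorum among replica responses: %r" % (counts,))
```

For example, `quorum_read([42.0, 42.0, 41.9])` succeeds because two of three replicas agree, while two conflicting responses with no majority raise an error instead of returning a possibly inconsistent result.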
  2. Scalability and simplicity (re: overhead management)

M3 and Thanos were built to be horizontally scalable, balancing metric loads across nodes to avoid resource exhaustion and to optimize efficiency at scale. In addition, either configuration files or key-value stores (i.e. etcd) are used to assign metrics to nodes, ultimately reducing the complex management and overhead needed to run Prometheus at scale. End-users who manage large-scale deployments are concerned with the overhead required to run their systems as they want to minimize additional headcount and infrastructure needed as they grow. 

  • Thanos is composed of a sidecar that sits next to each Prometheus instance and acts as a proxy, serving local Prometheus data through the Thanos Store API. A central querier then provides a global view by fanning out requests to all servers via that API. Due to its many moving parts and the need to modify multiple config files, Thanos can be difficult to configure and operate, and many end-users have turned to Kubernetes to help manage these pieces at scale. While Thanos doesn’t have its own Kubernetes operator, it works with the Prometheus Operator (or a third-party option such as Banzai Cloud’s Thanos Operator). There are also multiple Helm charts from the community.
  • M3 is composed of three core components: a distributed time series database (M3DB), an ingest and downsampling tier (M3 Coordinator), and a query engine (M3 Query). Architecturally, M3 is simple to scale up or down: M3 Query and M3 Coordinator are stateless, and M3DB scales horizontally as one unit. However, resizing the number of M3DB nodes in a cluster can become difficult to manage at scale, because M3DB is stateful and streams data to peers on node membership changes. To address this, M3 created and open-sourced a Kubernetes operator that automatically scales the number of storage nodes in an M3 cluster up or down. By reducing the complexity of scaling M3, the operator has helped users better manage their data at large scale. However, because it relies on Kubernetes, the M3 Operator does not fit all use cases.
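To illustrate, the M3 Operator lets you declare an M3DB cluster as a Kubernetes resource and resize it by editing the spec. The manifest below is a minimal sketch; the field names follow the operator’s M3DBCluster CRD, and the concrete names and values are illustrative only:

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: example-m3db-cluster   # illustrative name
spec:
  replicationFactor: 3
  numberOfShards: 256
  isolationGroups:             # one group per zone; resize by changing numInstances
    - name: zone-a
      numInstances: 1
    - name: zone-b
      numInstances: 1
    - name: zone-c
      numInstances: 1
```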
  3. Efficiency and speed

As metrics data volume continues to grow, end-users are looking for ways to store data more efficiently to improve storage capacity and enable faster, more real-time query results over large data sets. To help achieve this, M3 and Thanos provide various ways to efficiently ingest and compress metrics. However, due to various architectural and design elements, each solution has its upsides and shortcomings.

  • By default, the Thanos sidecar uploads data to object storage in 2-hour blocks. Thanos then uses its compactor component to merge these blocks over time, improving query efficiency and reducing storage size. However, because the Thanos compactor runs as a singleton and processes one compaction at a time, users must wait for data to be fully compacted (i.e. every 2 hours) before the aggregated data can be queried.
  • With M3, all downsampling and aggregation is done locally by the M3 Coordinator when it runs as a sidecar with Prometheus. With this setup, aggregation happens in real time, and at the end of each resolution interval the aggregated metrics are immediately available for query (e.g. every 5 minutes if aggregating data at 5-minute resolution). This means, however, that aggregated data can be lost if the M3 Coordinator sidecar has downtime. To mitigate this risk, you can run the M3 Aggregator, which tolerates downtime without losing aggregated data, but doing so adds operational complexity. Note: aggregation may not be needed, depending on your use case.
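For example, downsampling in M3 is typically driven by declaring an aggregated namespace alongside the unaggregated one in the M3 Coordinator config. The excerpt below is a sketch (the namespace name and durations are illustrative) that keeps raw samples for 48 hours and 5-minute rollups for 30 days:

```yaml
# m3coordinator config excerpt (illustrative names and values)
clusters:
  - namespaces:
      - namespace: default        # raw samples
        type: unaggregated
        retention: 48h
      - namespace: agg_5m_30d     # 5-minute rollups
        type: aggregated
        retention: 720h
        resolution: 5m
```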
  4. Affordability

Cost is always top of mind when deciding on a new long-term metrics solution to use. M3 and Thanos each provide affordable remote storage solutions, using object and block stores, but let’s look at the differences to see which solution may be best for your use case. 

  • Thanos uses object storage, such as Amazon S3 or Google Cloud Storage (GCS), for long-term remote storage of Prometheus metrics. Object storage is well suited to large-scale operation, as it can store as much data as required without explicit management, and is best for use cases where you want to write metrics once and access them from anywhere.
  • M3 uses block storage, such as Amazon EBS or Google Cloud Persistent Disk or Local SSD, for long-term storage. Block storage is more expensive than object storage, but is generally considered to be faster for accessing data (e.g. for high volume queries) and more easily mutable (e.g. useful when rebalancing the load between storage nodes so they can answer queries in parallel with equal portions of the data set). It is best for use cases that need persistent storage that can be modified in place, such as databases.
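For reference, Thanos components point at object storage through a small bucket configuration file (passed via `--objstore.config-file`). The S3 example below is a sketch with placeholder bucket name and credentials:

```yaml
# bucket.yml -- Thanos object storage config (placeholder values)
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.us-east-1.amazonaws.com"
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
```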

Choosing the best solution for you

The above comparisons are by no means comprehensive of M3’s and Thanos’ features or functionality. However, by examining the limitations that end-users we’ve spoken with have encountered, we hope you’re able to make a better, more informed decision about which remote storage solution is best for you and your use case. Please see the documentation for M3 and Thanos for more information. Other popular open-source Prometheus remote storage solutions include Cortex and VictoriaMetrics.

Despite being widely accepted solutions for Prometheus remote storage use cases, M3 and Thanos can both be complex and challenging to get up and running. If you don’t want to dedicate the time and resources needed to manage an open source metrics monitoring solution, then a Prometheus-native SaaS monitoring platform may be a good fit for you. Built on top of M3, Chronosphere is building a next-level cloud-native monitoring platform that is: 

  • Reliable: We replicate each data point 3x and store the data copies in geographically dispersed regions. Between our single-tenant architecture and multi-cloud support, we’re able to provide greater flexibility and reliability than any other SaaS offering on the market. 
  • Scalable: Built for cloud-native scale and complexity, Chronosphere was developed by the same team responsible for M3, one of the world’s largest real-time monitoring systems.
  • Efficient: Engineers are alerted faster and with just the right amount of data (not too much, not too little). Faster access to data means faster time to triage and quicker resolution of issues. 
  • Affordable: With Chronosphere, you control the bill. We enable you to reduce monitoring costs by letting you decide which data is kept, for how long, and at what resolutions.

If you’re interested in learning more, please reach out to contact@chronosphere.io or request a demo.
