Over the past few years, Prometheus has become the de facto open-source metrics monitoring solution. Its ease of use (a single binary that works out of the box), text exposition format, tag-based query language (PromQL), and large ecosystem of exporters (custom integrations) have led to quick and widespread adoption.
Many companies have implemented Prometheus to help solve their metrics monitoring use cases. However, those with large amounts of data and highly demanding use cases quickly push Prometheus to (and often beyond) its limits. To remedy this, Prometheus introduced a concept called remote storage, which integrates with external systems through remote write and remote read HTTP endpoints. Remote storage allows companies to store and query Prometheus metrics at larger scale and with longer retention periods.
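As a rough illustration of how that integration is wired up, the sketch below renders the `remote_write` and `remote_read` sections of a `prometheus.yml` file. The two section names and their `url` field are standard Prometheus configuration keys; the helper function and the endpoint URLs are hypothetical placeholders, and a real backend documents its own endpoint paths.

```python
def render_remote_storage_config(write_url: str, read_url: str) -> str:
    """Return a prometheus.yml fragment that enables remote write and remote read.

    This helper is illustrative only; it simply formats the two URLs
    into the YAML structure Prometheus expects.
    """
    return (
        "remote_write:\n"
        f'  - url: "{write_url}"\n'
        "remote_read:\n"
        f'  - url: "{read_url}"\n'
    )


# Placeholder endpoints (assumptions, not real services).
fragment = render_remote_storage_config(
    "https://remote-storage.example/api/v1/write",
    "https://remote-storage.example/api/v1/read",
)
print(fragment)
```

With a fragment like this in place, Prometheus forwards the samples it scrapes to the write endpoint and can serve queries for older data from the read endpoint.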
There are multiple remote storage solutions, the most popular being Cortex, M3, and Thanos. All three are open source and offer long-term remote storage, a global query view across Prometheus instances, and horizontal scalability.
Through discussions with various end-users, we realized they’re all seeking the same functionality for Prometheus remote storage. They want a solution that is reliable and highly available, scalable, efficient, and cost-effective. In the remainder of this blog, we’ll discuss these four key value areas as they pertain to M3 and Thanos, focusing on some of the limitations of each solution to watch out for.
Note: M3 and Thanos are both highly effective remote storage solutions, and the analysis below does not cover all of their features or functionality. These learnings and insights are based on conversations with various cloud-native end-users considering M3 and Thanos as their Prometheus remote storage.
1. Reliability and availability
Reliability is critical for large-scale operations. With standard Prometheus, many users deploy a high availability (HA) model to ensure a more reliable system. However, at large scale, typical HA setups start to cause issues, particularly from a query and data management perspective (you can read all about outgrowing Prometheus in this analysis). With M3 and Thanos, users can get a global query view across their Prometheus instances while remaining highly reliable. However, because they take different approaches, M3 and Thanos each have a unique set of limitations when it comes to query availability.
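To make the query-side problem with HA pairs concrete, here is a minimal sketch of merging the duplicate series that two replicas of the same Prometheus produce. It assumes each replica attaches a `replica` label (the label name is an assumption) and that both replicas record samples at identical timestamps, which real deployments do not guarantee. It is a simplification for illustration, not the deduplication algorithm either project actually uses.

```python
from collections import defaultdict

REPLICA_LABEL = "replica"  # assumed name of the label that identifies an HA replica


def dedup_replicas(series_list, replica_label=REPLICA_LABEL):
    """Merge series that differ only by the replica label.

    series_list is a list of (labels, samples) pairs, where labels is a dict
    and samples is a list of (timestamp, value) tuples.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        # Identity of the series once the replica label is ignored.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            # The first replica seen for a timestamp wins; later ones only fill gaps.
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Replica "a" missed the scrape at t=30, replica "b" missed the one at t=0.
replica_a = ({"__name__": "up", "job": "api", "replica": "a"}, [(0, 1.0), (15, 1.0)])
replica_b = ({"__name__": "up", "job": "api", "replica": "b"}, [(15, 1.0), (30, 1.0)])

result = dedup_replicas([replica_a, replica_b])
print(result)
```

The merged view contains one `up{job="api"}` series with samples at 0, 15, and 30, which is the gap-free result a global query layer aims to return even when one replica was briefly unavailable.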
2. Scalability and simplicity (re: overhead management)
M3 and Thanos were built to be horizontally scalable, balancing metric loads across nodes to avoid resource exhaustion and to optimize efficiency at scale. In addition, either configuration files or key-value stores (such as etcd) are used to assign metrics to nodes, reducing the management complexity and overhead of running Prometheus at scale. End-users who manage large-scale deployments care about the overhead required to run their systems because they want to minimize the additional headcount and infrastructure needed as they grow.
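The idea of assigning metrics to nodes can be sketched as hashing each series into a fixed number of shards and then looking up which node owns that shard. Everything specific here is an assumption made for illustration: the SHA-256 hash, the shard count of 64, the node names, and the in-memory `placement` dict that stands in for whatever a configuration file or key-value store would hold.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count


def shard_for(labels: dict, num_shards: int = NUM_SHARDS) -> int:
    """Map a series (identified by its label set) to a shard number."""
    # Sort the labels so the same series always yields the same ID.
    series_id = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    digest = hashlib.sha256(series_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def node_for(labels: dict, placement: dict, num_shards: int = NUM_SHARDS) -> str:
    """Return the node that owns the shard this series hashes to."""
    shard = shard_for(labels, num_shards)
    for node, shards in placement.items():
        if shard in shards:
            return node
    raise LookupError(f"no node owns shard {shard}")


# Hypothetical placement: each node owns half of the shard space.
placement = {"node-a": range(0, 32), "node-b": range(32, 64)}

series = {"__name__": "http_requests_total", "job": "api", "instance": "10.0.0.1:9090"}
print(shard_for(series), node_for(series, placement))
```

Because series map to shards rather than directly to nodes, adding capacity in a scheme like this means moving some shards to the new node and updating the placement, not rehashing every series.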
3. Efficiency and speed
As metrics data volume continues to grow, end-users are looking for ways to store data more efficiently, both to improve storage capacity and to get faster, closer to real-time query results over large data sets. To help achieve this, M3 and Thanos provide various ways to efficiently ingest and compress metrics. However, due to various architectural and design elements, each solution has its own strengths and shortcomings.
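One reason time series data compresses so well is that scrape timestamps arrive at nearly regular intervals. The sketch below shows delta-of-delta encoding, the core idea behind the Gorilla-style compression that the Prometheus TSDB block format and M3's storage engine both build on. It only produces the integer stream; the variable-width bit packing that real encoders apply afterwards is omitted, and the function names are illustrative.

```python
def encode_delta_of_delta(timestamps):
    """Encode timestamps as: first value, then the change in delta between samples."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # zero whenever the interval is unchanged
        prev_delta = delta
    return encoded


def decode_delta_of_delta(encoded):
    """Invert encode_delta_of_delta."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# A 15-second scrape interval with one sample arriving a second late.
scrapes = [1000, 1015, 1030, 1045, 1061]
encoded = encode_delta_of_delta(scrapes)
print(encoded)  # [1000, 15, 0, 0, 1]
assert decode_delta_of_delta(encoded) == scrapes
```

Most entries after the first two are zero or close to it, so a bit-level encoder can store each of them in very few bits.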
4. Affordability
Cost is always top of mind when deciding on a new long-term metrics solution. M3 and Thanos each provide affordable remote storage, using object and block stores, but let’s look at the differences to see which solution may be best for your use case.
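Since storage footprint drives much of that cost, a back-of-the-envelope estimate can help frame the comparison. The sketch below uses the sizing formula from the Prometheus storage documentation (retention time × ingested samples per second × bytes per sample) and adds a replication factor as an extra assumption for replicated backends. The input numbers are made up for illustration, and no prices are assumed; multiply the result by your own provider's per-byte rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_storage_bytes(samples_per_second, retention_days,
                           bytes_per_sample, replication_factor=1):
    """Rough storage footprint: rate x retention x sample size x copies kept."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    return samples_per_second * retention_seconds * bytes_per_sample * replication_factor


# Hypothetical workload: 100k samples/s, 30 days retention, 2 bytes per sample, 3 copies.
total = estimate_storage_bytes(
    samples_per_second=100_000,
    retention_days=30,
    bytes_per_sample=2,
    replication_factor=3,
)
print(f"{total / 1e12:.2f} TB")  # 1.56 TB
```

Downsampling, index overhead, and compaction all change the real figure, so treat this as a starting point for comparing object-store and block-store pricing rather than a prediction.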
The comparisons above do not cover every feature or function of M3 and Thanos. However, by looking at the limitations reported by the end-users we’ve spoken with, we hope you’re able to make a better, more informed decision about which remote storage solution is best for you and your use case. Please see the documentation for M3 and Thanos for more information. Other popular open-source Prometheus remote storage solutions include Cortex and VictoriaMetrics.
Despite being widely accepted solutions for Prometheus remote storage use cases, M3 and Thanos can both be complex and challenging to get up and running. If you don’t want to dedicate the time and resources needed to manage an open source metrics monitoring solution, then a Prometheus-native SaaS monitoring platform may be a good fit for you. Built on top of M3, Chronosphere is building a next-level cloud-native monitoring platform that is:
If you’re interested in learning more about remote storage options, please reach out to [email protected] or request a demo.