Video: Monitoring with Prometheus and its limitations at scale

on February 1st 2022

Companies around the world have integrated Prometheus, the open-source metrics monitoring solution recommended by the Cloud Native Computing Foundation (CNCF), into their existing cloud-native architectures to solve a range of monitoring use cases. Prometheus was developed in 2012 in response to the growing need for cloud-native monitoring services. Its popularity and widespread adoption are due to three main features:

  • Ease of use (i.e. single binary, out-of-the-box setup)
  • Text exposition format and query language (i.e. PromQL)
  • Large ecosystem of exporters (i.e. custom integrations)  
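
To make the second point concrete, here is a minimal Python sketch of what the text exposition format looks like on the wire. It is an illustration only, not how any official client library is implemented: the helper names (`escape_label_value`, `format_sample`, `render_metric`) and the example metric `http_requests_total` are assumptions made up for this sketch.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value):
    """Render one sample line: name{label="value",...} value."""
    if not labels:
        return f"{name} {value}"
    # Sort labels so the output is stable from one scrape to the next.
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


def render_metric(name, help_text, metric_type, samples):
    """Render the HELP and TYPE comment lines followed by every sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    lines.extend(format_sample(name, labels, value) for labels, value in samples)
    return "\n".join(lines) + "\n"


# Hypothetical counter with one labelled sample.
print(render_metric(
    "http_requests_total",
    "Total HTTP requests.",
    "counter",
    [({"method": "get", "code": "200"}, 1027)],
))
```

A metric exposed this way can then be queried with PromQL, for example `rate(http_requests_total[5m])` for the per-second request rate over the last five minutes.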

Have you outgrown Prometheus?

Prometheus was designed with an efficient data store optimized for fast queries and alerting. Because it stores data locally on disk, Prometheus works well for short-term use cases. However, when storing and querying longer-term (and larger-scale) data, it can easily become overwhelmed. If you’ve reached the stage where you need to look for another monitoring solution, you’ve probably already started to feel the limitations of Prometheus at scale. (See how Tecton freed up engineering time with Chronosphere!)

In the video below, we discuss how Prometheus presents limitations around reliability, scalability, and efficiency – all of which led to the creation of Prometheus Remote Storage. See the outline below for the key moments in the video: 

  1. Reliability: While a high availability Prometheus setup can work for many use cases, it exposes various limitations at scale, such as inconsistent query results and gaps in graphs when you perform rolling restarts of your instances. When working on mission-critical services, you can’t afford data loss. (04:08)
  2. Scalability: When operating multiple Prometheus instances, federation is needed to achieve a centralized view of data. At large scale, this can quickly become unmanageable, as engineers can no longer easily locate their data. (07:45)
  3. Efficiency: While optimized for storing short-term data, Prometheus does not have built-in downsampling capabilities. If you need a highly granular view of your data while maintaining metrics at longer retention periods for querying or alerting purposes, then you need a solution with built-in downsampling. (11:55)
  4. Prometheus Remote Storage: Not wanting to dedicate more time and resources to managing Prometheus infrastructure, many users who need a longer-term metrics store have turned to open-source solutions compatible with Prometheus remote storage to help with the complex operation and management of Prometheus at scale. (14:20)
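
To illustrate the downsampling idea from the efficiency point above, here is a minimal Python sketch. It assumes samples arrive as `(timestamp_seconds, value)` pairs and that averaging within fixed-width time buckets is an acceptable aggregation; the function name `downsample` and the sample data are hypothetical, and real long-term stores may aggregate differently (for example, keeping min, max, or last values).

```python
from collections import defaultdict


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) pairs ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical raw series scraped every 15 seconds.
raw = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0), (60, 5.0), (75, 7.0)]

# Reduce to one point per minute for longer-term retention.
print(downsample(raw, 60))  # [(0, 2.5), (60, 6.0)]
```

The trade-off is the one described in the list: the downsampled series is much cheaper to retain and query over long periods, while the raw series is still needed wherever a highly granular view matters.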

If any of these limitations sound familiar, make sure to check out our more detailed write-up to learn more about outgrowing Prometheus. At Chronosphere, we’re building the only Prometheus-native observability platform that puts you back in control by taming rampant data growth and cloud-native complexity, delivering increased business confidence. Request a product demo to learn more.

Watch the video, Monitoring with Prometheus and its limitations at scale:
