Auto-suggesting and generating recording rules for Prometheus

High cardinality metrics often cause alerts and dashboards to time out when they try to fetch too much data. Prometheus provides recording rules to speed up queries by pre-generating the queries, however, they have to be configured manually and require reconfiguring alerts and dashboards to point to the recorded series. The performance degradation often happens as new metrics are introduced with more instances or deploys and a working query may break all of a sudden.

In this talk, we will show you how slow queries can be preemptively detected and automatically sped up without any manual reconfiguration. This is done by automatically analyzing the widely available inbuilt Prometheus query log and generating suggested recording rules for frequently queried metrics that take considerable time to execute.

We’ll walk through a concrete demo of the tool which can also use parameters min-query-time and min-query-count to help suggest the most impactful recording rules.

YouTube video