An Introduction to PromQL

on May 27th 2021

For metrics stored within Prometheus, the Prometheus Query Language (PromQL) is the main way to query and retrieve the results you are looking for. Chronosphere supports querying metrics data using both PromQL and Graphite functions, but PromQL is the most popular option we see customers use. PromQL differs in some ways from other query languages you might have used, so here is an overview guide to get you started.

Understand the PromQL Data Model

The majority of metrics platforms are built upon a time series database, and Chronosphere uses M3. A time series database represents metrics data as measurements over time. If you plotted the data on a graph, one of the axes is always time. A time series is a stream of timestamped values, identified by the name of the metric and any labels.

For example, consider a metric that reports the running total of HTTP requests to an IP address and endpoint. The metric is named http_requests_total, has the labels host and path, and has a value that represents the total number of requests.
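As a rough illustration of this data model, the Python sketch below stores samples under a key built from the metric name and its labels. The names Sample, series_key, store, and append are hypothetical, as are the timestamps, and this is not how Prometheus or M3 store data internally.

```python
from typing import Dict, List, NamedTuple, Tuple


class Sample(NamedTuple):
    timestamp: float  # seconds, hypothetical values used below
    value: float


def series_key(name: str, labels: Dict[str, str]) -> Tuple:
    """Identity of a series: the metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


# Each distinct name and label combination maps to its own stream of samples.
store: Dict[Tuple, List[Sample]] = {}


def append(name: str, labels: Dict[str, str], timestamp: float, value: float) -> None:
    store.setdefault(series_key(name, labels), []).append(Sample(timestamp, value))


# Label order does not matter, so the first two calls land in the same series.
append("http_requests_total", {"host": "10.2.1.2", "path": "/create"}, 1000.0, 15)
append("http_requests_total", {"path": "/create", "host": "10.2.1.2"}, 1015.0, 16)
append("http_requests_total", {"host": "10.2.1.2", "path": "/update"}, 1015.0, 10)
```

Changing any label value produces a different key, and so a different time series.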

The http_requests_total metric is what Prometheus calls a “counter” metric: a cumulative total that only ever increases, apart from resetting to zero when the process that reports it restarts.

Prometheus has three other metric types.

  • Gauge: Similar to a counter, but the value can decrease as well as increase.
  • Histogram: Samples observations into configurable “buckets” and exposes them as several time series, including:
    • A cumulative counter for each bucket.
    • A total sum of all observed values.
    • A count of all observations.
    You can calculate quantiles from the buckets at query time with the histogram_quantile() function.
  • Summary: Similar to a histogram, but calculates quantiles on the client side and streams them directly.
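To make the idea of cumulative buckets more concrete, here is a small Python sketch. The function observe_all and the bucket boundaries are hypothetical, and the sketch only mirrors the shape of the data a histogram exposes, not the Prometheus client library API.

```python
def observe_all(values, upper_bounds):
    """Count observations into cumulative buckets keyed by their upper bound."""
    # A final bucket with an infinite upper bound catches every observation.
    bounds = sorted(upper_bounds) + [float("inf")]
    buckets = {le: 0 for le in bounds}
    for value in values:
        for le in bounds:
            # Cumulative: a value counts toward every bucket it fits under.
            if value <= le:
                buckets[le] += 1
    return buckets, sum(values), len(values)


# Four hypothetical request durations, in seconds.
buckets, total, count = observe_all([0.05, 0.2, 0.3, 1.5], [0.1, 0.5, 1.0])
```

Here the 0.5 bucket holds 3 because it also includes the observation already counted in the 0.1 bucket, and the infinite bucket always equals the total count.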

Querying Metrics with PromQL

To query metrics data with PromQL, you specify a metric name and filters based on label values, then narrow or transform the results with ranges, logical operators, functions, and aggregations. A query made up of a metric name and label filters returns what Prometheus calls an “instant vector”: a set of time series with a single data point for each series, all sharing the same timestamp.

Continuing the example from above, to fetch all values of a particular metric, use the metric name as a query:

http_requests_total

As http_requests_total is a counter metric, each time series it returns carries a cumulative count. When you filter on a label, the query returns one time series for each remaining combination of label values. For example, the query below returns all time series where the host label has the value “10.2.1.2”, along with the cumulative count for each path label value.

http_requests_total{host="10.2.1.2"}

name                 host      path     value
http_requests_total  10.2.1.1  /auth    20
http_requests_total  10.2.1.2  /create  16
http_requests_total  10.2.1.2  /update  10

PromQL supports negative matching using the != syntax, and regular expression (regex) patterns using =~ (or !~ to exclude matches). For example, to return all values for requests to a certain IP address range:

http_requests_total{host=~"10.2.*"}

Regular expressions in PromQL use RE2 syntax, which allows for a reasonable degree of flexibility in queries. Patterns are fully anchored, so they must match the entire label value rather than a part of it.

You can combine multiple label filters with commas. For example, to filter by the host and path labels, use the following:

http_requests_total{host=~"10.2.*", path="/create"}

name                 host      path     value
http_requests_total  10.2.1.2  /create  16

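As this example filters on both of the metric's labels, the query returns the single time series that matches the values specified. PromQL combines comma-separated filters with a logical AND, and there is no OR equivalent inside the curly braces. You can replicate a form of OR with regex alternation, for example path=~"/create|/update", but this only works within an individual label, not across a combination of labels. To combine conditions across different labels, you can join two complete selectors with the or set operator.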
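The matching rules described above can be sketched in a few lines of Python. The matches function and the tuple format for filters are hypothetical, and Python's re module stands in for RE2, which behaves the same way for simple patterns like these.

```python
import re


def matches(labels, matchers):
    """Return True if a label set satisfies every matcher (logical AND)."""
    for name, op, expected in matchers:
        # A label that is absent is treated like an empty value.
        actual = labels.get(name, "")
        if op == "=":
            ok = actual == expected
        elif op == "!=":
            ok = actual != expected
        elif op == "=~":
            # fullmatch mirrors the fully anchored regex behavior.
            ok = re.fullmatch(expected, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(expected, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


series = [
    {"host": "10.2.1.1", "path": "/auth"},
    {"host": "10.2.1.2", "path": "/create"},
    {"host": "10.2.1.2", "path": "/update"},
]

# Equivalent in spirit to {host=~"10.2.*", path="/create"}
selected = [s for s in series if matches(s, [("host", "=~", "10.2.*"), ("path", "=", "/create")])]

# A regex alternation acts as an OR, but only within the one label.
either = [s for s in series if matches(s, [("path", "=~", "/create|/update")])]
```

Every filter has to pass for a series to be selected, which is why there is no way to express "this host OR that path" inside one set of braces.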

Filtering Values to a Range

Applications can capture hundreds of thousands, if not millions, of values per day. Typically you need to narrow results using a time range or a time offset. A query with a time range returns what Prometheus calls a “range vector”: a set of time series where each series contains a range of data points, reaching back from the evaluation time by the duration given in the query.

For example, to return results as they were an hour ago, add the offset modifier and a duration after the query:

http_requests_total{host="10.2.1.2"} offset 1h

More often, you select a range vector and pass it to a function that performs calculations on the resulting range of data points. For example, to return all values recorded in the last ten minutes with their matching timestamps, add the time range in square brackets after the query:

http_requests_total{host="10.2.1.2"}[10m]

Performing Calculations on Metrics with Prometheus Functions

Prometheus functions let you perform calculations with and on your metrics data, allowing you to perform various complex processing with pre-built operations.

Each function takes different arguments, but typically at minimum an instant or range vector. There are dozens of functions available, but a popular one is rate(), which calculates the per-second average rate of increase of each time series in a range vector. For example, the query below calculates the per-second average rate of increase in requests to the 10.2.1.2 IP address over the last 10 minutes.

rate(http_requests_total{host="10.2.1.2"}[10m])
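To give a feel for what rate() computes, here is a simplified Python sketch. The function simple_rate, the window boundaries, and the sample values are hypothetical. The real Prometheus implementation also extrapolates toward the edges of the window, which this sketch leaves out, so treat the numbers as approximate.

```python
def simple_rate(samples, window_start, window_end):
    """Per-second average increase of a counter over a time window.

    samples is a list of (timestamp_seconds, value) pairs sorted by time.
    Assumption for this sketch: the window is (window_start, window_end].
    """
    in_window = [(t, v) for t, v in samples if window_start < t <= window_end]
    if len(in_window) < 2:
        return None  # at least two points are needed to measure an increase

    increase = 0.0
    previous = in_window[0][1]
    for _, value in in_window[1:]:
        if value < previous:
            # The counter dropped, so assume it reset and restarted from zero.
            increase += value
        else:
            increase += value - previous
        previous = value

    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed


# One sample per minute, with a counter reset before the fourth sample.
samples = [(60, 100), (120, 160), (180, 220), (240, 10), (300, 70)]
requests_per_second = simple_rate(samples, 0, 600)
```

The reset handling is the reason to use rate() on counters rather than subtracting raw values yourself: a restart would otherwise show up as a large negative jump.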

When using functions, you can also use standard arithmetic and comparison operators inside and outside the function, such as addition, subtraction, greater than, and less than. Operators are one of the main ways to perform calculations on a combination of different time series. However, if you apply an operator between two instant vectors, it only applies to series that have matching labels in both vectors.
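The matching behavior can be sketched in Python as follows. The binary_op and key helpers are hypothetical, as is the errors data, and the sketch only covers the default case where the label sets must be identical on both sides.

```python
import operator


def key(**labels):
    """A hashable stand-in for a series' label set."""
    return frozenset(labels.items())


def binary_op(op, left, right):
    """Apply op to pairs of values whose label sets appear in both vectors."""
    result = {}
    for labels, left_value in left.items():
        if labels in right:
            result[labels] = op(left_value, right[labels])
        # Series without a partner on the other side are dropped.
    return result


requests = {
    key(host="10.2.1.2", path="/create"): 16.0,
    key(host="10.2.1.2", path="/update"): 10.0,
}
# Hypothetical error counts, reported for only one of the paths.
errors = {key(host="10.2.1.2", path="/create"): 4.0}

error_ratio = binary_op(operator.truediv, errors, requests)
```

The /update series has no partner in errors, so it does not appear in the result at all, which is a common surprise when a query returns fewer series than expected.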

You can nest functions within other functions, for example to round the rate calculation to the nearest integer:

round(rate(http_requests_total{host="10.2.1.2"}[10m]))

Pinpoint Prometheus Metrics with Subqueries

Often you don’t need to query across all the metrics that Prometheus scrapes, and a smaller subset is enough to get the required detail. You can further refine the accuracy and usefulness of queries and functions with subqueries, which specify the length of time to sample metrics over and how frequently to sample them (called the “resolution”).

If you are returning to Prometheus and PromQL after a few years away, subqueries are a relatively new feature. Before they existed, you had to use configuration-level “recording rules” to achieve the same result, which reduced the ability to create queries dynamically.

For example, the following query evaluates the 10 minute rate across the last 2 minutes of data (the first value in the second set of square brackets) at a 2 minute resolution (the second value), and then averages the results:

avg_over_time(rate(http_requests_total[10m])[2m:2m])

Widening the subquery range, for example to 10m as the first value in the square brackets, smooths the average out more, as there are more rate calculations to base the average on. For example:


avg_over_time(rate(http_requests_total[10m])[10m:2m])
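The Python sketch below shows the general idea of a subquery: evaluate an inner expression at regular steps across a range, then aggregate the results. The function names and the rate values are hypothetical, and the exact evaluation timestamps Prometheus picks may differ from this simplified version.

```python
def evaluation_times(now, range_seconds, step_seconds):
    """Timestamps from (now - range) up to now, one per step."""
    times = []
    t = now - range_seconds
    while t <= now:
        times.append(t)
        t += step_seconds
    return times


def avg_over_subquery(inner, now, range_seconds, step_seconds):
    """Average of inner(t) across the subquery's evaluation times."""
    values = [inner(t) for t in evaluation_times(now, range_seconds, step_seconds)]
    return sum(values) / len(values)


# Hypothetical precomputed 10 minute rates, keyed by evaluation time in seconds.
rates = {0: 1.0, 120: 3.0, 240: 1.0, 360: 3.0, 480: 1.0, 600: 4.0}

narrow = avg_over_subquery(rates.get, now=600, range_seconds=120, step_seconds=120)
wide = avg_over_subquery(rates.get, now=600, range_seconds=600, step_seconds=120)
```

With these made-up values, the narrow window averages only the two most recent evaluations and is pulled up by the latest spike, while the wide window averages six evaluations and sits closer to the long-run level.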

Next Steps

Like any mature query language, PromQL is a deep and complex topic. This post gave an overview of the concepts to get you started creating queries to return time series and metrics relevant to you. In future posts we will cover other aspects of using PromQL in depth, such as creating efficient queries, choosing the most appropriate types, and what Chronosphere adds on top of PromQL.
