PromQL is a useful but often confusing tool. You can build powerful queries once you master the process, but taking the first leap into learning how to use PromQL can be intimidating and tedious. That’s why we put together this introduction to PromQL so you can get started on the right foot.
For metrics stored within Prometheus, the Prometheus Query Language (PromQL) is the main way to query and retrieve the results you are looking for. Chronosphere supports querying metrics data using PromQL and Graphite functions, but PromQL is the most popular option we see customers use. PromQL differs in some ways from other query languages you might have used. Here is an overview of the most important basics to get you started.
The majority of metrics platforms are built upon a time series database, a software platform that stores time series. Monitoring experts use different platforms, but they all typically serve a similar function. Chronosphere uses M3, a Prometheus-compatible metrics engine. A time series database represents metrics data as measurements over time. If you plotted the data on a graph, one of the axes is always time. A time series is a stream of values, each with an associated timestamp, identified by the name of the metric and any labels.
For example, consider a metric that reports the running total of HTTP requests to an IP address and endpoint. The metric name is http_requests_total, it has the labels host and path, and its value represents the total number of requests.
The http_requests_total metric is what Prometheus calls a “counter” metric, a metric that returns a cumulative total that only increases (or resets to zero when the process restarts).
Prometheus has three other metric types: gauge, histogram, and summary.
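To make the data model concrete, here is a minimal Python sketch of how a time series is identified by its metric name and labels, and how each series holds timestamped samples. This is an illustration only, not how Prometheus or M3 store data internally; the SeriesKey class, the make_key helper, and all timestamps and values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: metric name plus its label set."""
    name: str
    labels: frozenset  # frozenset of (label_name, label_value) pairs


def make_key(name, **labels):
    """Build a hashable series identity from a name and keyword labels."""
    return SeriesKey(name, frozenset(labels.items()))


# Each series maps to a list of (unix_timestamp_seconds, value) samples.
# Timestamps and values are invented for illustration.
database = {
    make_key("http_requests_total", host="10.2.1.2", path="/create"): [
        (1700000000, 14.0), (1700000060, 15.0), (1700000120, 16.0),
    ],
    make_key("http_requests_total", host="10.2.1.2", path="/update"): [
        (1700000000, 9.0), (1700000060, 10.0), (1700000120, 10.0),
    ],
}


def is_counter_like(samples):
    """True when values never decrease, as expected of a counter without resets."""
    values = [value for _, value in samples]
    return all(a <= b for a, b in zip(values, values[1:]))


for key, samples in database.items():
    print(key.name, sorted(key.labels), "latest:", samples[-1],
          "counter-like:", is_counter_like(samples))
```

Two series with the same metric name but different label values are distinct entries, which is why filtering on one label can still return several series.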
To query metrics data with PromQL, you specify the metric name, filter on label values, and refine the results further with ranges, logic, functions, and aggregations. The simplest query result is what Prometheus calls an “instant vector”: a set of time series with a single data point for each series, all sharing the same timestamp. Only instant vectors can be graphed.
Continuing the example from above, to fetch all values of a particular metric, use the metric name as a query:
http_requests_total
Each combination of label values is its own time series, so filtering on one label returns every series that matches, one for each combination of the remaining labels. For example, the query below returns all series where the host label has the value “10.2.1.2”, with the cumulative count for each path label value.
http_requests_total{host="10.2.1.2"}
name | host | path | value |
http_requests_total | 10.2.1.2 | /create | 16 |
http_requests_total | 10.2.1.2 | /update | 10 |
PromQL supports negative matching using the != syntax, and regular expression (regex) matching using =~ (with !~ as its negation). For example, to return all series for hosts in a certain IP address range:
http_requests_total{host=~"10.2.*"}
Regular expressions in PromQL use RE2 syntax, which allows for a reasonable degree of flexibility in queries. Regex matches are fully anchored, so a pattern must match the entire label value.
You can combine multiple labels with commas. For example, to filter by the host and path labels, use the following:
http_requests_total{host=~"10.2.*", path="/create"}
name | host | path | value |
http_requests_total | 10.2.1.2 | /create | 16 |
Because the series in this example only have two labels, and the query constrains both, it returns a single time series that matches the values specified. PromQL combines comma-separated label matchers with a logical AND, and label matchers have no explicit OR. You can replicate a form of OR with regex alternation, such as path=~"/create|/update", but this only works within an individual label, not across a combination of labels.
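The matching rules above can be modeled in a few lines of Python. This is a sketch, not Prometheus code: the SERIES list is hypothetical sample data based on the example tables, and Python's re module stands in for RE2, which behaves the same way for the simple patterns used here. re.fullmatch mirrors the fully anchored regex matching that PromQL applies.

```python
import re

# Hypothetical sample data: (labels, latest value) for http_requests_total.
SERIES = [
    ({"host": "10.2.1.1", "path": "/auth"}, 20),
    ({"host": "10.2.1.2", "path": "/create"}, 16),
    ({"host": "10.2.1.2", "path": "/update"}, 10),
]


def matches(labels, matchers):
    """Return True only when every (label, op, pattern) matcher holds (logical AND)."""
    for label, op, pattern in matchers:
        actual = labels.get(label, "")  # a missing label is treated as empty
        if op == "=":
            ok = actual == pattern
        elif op == "!=":
            ok = actual != pattern
        elif op == "=~":
            ok = re.fullmatch(pattern, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(pattern, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


def select(series, matchers):
    """Keep the series whose labels satisfy all matchers."""
    return [(labels, value) for labels, value in series if matches(labels, matchers)]


# Models http_requests_total{host=~"10.2.*", path="/create"}
both = select(SERIES, [("host", "=~", "10.2.*"), ("path", "=", "/create")])
# Models an OR within a single label via regex alternation
either = select(SERIES, [("path", "=~", "/create|/update")])
print(both)
print(either)
```

Because every matcher must pass, there is no way in this model to express "host is X OR path is Y" with matchers alone, which is the limitation described above.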
Applications can capture hundreds of thousands, if not millions, of values per day. Typically you need to narrow results with a time range or a time offset. A query with a time range returns what Prometheus calls a “range vector”: a set of time series where each series holds a range of data points, going back from the evaluation time by the duration given in the query.
For example, to return metrics results from an hour ago, add the offset modifier and a duration after a query:
http_requests_total{host="10.2.1.2"} offset 1h
Generally, you return a range vector in order to pass it to a function that performs calculations on the resulting range of samples. For example, to return all recorded values in the last ten minutes with their matching timestamps, add the time range in square brackets after the query.
http_requests_total{host="10.2.1.2"}[10m]
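To see how an instant selector, an offset, and a range selector differ, here is a simplified Python model. The function names and sample data are hypothetical, the five-minute lookback reflects the Prometheus default setting, and the exact handling of window boundaries is an assumption that varies between Prometheus versions.

```python
def instant_value(samples, eval_time, lookback=300):
    """Most recent sample at or before eval_time, within the lookback window."""
    candidates = [(t, v) for t, v in samples if eval_time - lookback < t <= eval_time]
    return candidates[-1] if candidates else None


def range_values(samples, eval_time, window):
    """All samples inside the window that ends at eval_time (models metric[window])."""
    return [(t, v) for t, v in samples if eval_time - window < t <= eval_time]


# Hypothetical series: one sample per minute for two hours, value grows by 1 each minute.
SAMPLES = [(t, float(t // 60)) for t in range(0, 7201, 60)]
NOW = 7200  # evaluation time in seconds, chosen for the example

latest = instant_value(SAMPLES, NOW)            # models: metric
hour_ago = instant_value(SAMPLES, NOW - 3600)   # models: metric offset 1h
last_ten = range_values(SAMPLES, NOW, 600)      # models: metric[10m]

print(latest, hour_ago, len(last_ten))
```

The instant selector yields one sample per series, while the range selector yields a list of samples per series, which is why range vectors are usually fed into a function rather than graphed directly.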
Prometheus functions let you perform calculations with and on your metrics data, allowing you to perform various complex processing with pre-built operations.
Each function takes different arguments, but typically at minimum an instant or range vector. There are dozens of functions available, but a popular one is rate(), which calculates the per-second average rate of increase of each time series in a range vector. For example, the Prometheus query below calculates the per-second average rate of increase in requests to the 10.2.1.2 IP address over the last 10 minutes.
rate(http_requests_total{host="10.2.1.2"}[10m])
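As a rough intuition for what rate() computes, the Python sketch below sums the increases between consecutive samples and divides by the elapsed time. It is a deliberate simplification: the real rate() also extrapolates towards the edges of the window, which this sketch does not attempt. The simple_rate name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate of increase for one counter series.

    samples: list of (timestamp_seconds, value) sorted by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter went down, so treat it as a reset to zero:
            # everything counted since the reset is new increase.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical samples with a counter reset between t=120 and t=180.
window = [(0, 100.0), (60, 130.0), (120, 160.0), (180, 10.0), (240, 40.0)]
print(simple_rate(window))
```

Handling the reset is what makes rate() safe to use on counters that restart from zero when a process restarts.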
When using functions, you can also use standard arithmetic and comparison operators inside and outside the function, such as addition, subtraction, greater than, and less than. Using operators is one of the main methods to perform calculations on a combination of different time series. However, if you apply an operator between two instant vectors, it only applies to series whose label sets match on both sides.
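The matching behavior between two instant vectors can be sketched in Python as follows. This models only the default one-to-one matching on identical label sets. The error counts are invented for the example and stand in for a hypothetical second metric, and the binary_op helper is not part of any Prometheus API.

```python
import operator


def label_key(labels):
    """Hashable identity for a label set."""
    return frozenset(labels.items())


def binary_op(left, right, op):
    """Apply op to pairs of samples with identical label sets; drop unmatched series."""
    right_index = {label_key(labels): value for labels, value in right}
    result = []
    for labels, value in left:
        key = label_key(labels)
        if key in right_index:
            result.append((labels, op(value, right_index[key])))
    return result


# Hypothetical instant vectors: error counts on the left, request counts on the right.
errors = [
    ({"host": "10.2.1.2", "path": "/create"}, 4.0),
    ({"host": "10.2.1.2", "path": "/update"}, 1.0),
]
requests = [
    ({"host": "10.2.1.2", "path": "/create"}, 16.0),
    ({"host": "10.2.1.1", "path": "/auth"}, 20.0),
]

error_ratio = binary_op(errors, requests, operator.truediv)
print(error_ratio)
```

Only the /create series appears in the result, because it is the only label set present on both sides. The unmatched series are silently dropped, which is a common source of empty query results.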
You can nest functions within other functions, for example to round the rate calculation to the nearest integer:
round(rate(http_requests_total{host="10.2.1.2"}[10m]))
Often you don’t need to query across all the metrics that Prometheus scrapes, and a smaller subset is enough to get the required detail. You can further refine the accuracy and usefulness of queries and functions with subqueries to specify further factors such as the length of time to sample metrics and how frequently to sample (called “resolution”).
If you are returning to Prometheus and PromQL after a few years, subqueries are a relatively new feature. Previously, to achieve the same results, you had to use configuration-level “recording rules”, which reduced the ability to create queries dynamically.
For example, in the following query, the inner rate() calculates the per-second rate over a 10-minute window. The subquery in the second set of square brackets then evaluates that rate across the last 2 minutes (the first argument, the range) at a 2-minute resolution (the second argument, the step), and avg_over_time() averages the resulting data points:
avg_over_time(rate(http_requests_total[10m])[2m:2m])
Changing the subquery range to, for example, 10m as the first argument in the second set of square brackets smooths the average out more, as there are more rate calculations to base the average on. For example:
avg_over_time(rate(http_requests_total[10m])[10m:2m])
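To illustrate why a longer subquery range smooths the result, the Python sketch below models both queries on a hypothetical counter whose request rate doubles part way through. It is a simplified model under stated assumptions: the inner rate is a plain first-to-last slope without extrapolation or reset handling, and evaluation steps are counted back from the evaluation time, whereas Prometheus aligns subquery steps independently of it.

```python
def rate_at(samples, eval_time, window):
    """Simplified rate: slope between first and last sample in the window."""
    points = [(t, v) for t, v in samples if eval_time - window < t <= eval_time]
    if len(points) < 2:
        return None
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


def subquery_times(eval_time, sub_range, step):
    """Evaluation timestamps covering the subquery range, oldest first."""
    times = []
    t = eval_time
    while t > eval_time - sub_range:
        times.append(t)
        t -= step
    return list(reversed(times))


def avg_over_subquery(samples, eval_time, inner_window, sub_range, step):
    """Models avg_over_time(rate(metric[inner_window])[sub_range:step])."""
    rates = [rate_at(samples, t, inner_window)
             for t in subquery_times(eval_time, sub_range, step)]
    rates = [r for r in rates if r is not None]
    return sum(rates) / len(rates) if rates else None


def counter_value(t):
    """Hypothetical counter: 1 request/s until t=6600, then 2 requests/s."""
    return float(t) if t <= 6600 else 6600.0 + 2.0 * (t - 6600)


SAMPLES = [(t, counter_value(t)) for t in range(0, 7201, 60)]
NOW = 7200

narrow = avg_over_subquery(SAMPLES, NOW, 600, 120, 120)  # models [2m:2m]
wide = avg_over_subquery(SAMPLES, NOW, 600, 600, 120)    # models [10m:2m]
print(narrow, wide)
```

In this model the 2-minute range contains a single evaluation point, so the average simply equals the latest rate, while the 10-minute range averages five points and lags behind the recent jump.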
Like any mature query language, PromQL is a deep and complex topic. This introduction to PromQL gave an overview of the concepts to get you started creating queries to return time series and metrics relevant to you. In future posts we will cover other aspects of using PromQL in depth, such as creating efficient queries, choosing the most appropriate types, and what Chronosphere adds on top of PromQL.
Prometheus monitoring is a great foundation to the monitoring world, and PromQL is still clearly relevant. However, Prometheus monitoring does have its limitations and many companies are outgrowing its capabilities. Learn more about Chronosphere’s integrative cloud monitoring solutions that scale with you and your business.