Automatically Making Dashboards Load 100X Faster
High-cardinality metrics often cause alerts and dashboards to time out when they try to fetch too much data. Prometheus provides recording rules, which speed up queries by precomputing their results. However, recording rules have to be configured manually, and alerts and dashboards must then be reconfigured to point to the recorded series. Performance often degrades as new metrics are introduced with more instances or deploys, so a query that used to work may suddenly break. In this talk, we will show you how slow queries can be preemptively detected and automatically sped up without any manual reconfiguration. We detail two approaches to achieving automated speedups: one based on recording rules and the other based on the M3 Aggregation tier. We will compare and contrast both approaches and show examples of how one can leverage either open source method to achieve the same results.
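As a rough illustration of the recording-rule approach, the sketch below picks out slow expressions from a list of observed query timings and builds a rule group in the `groups`/`rules`/`record`/`expr` shape that Prometheus rule files use. The `QueryStat` type, the 5-second threshold, the `auto:slow_query:N` naming scheme, and the group name are all assumptions made for this example, not the method presented in the talk. It also leaves out the step of transparently redirecting dashboards and alerts to the recorded series.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    """One observed query evaluation (hypothetical shape for this sketch)."""
    expr: str          # PromQL expression used by a dashboard or alert
    duration_s: float  # observed evaluation time in seconds


def slow_queries(stats, threshold_s=5.0):
    """Return distinct expressions whose evaluation exceeded the threshold."""
    seen = []
    for stat in stats:
        if stat.duration_s > threshold_s and stat.expr not in seen:
            seen.append(stat.expr)
    return seen


def recording_rule(expr, index):
    """Build one rule entry; the record name is an assumed naming scheme."""
    return {"record": f"auto:slow_query:{index}", "expr": expr}


def rule_group(stats, threshold_s=5.0):
    """Build a rule-file structure covering every slow expression."""
    rules = [
        recording_rule(expr, i)
        for i, expr in enumerate(slow_queries(stats, threshold_s))
    ]
    return {"groups": [{"name": "auto_recorded", "rules": rules}]}
```

The returned dictionary could be serialized to YAML and loaded as a rule file; a query that appears several times in the timing log yields a single rule.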
Supercharged Analytics for Prometheus Metrics with Spark, Presto, & Superset
Prometheus continues to make it simple to alert on, monitor, and understand systems in an increasingly complex cloud native world. New connectors to big data query engines such as Spark and Presto enable us to ask more complex questions than ever before. We can ask questions such as “Which of my deployments in Kubernetes account for the majority of compute and network costs, and how has that grown and shrunk with respect to doing real work such as query volume from users?” We’ll walk through a working example that runs Superset and Presto in Docker, connected to a remote Prometheus, to perform advanced SQL queries of arbitrary size reliably without timing out. We’ll also demo joining metrics data, via the Kubernetes node name label in Prometheus, to detailed Kubernetes object metadata (events, pods, etc.) collected by Fluentd, using a simple SQL join thanks to Presto’s query federation capabilities.
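The join on node name can be illustrated with a small stand-in. The sketch below uses Python's built-in `sqlite3` module in place of Presto, with two in-memory tables: one for per-node metric values and one for pod metadata. The table names (`node_cpu`, `pods`), the column names, and the helper function are hypothetical and chosen only for this example; a federated Presto query would reference the actual catalogs and schemas of the Prometheus and Fluentd-backed sources.

```python
import sqlite3


def join_metrics_with_metadata(metric_rows, pod_rows):
    """Join per-node metric values to pod metadata on the node name.

    metric_rows: iterable of (node, cpu_seconds) tuples
    pod_rows:    iterable of (node, pod) tuples
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables standing in for the two federated data sources.
    conn.execute("CREATE TABLE node_cpu (node TEXT, cpu_seconds REAL)")
    conn.execute("CREATE TABLE pods (node TEXT, pod TEXT)")
    conn.executemany("INSERT INTO node_cpu VALUES (?, ?)", metric_rows)
    conn.executemany("INSERT INTO pods VALUES (?, ?)", pod_rows)
    # Inner join: only nodes present in both sources appear in the result.
    rows = conn.execute(
        "SELECT p.pod, m.node, m.cpu_seconds "
        "FROM node_cpu AS m JOIN pods AS p ON m.node = p.node "
        "ORDER BY p.pod"
    ).fetchall()
    conn.close()
    return rows
```

Because this is an inner join, a node that has metrics but no pod metadata (or the reverse) is dropped from the result.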