How the Chronosphere collector ingests metrics from a customer’s various applications and then sends them along to the Chronosphere backend.
On: Jul 29, 2021
Chronosphere is a cloud-native metrics monitoring solution built on top of M3 that’s designed to reliably and efficiently ingest, store, and query or alert against metrics at massive scale. To get started, you first need to send your metrics to our backend via the Chronosphere collector.
Note: this blog assumes a basic understanding of Kubernetes and terminology related to Kubernetes (e.g. pod, manifest, DaemonSet, etc.). While most terms are hyperlinked or explained below, you can also use the Kubernetes glossary as a resource if needed.
The Chronosphere collector is responsible for ingesting metrics from a customer’s various applications and then sending them along to the Chronosphere backend. It is the only Chronosphere software deployed in customer environments.
Most Chronosphere customers use Kubernetes to manage their cluster(s), with pods running on each node. Depending on the use case, there are multiple ways to deploy and configure the collector within a cluster. For example, a customer can run one or many collectors within a cluster or node, and can deploy a collector for an entire node or for a single pod within a node.
The collector utilizes Kubernetes and Service Monitor informers to pass along the latest information on the pods within a node or cluster to the collector’s processing layer. With this information, the collector will know which applications or pods to get metrics from, and will use scrape jobs to collect these metrics.
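To make the discovery step more concrete, here is a minimal Python sketch of how pod information could be turned into a list of scrape targets. It is an illustration under stated assumptions, not the collector's actual implementation: the `Pod` record, the `scrape_targets` function, and the default port are hypothetical, and the `prometheus.io/*` annotation keys follow a common community convention that the collector may or may not use.

```python
from typing import Dict, List, NamedTuple


class Pod(NamedTuple):
    """Hypothetical snapshot of the pod details an informer might deliver."""
    name: str
    node: str
    ip: str
    annotations: Dict[str, str]


def scrape_targets(pods: List[Pod], node_name: str) -> List[str]:
    """Return metrics URLs for opted-in pods running on the given node.

    Assumption: pods opt in with the conventional prometheus.io/scrape
    annotation. The fallback port of 8080 is made up for this example.
    """
    targets = []
    for pod in pods:
        if pod.node != node_name:
            continue  # a per-node collector only looks at its own node
        if pod.annotations.get("prometheus.io/scrape") != "true":
            continue  # pod did not opt in to scraping
        port = pod.annotations.get("prometheus.io/port", "8080")
        path = pod.annotations.get("prometheus.io/path", "/metrics")
        targets.append(f"http://{pod.ip}:{port}{path}")
    return targets
```

A collector that runs once per node would call something like `scrape_targets(pods, "node-a")` every time the informer reports a change, so that new pods are picked up and removed pods are dropped from the scrape list.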
Scrape jobs use a pull-based model to scrape Prometheus metrics from the various applications. Prometheus metrics can also be scraped using the information or metadata generated by pod annotations. For StatsD or M3 metrics, the collector exposes User Datagram Protocol (UDP) endpoints, which use a push-based model to ingest metrics from their respective applications.
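The two models also differ in what arrives at the collector. In the pull model, a scrape returns a whole page of metrics in the Prometheus text format. In the push model, each UDP datagram carries one or more short StatsD lines of the form `name:value|type`. The Python sketch below parses a simplified subset of both formats. The function and type names are hypothetical and chosen for illustration only, and the parsing is deliberately minimal; it does not describe how the collector itself handles these formats.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class StatsdMetric(NamedTuple):
    name: str
    value: float
    kind: str           # e.g. "c" counter, "g" gauge, "ms" timer
    sample_rate: float


def parse_statsd_line(line: str) -> StatsdMetric:
    """Parse one pushed StatsD line: name:value|type[|@sample_rate]."""
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError(f"malformed StatsD line: {line!r}")
    sample_rate = 1.0
    for extra in fields[2:]:
        if extra.startswith("@"):
            sample_rate = float(extra[1:])
    return StatsdMetric(name, float(fields[0]), fields[1], sample_rate)


# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_prometheus_text(body: str) -> List[Sample]:
    """Parse a scraped page into (name, labels, value) samples.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            raise ValueError(f"malformed sample line: {line!r}")
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

The practical difference is who initiates the exchange: a scrape job asks each application for its current metrics on a schedule, whereas a StatsD client sends a datagram whenever it has something to report and does not wait for a response.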
Once a scrape job or UDP endpoint has collected metrics, the collector sends them to the Chronosphere backend via the Chronosphere metrics ingester. From there, customers are able to query and alert against their metrics. The diagram below gives a high-level overview of the collector’s architecture and the flow of metrics from a Kubernetes cluster to the Chronosphere backend:
Let’s now take a step back and look at the three ways a customer can deploy the collector in their environment(s).
While these are all different ways of deploying the collector, they are not mutually exclusive. In other words, you can combine deployment types within a node or cluster. For example, you may want a DaemonSet deployment to scrape standard metrics from all nodes in a cluster, and also a sidecar deployment for one pod or application within a node that produces business-critical metrics at a large scale.
But how do you then prevent double scraping if there are multiple collectors deployed in a node? The main way to ensure this doesn’t happen is to keep track of how you “mixed and matched” your deployments, and to explicitly configure each collector to target only what you intend it to scrape. Adding annotations to metrics from each pod or node can help with this, but monitoring the metric volumes from your various collectors is the most effective way to tell whether double scraping is happening.
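As a complement to watching metric volumes, overlapping targets can also be checked for directly when you know what each collector is configured to scrape. The Python sketch below is a hypothetical helper, not a Chronosphere feature: it assumes you can list each collector's targets yourself, and the collector and target names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_double_scraped(assignments: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each target claimed by more than one collector to those collectors.

    `assignments` maps a collector name to the targets it is configured
    to scrape. Targets with a single owner are left out of the result.
    """
    owners = defaultdict(set)
    for collector, targets in assignments.items():
        for target in targets:
            owners[target].add(collector)
    return {
        target: sorted(collectors)
        for target, collectors in owners.items()
        if len(collectors) > 1
    }


# Example: a per-node collector and a sidecar both cover the payments pod.
overlaps = find_double_scraped({
    "daemonset-node-a": ["payments:8080", "search:8080"],
    "sidecar-payments": ["payments:8080"],
})
```

Here `overlaps` flags `payments:8080` as covered by both collectors, which suggests excluding that pod from the per-node collector's targets.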
As with any product, we are always looking for ways to improve the collector, and one of our primary sources of feedback is our own internal usage and testing. We constantly monitor the status and health of our customers’ collectors, and we test any new deployments of the collector via scenario tests from Temporal. Our customers deploy the collector in many different ways, and these scenario tests allow us to re-create and test their various approaches to ensure everything in production is operating as intended.
In terms of the roadmap for the collector, we are continuing to add more tests and to improve the UX for our customers. This includes additional functionality around troubleshooting or debugging when managing multiple collector deployments, as well as around reducing resources needed to run the collector. If you’re interested in learning more about Chronosphere and the collector, please reach out to contact@chronospherdev.wpengine.com or request a demo.