In this blog, software developer Nick Marcopoli walks through an internal project that was focused on converting Datadog queries, dashboards, and monitors to open source alternatives.
Nick Marcopoli is a Member of Technical Staff on the Rapid Response team at Chronosphere and previously was an engineer at Palantir. He currently lives in New York and enjoys running and biking through Central Park.
On: May 14, 2024
Vendor lock-in is a big concern for organizations as they select the right observability software for their cloud and cloud native environments. Even if technical decision-makers take the time to thoroughly evaluate their options, there is always a risk that they’ll outgrow the platform or business needs will change. Additionally, the time and money required for migrations often deter organizations from doing them.
We realized that even if an organization wanted an open source compatible, cloud native observability platform like Chronosphere, it might be a tough sell to switch, as migrations can be time-consuming and costly. We started by building ingest support for non-Prometheus protocols, including StatsD and DogStatsD, and making this data available to query using Prometheus. But that still left the problem of migrating queries, monitors, and dashboards. Smaller organizations may have a few hundred assets to migrate; larger organizations may have thousands.
This is why Chronosphere’s engineering team built an automated tool to make migration easier, taking some of the stress off the customer. To our knowledge, no other tool is available to help customers convert proprietary observability assets to open source standards. We knew that building a tool like this would make switching to open source compatible solutions like Chronosphere more manageable, and help customers move to a more flexible, scalable, and future-proof solution.
With our general-purpose solution, we focused on three main areas: query conversion, dashboard conversion, and monitor conversion. These were not only the most useful for organizations, but also the most time-consuming to migrate to open source standards.
Before we dive into our conversion solution, we need to understand how a computer understands a query language like PromQL, the Prometheus query language. Every language follows a grammar, and PromQL is no exception. The grammar tells us if an input to the language is syntactically correct. To illustrate this, let’s take a look at a contrived example of a grammar that we can use for adding or subtracting two numbers:
SUM = NUMBER + OPERATION + NUMBER
We must also define the inputs to the grammar that we’ve used:
OPERATION = [+-]
NUMBER = “-”? DIGITS
DIGITS = [0-9]+
Here, we describe an operation as either a plus sign “+” or a minus sign “-”. We describe a number as a sequence of one or more digits, optionally prefixed by a minus sign “-” to indicate a negative number.
Now, we can use our grammar to determine if an input string is valid in our new “adding or subtracting” language. Let’s look at a few examples:
String | Number | Operation | Number | Valid? |
---|---|---|---|---|
1 +2 | 1 | + | 2 | ✅ |
5- -3 | 5 | - | -3 | ✅ |
1+1+1 | 1 | + | 1+1 | ❌ |
7 | 7 | 🟥 | 🟥 | ❌ |
Great! This helps identify which strings are valid and which are invalid using our grammar.
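To make the validity check concrete, here is a minimal Go sketch that tests the four strings from the table. Because this toy grammar is not recursive, a regular expression is enough; a real query language would need a proper parser. The names `sumPattern` and `isValidSum` are illustrative, and tolerating whitespace between tokens is an assumption made so that “1 +2” is accepted as in the table.

```go
package main

import (
	"fmt"
	"regexp"
)

// sumPattern mirrors the grammar: NUMBER OPERATION NUMBER, where NUMBER is an
// optional "-" followed by one or more digits. Whitespace around tokens is an
// assumption; the grammar in the text does not mention it.
var sumPattern = regexp.MustCompile(`^\s*-?[0-9]+\s*[+-]\s*-?[0-9]+\s*$`)

// isValidSum reports whether input is a valid string in the toy language.
func isValidSum(input string) bool {
	return sumPattern.MatchString(input)
}

func main() {
	for _, s := range []string{"1 +2", "5- -3", "1+1+1", "7"} {
		fmt.Printf("%q valid=%v\n", s, isValidSum(s))
	}
}
```

The first two strings are accepted and the last two are rejected, matching the table.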
The next thing we can do using our grammar is turn a valid input string into a tree, called an abstract syntax tree (AST). Here’s an example using the grammar we just defined:
We’re now able to easily interact with parts of the input as labeled nodes on the tree. Let’s work through an example of how we can use the tree to convert from one language to another.
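As a rough illustration of what such a tree can look like in code, the Go sketch below parses an input string into a single `Sum` node with labeled children. The type and function names are hypothetical, and the regular expression stands in for a real grammar library.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Sum is the root node of the toy AST; its fields are the labeled children.
type Sum struct {
	Left      int    // NUMBER
	Operation string // OPERATION: "+" or "-"
	Right     int    // NUMBER
}

// The capture groups correspond to NUMBER, OPERATION, NUMBER.
var sumPattern = regexp.MustCompile(`^\s*(-?[0-9]+)\s*([+-])\s*(-?[0-9]+)\s*$`)

// Parse turns a valid input string into a Sum node, or returns an error.
func Parse(input string) (*Sum, error) {
	m := sumPattern.FindStringSubmatch(input)
	if m == nil {
		return nil, fmt.Errorf("invalid input %q", input)
	}
	left, err := strconv.Atoi(m[1])
	if err != nil {
		return nil, err
	}
	right, err := strconv.Atoi(m[3])
	if err != nil {
		return nil, err
	}
	return &Sum{Left: left, Operation: m[2], Right: right}, nil
}

func main() {
	ast, err := Parse("5- -3")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", *ast) // prints {Left:5 Operation:- Right:-3}
}
```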
Consider the following grammar, which is similar in functionality but has different features than our “adding and subtracting” grammar:
SUM = NUMBER + OPERATION + NUMBER
OPERATION = [+]
NUMBER = “-”? DIGITS
DIGITS = [0-9]+
You’ll notice that this new grammar only allows for addition. We can build a converter that accepts a string valid in our first language, then outputs a string with the same meaning that is valid in our new language. It may look something like this pseudocode:
convert(oldLangInput string) newLangOutput string {
    // Parse() is a common function in parsing expression grammar libraries
    ast = oldLang.Parse(oldLangInput)
    newAst = newLang.New()
    newAst.Sum.LeftNode.Number = ast.Sum.LeftNode.Number
    rightNumber = ast.Sum.RightNode.Number
    // a - b has the same meaning as a + (-b), so a subtraction becomes
    // an addition of the negated right-hand number
    if ast.Sum.Operation == "-" {
        rightNumber *= -1
    }
    newAst.Sum.RightNode.Number = rightNumber
    newAst.Sum.Operation = "+"
    // String() is a common function in parsing expression grammar libraries
    return newAst.String()
}
By copying the data from the old language’s AST representation of the input string into an AST for the new language, and adjusting it where the two grammars differ, we’ve created a general converter from our old language to our new language.
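For readers who want something they can run, here is a small Go sketch of a converter between the two toy languages. It relies on the identity a - b = a + (-b): when the old operation is a subtraction, the right-hand number is negated and the operation becomes an addition. The function name `Convert` and the regular expression standing in for a grammar library are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// oldLang matches the "adding or subtracting" language.
var oldLang = regexp.MustCompile(`^\s*(-?[0-9]+)\s*([+-])\s*(-?[0-9]+)\s*$`)

// Convert rewrites an "adding or subtracting" expression as an
// addition-only expression with the same value.
func Convert(input string) (string, error) {
	m := oldLang.FindStringSubmatch(input)
	if m == nil {
		return "", fmt.Errorf("invalid input %q", input)
	}
	left, err := strconv.Atoi(m[1])
	if err != nil {
		return "", err
	}
	right, err := strconv.Atoi(m[3])
	if err != nil {
		return "", err
	}
	// Subtraction is not available in the new language, so fold the
	// minus sign into the right-hand number instead.
	if m[2] == "-" {
		right = -right
	}
	return fmt.Sprintf("%d + %d", left, right), nil
}

func main() {
	for _, in := range []string{"5 - 3", "5- -3", "1 +2"} {
		out, err := Convert(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %q\n", in, out)
	}
}
```

With these inputs, “5 - 3” becomes “5 + -3”, “5- -3” becomes “5 + 3”, and “1 +2” becomes “1 + 2”.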
Converting a vendor-specific query works similarly, though the implementation is not nearly as simple as our example. We build a grammar that represents Datadog’s query language from scratch, and we create a converter that will convert an AST representation of an input query string to a Prometheus AST using the open source Prometheus grammar.
From there, we can output that Prometheus AST as a valid Prometheus query that can be used with Chronosphere. We’ve done this successfully, not just with the Datadog query language, but with Wavefront and other query languages as well.
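To give a feel for the shape of the problem, the sketch below converts one very narrow form of Datadog query, `aggregator:metric{tag:value,...}`, into PromQL. It is a toy under stated assumptions, not the converter described in this post: it uses a regular expression instead of a full grammar and AST, supports only the `avg`, `sum`, `min`, and `max` aggregators, and assumes that dots in metric names map to underscores on the Prometheus side. `ConvertQuery` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// queryPattern covers only the shape aggregator:metric{tag:value,...}.
var queryPattern = regexp.MustCompile(`^(avg|sum|min|max):([A-Za-z0-9_.]+)\{([^}]*)\}$`)

// ConvertQuery rewrites a simple Datadog-style query as a PromQL expression.
func ConvertQuery(q string) (string, error) {
	m := queryPattern.FindStringSubmatch(strings.TrimSpace(q))
	if m == nil {
		return "", fmt.Errorf("unsupported query %q", q)
	}
	// Assumption: dots in the metric name become underscores.
	metric := strings.ReplaceAll(m[2], ".", "_")

	var matchers []string
	for _, tag := range strings.Split(m[3], ",") {
		tag = strings.TrimSpace(tag)
		if tag == "" || tag == "*" {
			continue // "*" means no filtering
		}
		key, value, ok := strings.Cut(tag, ":")
		if !ok {
			return "", fmt.Errorf("unsupported tag filter %q", tag)
		}
		matchers = append(matchers, fmt.Sprintf("%s=%q", key, value))
	}
	return fmt.Sprintf("%s(%s{%s})", m[1], metric, strings.Join(matchers, ",")), nil
}

func main() {
	out, err := ConvertQuery("avg:system.cpu.user{host:web-1,env:prod}")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints avg(system_cpu_user{host="web-1",env="prod"})
}
```

Anything beyond this shape, such as grouping, rollups, arithmetic between queries, or functions, is where a real grammar and an AST-to-AST conversion become necessary.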
Observability dashboards can generally be exported as JSON and are much easier to work with than queries. We first create a Go struct representation of a Datadog dashboard, based on an exemplar dashboard JSON. We then unmarshal a source dashboard from our customer into this struct, which allows us to interact with it programmatically.
Then, we can start converting fields from the dashboard to equivalent fields in a target dashboard format, such as Grafana or Perses. We leverage the query converter we described earlier for this part – any queries present in dashboard panels will be run through our query converter and output as valid PromQL. We’ll also convert any other common fields that might appear in both types of dashboards, such as the dashboard title, description, and any dashboard panel formatting.
Once we’ve converted all fields from our source dashboard struct into our target dashboard struct, we output the dashboard as JSON.
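The unmarshal, convert, and marshal flow described above can be sketched in Go as follows. The struct fields here are deliberately simplified and hypothetical; they are not the actual Datadog, Grafana, or Perses schemas. The query converter is passed in as a function so the sketch stays self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SourceDashboard is a simplified, hypothetical source dashboard shape.
type SourceDashboard struct {
	Title       string         `json:"title"`
	Description string         `json:"description"`
	Widgets     []SourceWidget `json:"widgets"`
}

type SourceWidget struct {
	Title string `json:"title"`
	Query string `json:"query"`
}

// TargetDashboard is a simplified, hypothetical target dashboard shape.
type TargetDashboard struct {
	Title       string        `json:"title"`
	Description string        `json:"description"`
	Panels      []TargetPanel `json:"panels"`
}

type TargetPanel struct {
	Title string `json:"title"`
	Expr  string `json:"expr"`
}

// ConvertDashboard unmarshals the source JSON, converts each field, runs every
// widget query through convertQuery, and returns the target dashboard as JSON.
func ConvertDashboard(src []byte, convertQuery func(string) (string, error)) ([]byte, error) {
	var in SourceDashboard
	if err := json.Unmarshal(src, &in); err != nil {
		return nil, fmt.Errorf("unmarshal source dashboard: %w", err)
	}
	out := TargetDashboard{Title: in.Title, Description: in.Description}
	for _, w := range in.Widgets {
		expr, err := convertQuery(w.Query)
		if err != nil {
			return nil, fmt.Errorf("widget %q: %w", w.Title, err)
		}
		out.Panels = append(out.Panels, TargetPanel{Title: w.Title, Expr: expr})
	}
	return json.Marshal(out)
}

func main() {
	src := []byte(`{"title":"CPU","description":"Host CPU","widgets":[{"title":"User","query":"q1"}]}`)
	// Placeholder converter; a real one would emit PromQL.
	stub := func(q string) (string, error) { return "converted:" + q, nil }
	out, err := ConvertDashboard(src, stub)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Returning an error for a widget whose query cannot be converted is one possible design choice; a tool aiming for partial conversions might instead record the failure and keep going, leaving that panel for manual follow-up.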
You’ll notice that most of the article focuses on converting Datadog to open source standards, but for monitors, we’ve decided to take a different approach. The open source solution for alerting is Alertmanager, which unfortunately is often cited as challenging to configure and manage.
In addition, the conversion from Datadog monitors to Alertmanager could be lossy as some Datadog features don’t clearly map to Alertmanager features, such as rich support for multiple conditions in a single monitor. We decided to migrate Datadog monitors directly to Chronosphere monitors instead.
Datadog monitors are similar to dashboards in that they also can be exported as JSON, which makes it simple to import into our conversion tool. Once we’ve built up a Go struct representing a source monitor, we can unmarshal a customer’s JSON monitor into it and interact with it programmatically. We’ll convert the monitor’s query using our query converter, and convert any thresholds and notification routing so that the customer gets alerted the same way they would when using their old observability tooling. We then output the conversion as Chronosphere monitor JSON.
Our tools typically get customers about 90% of the way to a full conversion, with the last 10% requiring manual intervention. This saves hundreds of hours of developer time during a migration. We’ve even done a full migration for a customer in approximately 4 weeks to meet their tight timelines.
Our tooling also provides value for customers who haven’t committed to a full migration to Chronosphere – customers can easily test Chronosphere and compare against their existing solution, as the same dashboards and monitors will exist in both platforms.
This initial project helped provide a framework for specific types of proprietary data, but it’s just the beginning. Going forward, we plan to continue improving our tooling to make the transition to Chronosphere even easier.