How Chronosphere architects with Google Cloud

January 5, 2023

Matt Schallert, a seasoned software engineer at Chronosphere and a colleague of ours, recently spoke with Google Cloud Tech Developer Advocate Kaslin Fields to share his perspective on why Chronosphere chose Google Cloud for our cloud native observability platform. This blog covers the key points of their conversation. 

An early decision: Go cloud native with Google Cloud

Chronosphere is a software as a service (SaaS) observability company built for cloud native environments to help other companies solve some of the same problems that our founders — Martin Mao (CEO) and Rob Skillington (CTO) — saw when leading the observability team at Uber. 

Our cloud native observability platform ingests both metric and trace data in a variety of open source formats, such as Prometheus and OpenTelemetry, and allows companies to troubleshoot and remediate issues more quickly. With Chronosphere, our customers spend less time managing their observability systems and more time building what differentiates them as a business. 

There are three key aspects of our business and our product that informed how we decided to build on top of Google Kubernetes Engine (GKE). 

Cloud native mindset. When Martin and Rob launched Chronosphere, one of the first considerations was this: To solve a cloud native problem and work with cloud native companies, we obviously had to be cloud native ourselves. Plus, we had a commitment to openness. That meant building our platform on a public cloud.

Observability as a mission-critical service. We knew our customers would be using Chronosphere to take care of their mission-critical systems. We had to offer them the strongest service level agreements (SLAs) for availability, reliability, and performance. 

Data onslaught. Our customers would be emitting huge volumes of data that they would have to sift through in near-real time. 

A closer look at our Chronosphere architecture

The Chronosphere collector — our vendor-neutral way to collect, process, and export telemetry data (e.g. metrics and traces) to various destinations — is Kubernetes native. When you install our collector in a Kubernetes cluster, it automatically discovers all your workloads and starts to gather data from them.

The collector tailors the experience to each user based on which services that individual owns or which teams they belong to. This makes the overall user experience friendlier and less overwhelming while helping people get to the bottom of issues more quickly. Being on call is already stressful enough for engineering teams. You don’t want to have to search for a needle in a haystack — you want your tools to find that needle for you. 

Why Google’s zonal separation is essential to our architecture

Another important aspect of Google Cloud that is integral to our architecture is its multiple zones in each geographic region, which let us take advantage of zonal isolation: we define our applications within Kubernetes so that they are evenly distributed across zones. 

It also ensures that customer data stores are isolated from each other and can tolerate zonal failures. Leveraging Google Cloud’s high availability and zonal isolation, and expressing those requirements with Kubernetes, has allowed us to set a very high SLA and to deliver the industry’s best performance against that SLA. (Jump to the 12:45 mark in the video below to hear Matt dive deeper into zoning.)
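In Kubernetes, this kind of even zonal distribution is typically expressed with topology spread constraints on a workload. A minimal sketch of the idea (the Deployment name and labels here are illustrative, not Chronosphere's actual configuration):

```yaml
# Hypothetical Deployment snippet showing even spread across zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingest-store        # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ingest-store
  template:
    metadata:
      labels:
        app: ingest-store
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                  # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone    # spread across availability zones
          whenUnsatisfiable: DoNotSchedule            # refuse to schedule rather than skew
          labelSelector:
            matchLabels:
              app: ingest-store
```

With `maxSkew: 1` and three zones, the scheduler keeps the replica count per zone within one of every other zone, so losing a single zone removes at most roughly a third of the pods.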

Because latency matters, we care deeply about data freshness. We leverage Google Cloud load balancers, specifically its global load balancers, to route traffic to and from our customers’ users. With this setup, the data those users send us is captured as close to them as possible, and when they query data from Chronosphere, it reaches them as fast as possible.

GKE also supports data persistence. When customers send us data, we don’t consider that data persisted until data stores in at least two of the three zones within a region have acknowledged it. This is where it’s critical to be able to tell Kubernetes that we need certain workloads or workload subsets to be in certain zones. GKE ensures that information is propagated all the way down to the data-store layer by making sure that our persistent disk volumes are in those zones as well. This separation of the applications and their data is a critical part of our Google Cloud architecture.
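The two-of-three acknowledgment described above is, conceptually, a quorum write. A minimal, self-contained Go sketch of that idea, simulating per-zone results with a channel (real data stores would respond over RPC; none of these names come from Chronosphere's codebase):

```go
package main

import "fmt"

// quorumWrite reads one acknowledgment per zone from acks and reports
// whether at least `quorum` zones confirmed the write. This is a toy
// sketch of the concept, not a production replication protocol.
func quorumWrite(acks <-chan bool, zones, quorum int) bool {
	ok := 0
	for i := 0; i < zones; i++ {
		if <-acks {
			ok++
		}
		if ok >= quorum {
			return true // enough zones acknowledged: the write counts as persisted
		}
	}
	return false // fewer than quorum zones acked: not yet persisted
}

func main() {
	// Simulate three zones: two succeed, one fails.
	acks := make(chan bool, 3)
	acks <- true
	acks <- false
	acks <- true
	fmt.Println(quorumWrite(acks, 3, 2)) // two of three zones acked
}
```

The key property is that a single zonal failure still leaves enough acknowledging zones for the write to count as durable.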

How GKE helps us achieve our SLAs and SLOs

Google Cloud also helps us with availability. Because we leverage Google Cloud’s high availability and zonal isolation, when our customers load Chronosphere, they see data from their environments almost immediately. This matters when debugging a critical issue in real time; a tool that can’t surface the data you need right away slows you down at the worst possible moment. 

We call this “time to glass”: the time a data point takes to go from being emitted to being visible to a user. We try to keep it to just a few seconds, because stale data loses its value quickly. 

Future architectural goals

We’re always trying to improve the things that differentiate us: performance, reliability, and cost efficiency. We continue to invest in these three areas, making sure our platform can withstand all kinds of failure scenarios. 

  • Performance – We will continue to deliver the best performance for cost in the industry. Look for continual improvements in this area.
  • Reliability – We will track what new storage and/or networking technologies Google comes out with and ensure that we leverage those properly to give our customers what they need. 
  • Cost – Teams implementing observability platforms worry about storing all the data they need at a price they can manage, and then about finding what they actually need within it. We will continue to do both for them at the lowest possible cost.

Simply the best: our Chronosphere observability platform on Google Cloud

Because we’re an open platform built on GKE, teams can get on board easily while avoiding vendor lock-in. Then we put them in control of how much data is ingested, how it’s ingested, how it’s stored, and for how long. That way, our customers remain in control of their observability costs even as their systems — and data — continue to grow. 

Chronosphere recently joined Google Cloud Marketplace as a solution integrated with Google Cloud, which means customers can transact directly through the marketplace. Check us out on Google Cloud Marketplace or visit our website to learn more about our platform.

Watch the full video here:
