How DoorDash is scaling their SLOs with Chronosphere


Steven Callister, a software engineer on DoorDash’s observability team, explains how DoorDash automated 14,000 service level objectives and drove better business value for its customers.

Parker Trewin | Head of Corporate Communications | Chronosphere

Parker Trewin is the Head of Corporate Communications at Chronosphere.


When online food-ordering and delivery service DoorDash first started its observability journey, its engineering team was battling constant metrics loss while scaling, and its monitoring system kept breaking down. It was time for a new solution, and DoorDash had four main criteria: a platform that was open source, scalable, reliable, and fully distributed.

Once DoorDash was introduced to Chronosphere, the team knew they had found their match. In a recent video, Steven Callister, a software engineer on DoorDash’s observability team, explains how DoorDash automated 14,000 service level objectives (SLOs) and drove better business value for its customers.

Check out a transcript of the video below, and catch the full video at the end of this blog.

The power of service level objectives

Steven: Chronosphere has allowed us to scale up to 14,000 SLOs now in our SLO framework, and that is our service level objective system. It’s how we make sure that services are healthy. That has been incredibly helpful for having complete and total coverage of our endpoints. It means we’re not flying blind. It means that as a business, we’re able to see how all of the interactions that customers have are occurring.

My name is Steven Callister. I’m a software engineer at DoorDash, and I work on the observability team. DoorDash connects millions of consumers and Dashers who are delivering food with hundreds of thousands of restaurants around the world. A good SLO helps you know if your service is healthy or not, and it helps you understand, as a developer or as a company, where you need to focus your attention.


How DoorDash finds control over data with Chronosphere

Steven: So, we use Chronosphere’s recording rules in order to record data in a deterministic way to keep track of how healthy services and their endpoints are. If we have high SLO burn for a service, then that could lead to a consumer not getting their meal in time or experiencing errors in the app. That could lead to a merchant not getting a food order on time, right, or at all.
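For readers unfamiliar with the term, "SLO burn" is commonly expressed as a burn rate: the observed error ratio divided by the error budget that the SLO target allows. The sketch below shows that arithmetic only. It is not DoorDash's or Chronosphere's implementation, and the function names, request counts, and the 99.9% target are all assumptions chosen for illustration.

```python
def error_ratio(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that failed in some window (0.0 if there was no traffic)."""
    if total_requests <= 0:
        return 0.0
    return failed_requests / total_requests


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being consumed.

    A value of 1.0 means errors arrive exactly as fast as the SLO allows;
    values above 1.0 mean the budget will run out before the SLO period ends.
    """
    error_budget = 1.0 - slo_target
    if error_budget <= 0.0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    return observed_error_ratio / error_budget


# Hypothetical endpoint: 10,000 calls in the window, 50 of them failed,
# measured against an assumed 99.9% success target.
rate = burn_rate(error_ratio(10_000, 50), slo_target=0.999)
print(f"burn rate: {rate:.2f}")  # roughly 5x the allowed error rate
```

In a setup like the one Steven describes, the success and failure counts would typically come from pre-aggregated series such as those produced by recording rules, rather than from raw request logs.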

And so, it’s important that most of the service calls between our different services are succeeding. That way, people have a good experience on the site. What we’ve seen as we’ve grown, is that it’s difficult for developers to have complete and total coverage for all of the services that we have. 

And so, some of the work that I’ve been doing this year is to automate the creation of SLOs. That way, instead of creating SLOs by hand, developers are reviewing automatically created SLOs, and then adjusting and fine-tuning them. So, developers get more time back, because they’re not spending as much time handcrafting SLOs. We also get better SLOs, because doing it automatically gets rid of the human error element.
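The transcript does not describe how DoorDash generates its SLOs, so the following is only a minimal sketch of the general idea: derive a draft SLO for every known endpoint from its recent traffic, and flag each draft for developer review. Every name here (`EndpointStats`, `DraftSLO`, `CANDIDATE_TARGETS`, the example services) is hypothetical, and the rule for picking a target is an assumption, not DoorDash's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointStats:
    """Recent traffic for one endpoint (hypothetical input shape)."""
    service: str
    endpoint: str
    total_requests: int
    failed_requests: int


@dataclass(frozen=True)
class DraftSLO:
    """An automatically proposed SLO that a developer still has to review."""
    service: str
    endpoint: str
    target: float
    needs_review: bool = True


# Assumed menu of targets, strictest first.
CANDIDATE_TARGETS = (0.999, 0.995, 0.99, 0.95)


def propose_target(stats: EndpointStats) -> float:
    """Pick the strictest candidate target the endpoint already meets."""
    if stats.total_requests <= 0:
        return CANDIDATE_TARGETS[-1]  # no data: start with the loosest target
    success_ratio = 1.0 - stats.failed_requests / stats.total_requests
    for target in CANDIDATE_TARGETS:
        if success_ratio >= target:
            return target
    return CANDIDATE_TARGETS[-1]


def draft_slos(all_stats: list[EndpointStats]) -> list[DraftSLO]:
    """Create one draft SLO per endpoint so that no endpoint is left uncovered."""
    return [
        DraftSLO(s.service, s.endpoint, propose_target(s))
        for s in all_stats
    ]


drafts = draft_slos([
    EndpointStats("orders", "POST /v1/orders", 100_000, 50),
    EndpointStats("menus", "GET /v1/menus", 1_000, 8),
])
for d in drafts:
    print(d.service, d.endpoint, d.target, "review" if d.needs_review else "approved")
```

The point of the sketch is the workflow rather than the heuristic: the machine proposes an SLO for every endpoint, and developers spend their time adjusting the proposals instead of writing each one from scratch.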


Automating SLOs for business value

Steven: And then, we do get more SLOs, which is great because that means we get better SLO endpoint coverage. So, automated SLOs free up our developers to have more time to work on developing the product itself. They also free up developers to have more business-related conversations with the owners of the services upstream and downstream from them.

One of the things I like to say is that SLOs should lead to discussions around business value. If you have an SLO that’s performing poorly, then that could be a good indication that you need to have a business conversation. Ultimately, having good SLOs leads to a better business and a better customer experience.

It’s not just Chronosphere serving data up to us. They partner with us, they’ve had these problems before, and it’s a large collaborative effort between us and them.


Hear firsthand from DoorDash

How DoorDash is driving better business value through automated SLOs
