In this episode of the Future of Observability video series, Chronosphere co-founder and CEO Martin Mao shares his insights with Technical Writer Chris Ward about the importance of central observability teams across an organization.
Chris: What is a central observability team, and why might companies need one?
Martin: Adopting cloud native means more than adopting a technology stack. There are actually big organizational changes. Historically, there were perhaps more siloed engineering and IT teams that each supported a particular business unit, but as we start to standardize the infrastructure at the containerization layer, we’re seeing more and more central platform teams that support all of the business units across a larger enterprise. And observability is no different. Historically you’d have different sets of APM tools or IT monitoring tools in each business unit, but as companies migrate to cloud native, we see a clear pattern and a need to centralize the tooling, the knowledge base, and the folks who are responsible for them.
Defining central observability
Generally, central observability is a team that is responsible for offering observability as a service to the rest of the company or the business units, ideally in a centralized manner. If you can centralize the tooling and have one tool set, that’s hugely advantageous. The skills and the interfaces of every end user can be consistent across an organization. Also, the need to jump across multiple stacks is much higher than it ever was before. So there are a bunch of advantages in a single team being able to pick a single set of tools. That single team is also able to do a lot more now that it is centralized, because they’re also responsible for things like, “How do we instrument all of our stack?” This is where, as we’ve discussed in our earlier conversation about OpenTelemetry, adopting open source standards not only allows companies to future-proof themselves, but also means having a standard across the whole company.
How central teams can define responsibilities
Even beyond the standard instrumentation, there are a lot of things that you can push out centrally from one organization, and that’s hugely beneficial to a company. For example, every engineer and developer needs to create an SLA (service level agreement) or an SLO (service level objective) for their microservice. But what is one team’s definition of an SLA? How do you really measure it? If that’s inconsistent, if it’s measured from different sources of data, that creates huge amounts of confusion, even if there was a top-down initiative saying every microservice has to have an SLA. There’s a huge role that an observability team can play in that standardization effort, and it goes well beyond the tooling that you provide. It’s how teams think about concepts like an SLA, how they standardize the emission of that data, and perhaps even emitting that data on behalf of the service owner. There’s a lot to be gained by centralizing beyond the observability function: it can be things like security or deployment, or even the team that owns the core platform in terms of the containerization layer. There’s a huge advantage to centralizing all of that because you don’t really have to reinvent the wheel per business unit.
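To make the standardization point concrete, here is a minimal sketch of what a centrally owned SLO definition could look like. It is an illustration only, not something described in the interview: the `SLODefinition` class, its field names, and the metric names are all hypothetical, and it assumes every service reports the same pair of "good" and "total" event counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLODefinition:
    """Hypothetical company-wide SLO shape; every field name here is illustrative."""
    service: str
    target: float       # e.g. 0.99 means 99% of events must be good
    good_metric: str    # assumed name of the shared "good events" counter
    total_metric: str   # assumed name of the shared "total events" counter


def availability(good: int, total: int) -> float:
    """Fraction of good events; a window with no traffic counts as fully available."""
    if total == 0:
        return 1.0
    return good / total


def error_budget_remaining(slo: SLODefinition, good: int, total: int) -> float:
    """Share of the error budget left (1.0 = untouched, 0.0 = spent, negative = overspent)."""
    allowed_bad = (1.0 - slo.target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        # No budget at all (no traffic or a 100% target): untouched or exhausted.
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


# Every team fills in the same shape instead of inventing its own definition.
checkout_slo = SLODefinition(
    service="checkout",
    target=0.99,
    good_metric="requests_good_total",
    total_metric="requests_total",
)
```

Because every service owner uses the same functions and the same two counters, two teams comparing error budgets are comparing the same quantity, which is the consistency the central team is meant to provide.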
Central teams lead to connected insights
Chris: Central observability teams can potentially help bring connected insights to siloed teams that might not have the visibility to see the whole picture.
Martin: Yes, if the data is in the same toolset, then that enables the connected insights. If the data is produced in the same format, and you own one part of the stack but have a dependency on a different part of the stack, you can navigate to those dependencies. And if they measure things in the same way, with the same sources of data and the same way of defining the SLAs, you can navigate that team’s data without having to directly communicate with them.
Other Future of Observability topics include high cardinality, the three phases of observability, and the future of PromQL, featuring Julius Volz. Make sure to subscribe to the Chronosphere YouTube channel so you don’t miss any future videos.