For Ashley, the Director of Observability at a multi-national electronics manufacturer, a lack of buy-in and understanding about observability within her organization is threatening to derail the organization’s observability initiative. Her company recently adopted a modern observability tool after moving from monolithic enterprise applications running on virtual machines to distributed, containerized microservices with independent release cycles. The company’s goal in adopting cloud native observability was to ensure application stability and positive customer experiences. However, the company’s lack of consensus around observability has left engineers, and ultimately customers, worse off than before.
Ashley is a fictional character in a made-up vignette meant to illustrate how a culture of observability, or lack thereof, can make or break a company’s success. But what’s happening in Ashley’s world happens in real organizations worldwide every day. When observability culture is dysfunctional, product launches are bumpy or even fail, because problems in key systems go unnoticed and visibility is limited.
The cost of incomplete instrumentation is clear: rather than spending time innovating and finding new ways to make customers happy, teams of engineers are stuck troubleshooting issues.
This post uses a realistic example of a Sev1 incident at Ashley’s organization to show how bad life gets when an observability tool isn’t used to its full effect, and how a culture of observability can help. After setting up a hellscape vignette in which a difficult-to-resolve incident threatens to delay a major product launch, I walk you through:
- How organizations can build a culture of observability
- The importance of stakeholder engagement
- Actionable steps to ensure observability becomes a foundational element of your development process
I also sprinkle in some insights along the way from Bill Hineline, who saw a thing or two during his time as Director of Enterprise Observability at United Airlines. Bill is currently Field CTO at Chronosphere.
Now let’s check in on Ashley…
After snarfing down a quick lunch (or is it dinner now?) Ashley rejoins the Incident Response Zoom call to check in with the team. As the Director of Observability, Ashley has a big stake in having this incident resolved quickly. The product launch is supposed to happen in two days, but this current bug has broken a key part of the revenue flow through the application.
The discussion on the call revolves around a lack of visibility into a couple of key microservices — ShoppingCart and CheckOut.
Knowing that the observability tool needs correct instrumentation to help diagnose an issue, Ashley asks, “Did we put in the instrumentation for the metrics and tracing telemetry for those services?” She is looking directly at Constantine, the lead developer for one of the services.
“Are you kidding?” Constantine asks. “We barely had time to get the core code written. Writing in instrumentation has always been lower on our priority list.”
There is tension in the room as pressure mounts to quickly resolve the issue and keep the product launch on track, despite having no idea what the cause is.
“But this call proves exactly why we need to have that instrumentation. We have an observability platform that can give us visibility to troubleshoot the problem, but only if everything is instrumented. Distributed tracing doesn’t work if there are holes in the data,” says Ashley.
Ashish, the VP of Infrastructure who has been off camera and silent until now, comes on camera and says, “To be honest, we need to find a better and faster way to resolve these incidents. This observability platform is costing us an arm and a leg and it clearly isn’t providing the value we need. Let’s pull over some more engineers from the Phoenix Project and get more eyeballs on this problem. We have to get this fixed ASAP.”
Ashley kills her camera and puts her head in her hands. “They just don’t get it,” she thinks to herself. And opens up her resume.
What went wrong?
Ashley struggles with the fact that her team members and stakeholders don’t seem to understand the value and purpose of observability: Constantine sees observability as something that takes time rather than saves time. Ashish sees it as a cost center rather than a value center. They both want to continue doing things the familiar way.
Ashley is not alone. We see this often in organizations of all sizes.
For observability to really provide the benefit it promises, it needs to have buy-in from every part of the organization. From development, to operations, to leadership, there needs to be a common understanding of how observability can help and why it is important. A culture of observability needs to permeate the organization.
What is a Culture of Observability?
Fundamental to the success of an observability strategy is the ability to establish a culture of observability within the broader organization. A culture of observability can be characterized by three traits:
1) Shared Responsibility and Buy-in:
A successful observability strategy requires fostering a culture of shared responsibility for observability across all teams. By embedding observability throughout the software development lifecycle, organizations create a proactive environment where issues are detected and resolved early. This will require observability buy-in across all teams within the organization.
2) Promoting Transparency:
Teams that prioritize observability gain deeper insights into system performance and user experiences, resulting in faster incident resolution and improved service delivery. Promoting an organizational mindset that values transparency and continuous monitoring is key.
3) Shifting Left:
Shifting observability left into the development process helps teams catch issues earlier, reducing the cost of fixing bugs and enhancing product quality. Developers can integrate observability into code from the outset, ensuring systems are instrumented and monitored at every stage. This is a key step towards the establishment of a culture of observability.
You can’t bolt on observability at the end and expect it to work. A culture of observability means it’s baked into how we build, how we operate, and how we think about system health — from the first line of code to production.
- Bill Hineline, former Director of Enterprise Observability for United Airlines
Identifying Stakeholders and Organizational Ownership
As a key step on the journey of building a culture of observability, you need to make sure that you have organizational buy-in for the effort and the resulting strategy.
Some questions to ask yourself in this regard are:
- Who is responsible for the observability strategy?
- Who owns observability execution?
- Is there a Central Observability Team, or is responsibility distributed to business units or project teams?
- Where does it sit? Within Platform Engineering? SRE? Ops? Elsewhere?
Next you need to consider (and involve) all the key players and stakeholders across the organization that need to buy in and be involved.
This should include:
- Executive sponsorship
- Engineering leadership
- Key users of the observability tooling, including both experts and those who only rarely use it
- Administrators of the observability tooling
- Procurement and finance because cost can become an issue
- Any other business units that may want a seat at the table
A big part of this is making sure that all stakeholders across the organization, high or low in the org chart, understand what’s going on, and that their feedback is heard. Leadership needs to be involved too: communicate what you are doing, why you are doing it, and what the implications are of doing or not doing it.
In addition, you need to identify who will be your champions and who may be detractors. Both groups are equally important to your observability strategy success and demand equal attention.
- Detractors can overtake the narrative and reduce the buy-in you’re trying to achieve.
- Champions, however, can help win over additional stakeholders, users and even detractors to increase buy-in for your strategy.
Make sure that champions are heard, and use their influence to help the team, at a minimum, accept that a change is coming and understand what you’ll need from them. Champions can also win over detractors by showcasing the benefits of the observability strategy, both in individual conversations and with the team more broadly.
If observability is treated like a niche tool or a siloed team’s problem, it will fail. Success comes when leaders across engineering, product and infrastructure recognize that observability is foundational to delivering great customer experiences – and commit to owning it together.
- Bill Hineline, former Director of Enterprise Observability for United Airlines
Fostering observability — make it easy
An important role of the Central Observability Team (or SRE Team depending on organizational structure) is to make observability as easy to adopt as possible. Providing tooling and enablement for the rest of the organization can help overcome resistance and blockers and foster a culture of observability.
Some examples of things that can smooth the path to organization-wide acceptance and adoption include:
- Starter kits for basic OpenTelemetry (OTel) instrumentation (an OTel wrapper for developers to call, for example)
- Templates for common technologies that will be monitored (e.g., template dashboards for Kafka)
- Best practice guidance / standardized instrumentation approach with room for extensibility by engineers
- Overall O11y governance and a process for feedback and updates
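To make the starter-kit idea concrete, here is a minimal sketch of what an instrumentation wrapper might look like in Python. It is an illustration under stated assumptions, not a reference implementation: the `instrumented` decorator, the `RECORDED` list, and the `app.service` / `app.operation` attribute keys are hypothetical names invented for this example. The sketch uses the OpenTelemetry tracing API when the `opentelemetry-api` package is installed and otherwise falls back to a no-op, so a developer can adopt it with a single line.

```python
import functools
import time
from contextlib import contextmanager

try:
    # Use the real OpenTelemetry API when it is installed.
    from opentelemetry import trace
    _tracer = trace.get_tracer("observability_starter_kit")  # hypothetical tracer name
except ImportError:
    _tracer = None  # Fall back to a no-op so the wrapper never blocks a team.

# Hypothetical in-memory record of calls, handy for local checks and unit tests.
RECORDED = []


@contextmanager
def _span(name, attributes):
    """Open an OpenTelemetry span if a tracer is available, else do nothing."""
    if _tracer is None:
        yield None
        return
    with _tracer.start_as_current_span(name, attributes=attributes) as span:
        yield span


def instrumented(service, operation):
    """Decorator that wraps a function in a span and records outcome and duration."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Attribute keys are assumptions made for this sketch.
            attributes = {"app.service": service, "app.operation": operation}
            start = time.perf_counter()
            outcome = "ok"
            try:
                with _span(f"{service}.{operation}", attributes):
                    return func(*args, **kwargs)
            except Exception:
                outcome = "error"
                raise  # Never swallow the application's own errors.
            finally:
                RECORDED.append({
                    "service": service,
                    "operation": operation,
                    "outcome": outcome,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


# Example: one decorator line is all a service developer has to add.
@instrumented("ShoppingCart", "add_item")
def add_item(cart, sku):
    cart.append(sku)
    return len(cart)
```

The design choice here is to keep the developer-facing surface to one decorator, so that instrumenting a service like ShoppingCart costs a line of code rather than a sprint. Exporter configuration, naming standards, and sampling would stay with the central team that maintains the kit.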
The Central Observability Team doesn’t need to be the owner of the work required to effectively implement observability, but they should be enablers of that work to make life better for the other teams.
As critical as it is to get developers to embrace observability by instrumenting their applications, it is also critical to have executive sponsorship that is supportive and vocal about their support. Having leadership recognize wins that were made possible by observability goes a long way towards building a culture of observability.
Building a culture of observability isn’t just about vision – it’s about enablement. You have to meet teams where they are and make the right path the easy path. If instrumentation feels like extra credit, we’ve already lost.
- Bill Hineline, former Director of Enterprise Observability for United Airlines
Bringing it full circle
Fostering a culture of observability is not just about implementing the right tools, but about aligning teams and stakeholders toward a shared understanding of its value. By prioritizing visibility, engaging key players, and embedding observability practices early in the development process, organizations can ensure smoother product launches, quicker issue resolution, and better overall system health. With the right approach, observability becomes a cornerstone of both operational excellence and continuous improvement. And for Ashley, less stress.
There’s much more to discuss when it comes to your observability strategy. In my next article, I’ll be writing about how to enable and drive observability adoption.