Distributed tracing is failing. How can we save it?


Distributed tracing must be accessible to be useful to engineers. So how do we integrate it across observability platforms?

Joel Groen, Product Manager | Chronosphere

Distributed tracing has been around for over a decade, and it arrived on the scene with great fanfare. There were promises distributed tracing would unlock the mysteries of large, complex microservices environments.

However, the early wave of tools was complex, and most of their value was concentrated among a small number of power users. This created workflow bottlenecks and made it hard for casual users to understand just how beneficial distributed tracing can be.

Ten years in, we now know distributed tracing isn’t a silver bullet. To truly offer value, distributed tracing must get out of its silo, harness expert insight, and deliver more “out of the box” functionality for casual users. It must be accessible across the entire organization and integrated across the observability platform so users can solve problems in ways they couldn’t before.

Distributed tracing isn’t useful siloed

Engineers, like all of us, are to some degree creatures of habit. The engineering organizations I’ve spent time with have a deep level of comfort with dashboards, and statistics show that’s where engineers spend the most time. Dashboards present data in an easy-to-understand graphical user interface (GUI), so engineers can quickly answer questions.

However, it’s challenging when trace data is kept in its own silo. To access its value, an engineer must navigate away from their primary investigation to a separate place in the app, or worse, a separate app. Then the engineer must try to recreate whatever context they had when they determined trace data could supplement the investigation.

Over time, all but a few power users start to drift away from using the trace query page on a regular basis. Not because the trace query page is any less useful. It’s simply outside of the average engineer’s scope. It’s like a kitchen appliance with lots of uses when you’re cooking, but because it’s kept out of sight in the back of a drawer, you never think to use it — even if it’s the best tool for the job.

In the end, a handful of tracing power users take advantage of the trace query page while the rest slide back into solving problems with familiar tools. Engineers require tools that present specific data with actionable insights. Even with all of distributed tracing’s potential, it’s still not well integrated enough into engineers’ toolkits for everyday use.

How do we make distributed tracing better?

To make distributed tracing better, it must be useful and accessible.

How do we make distributed tracing useful?

For engineering organizations to realize the full potential of trace data, engineers have to use it. Any calculation that attempts to measure the value of tracing is going to include some measure of breadth of use or the number of engineers who regularly use trace data.
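One way to make "breadth of use" concrete is to count how many engineers touch trace data regularly. The sketch below is only an illustration: the usage log of (engineer, date) pairs, the function name `tracing_adoption`, and the "active in at least three distinct weeks" threshold are all assumptions made for the example, not something any particular product exposes.

```python
from collections import defaultdict
from datetime import date


def tracing_adoption(usage_events, team_size, min_weeks=3):
    """Fraction of the team that used trace data in at least `min_weeks` distinct ISO weeks.

    usage_events: iterable of (engineer_name, date) pairs (hypothetical log format).
    team_size: total number of engineers in the organization.
    """
    weeks_by_engineer = defaultdict(set)
    for engineer, day in usage_events:
        iso = day.isocalendar()
        # Key on (ISO year, ISO week) so activity in the same week counts once.
        weeks_by_engineer[engineer].add((iso[0], iso[1]))

    regular_users = [e for e, weeks in weeks_by_engineer.items() if len(weeks) >= min_weeks]
    return len(regular_users) / team_size if team_size else 0.0


# Hypothetical usage log: "ana" is active in three different weeks, "ben" in one.
events = [
    ("ana", date(2024, 1, 2)),
    ("ana", date(2024, 1, 9)),
    ("ana", date(2024, 1, 16)),
    ("ben", date(2024, 1, 2)),
    ("ben", date(2024, 1, 3)),
]
adoption = tracing_adoption(events, team_size=4)  # 0.25: one regular user out of four
```

A number like this says nothing about the quality of the investigations, but tracking it over time gives a rough signal of whether tracing is spreading beyond a handful of power users.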

The obvious answer to this question is to build a solution that has two modes of operation: one that meets the needs of advanced power users and one that meets the needs of casual users.

Of course, this is easier said than done. A truly valuable distributed tracing product must harness the workflows and insights of your expert users to deliver more “out of the box” value for casual users. The ideal environment will offer a standard UI/UX across the organization, with enough customization options that power users can extract the insights they need.

How do we make trace data accessible?

Trace data can, and should, be integrated into the workflows that matter to engineers, whether that means making alert notifications more actionable (including where to route them) or providing service dependency data alongside the dashboards for that service.
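As a rough illustration of where service dependency data can come from, the sketch below derives caller-to-callee edges from the parent/child links between spans. The span layout (dicts with `span_id`, `parent_id`, and `service` keys) and the function name `service_dependencies` are assumptions for this example, and span IDs are assumed to be unique across the input; real tracing backends store and expose spans in their own formats.

```python
from collections import Counter


def service_dependencies(spans):
    """Count cross-service calls as (caller_service, callee_service) edges.

    spans: list of dicts with 'span_id', 'parent_id', and 'service' keys
    (a simplified, hypothetical span shape).
    """
    service_of = {s["span_id"]: s["service"] for s in spans}
    edges = Counter()
    for s in spans:
        parent_service = service_of.get(s.get("parent_id"))
        # Skip root spans, spans whose parent is missing, and calls within one service.
        if parent_service is not None and parent_service != s["service"]:
            edges[(parent_service, s["service"])] += 1
    return edges


# Hypothetical trace: frontend calls checkout twice, checkout calls payments once.
example_spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_id": "b", "service": "checkout"},
    {"span_id": "d", "parent_id": "b", "service": "payments"},
    {"span_id": "e", "parent_id": "a", "service": "checkout"},
]
deps = service_dependencies(example_spans)
# deps[("frontend", "checkout")] == 2 and deps[("checkout", "payments")] == 1
```

An edge list like this is the kind of data that could sit next to a service's dashboard, or help decide which team an alert should be routed to.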

Trace data isn’t valuable if it’s inaccessible. Every engineer in the organization must have easy access to trace data set up in tools they already work with on a regular basis. For a huge portion of software developers, that ends up being good old reliable dashboards.

Historically, dashboards have visualized metric time series as line charts, gauges, and bar graphs. But engineering teams shouldn’t limit themselves to just displaying metrics; they can absolutely use dashboards to display insights from trace data and other observability data types.
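To show what "insights from trace data on a dashboard" could look like in practice, here is a minimal sketch that rolls root spans up into per-minute points a dashboard panel could chart: request count, error rate, and p95 latency (nearest-rank method). The span fields (`start_ts` in seconds, `duration_ms`, `error`) and the function name `trace_panel_series` are assumptions made for the example, not a specific vendor's schema.

```python
import math
from collections import defaultdict


def trace_panel_series(root_spans, bucket_seconds=60):
    """Aggregate root spans into time buckets suitable for a dashboard chart.

    root_spans: list of dicts with 'start_ts' (seconds), 'duration_ms', and 'error'
    keys (a simplified, hypothetical span shape).
    """
    buckets = defaultdict(list)
    for span in root_spans:
        bucket_start = int(span["start_ts"] // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(span)

    series = []
    for bucket_start in sorted(buckets):
        group = buckets[bucket_start]
        durations = sorted(s["duration_ms"] for s in group)
        # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
        p95_index = math.ceil(0.95 * len(durations)) - 1
        series.append({
            "ts": bucket_start,
            "requests": len(group),
            "error_rate": sum(1 for s in group if s["error"]) / len(group),
            "p95_ms": durations[p95_index],
        })
    return series


# Hypothetical root spans: two requests in the first minute, one in the second.
sample = [
    {"start_ts": 0, "duration_ms": 100, "error": False},
    {"start_ts": 30, "duration_ms": 300, "error": True},
    {"start_ts": 65, "duration_ms": 50, "error": False},
]
points = trace_panel_series(sample)
```

Each resulting point has the same shape as a metric sample, so it can be drawn with the line charts and gauges engineers already know.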

How to put trace data to work

Breaking down silos and bringing metric and trace data together under a single interface, such as dashboards or alerts, produces solutions that are practical and effective, serve a broad audience, combine data in a meaningful way, and take advantage of the real potential of distributed trace data.

We must treat dashboards like a tool in our observability toolkit that enhances other observability data, but also relies on other observability data to fully achieve its own potential. Dashboards can provide the foundation to integrate distributed trace data across the entire organization instead of keeping it siloed, only to be used once in a great while.

The reality is dashboards are a go-to spot for observability data. So why not break down silos by including metric and trace data together in a single view? By doing so, an experienced user or a new team member can interact with observability data in both simple and advanced ways, such as out-of-the-box dashboards or advanced queries. With dashboards, engineers don’t have to learn an entirely new tracing tool, which means they can start using tracing data immediately.

Chronosphere’s observability platform brings together your metric and trace data into one view so everyone gets the most value out of all available observability data. This makes it easier to connect an incident’s details, like location and performance, without needing to think of the underlying data types, product definitions, or dead ends.

Want to learn about how Chronosphere can help implement distributed tracing? Watch a live demo now.  
