Having best practices for OpenTelemetry attribute naming helps developers get the most value out of their data and avoid extra confusion.
On: Dec 20, 2023
When it comes to using OpenTelemetry (OTel) distributed tracing data, simply collecting it isn’t enough; you need to have practices in place to make sure the data is easy to find and correlate with other data. That’s the goal of having good attribute naming standards.
Effective attribute naming is not just a best practice; it’s a critical requirement. For data to be valuable in troubleshooting and post-mortems, attribute names need to be consistent across every telemetry type, every tool and every service. Without this uniformity, the usefulness of your OTel data is significantly reduced.
Semantic conventions and best practices for OTel make data more connected, more portable and more usable throughout your cloud native environment. Contextual data is the most beneficial type of data for observability teams, and best practices ensure you can maximize data usage and effectiveness.
These guidelines and best practices will help position your organization to get the most benefit from collected tracing data.
To implement effective and useful OTel attributes, it’s crucial to involve all affected teams early in the process. For a successful adoption, consider running workshops to get everyone aligned on the positive outcomes of a clear, consistent naming standard across all layers of the stack. Consistency creates clarity, which is crucial during incident response and debugging. Get buy-in from software and systems architects by illustrating the benefits of a naming standard, focusing on areas that are unique to your company and applications.
Then draft a detailed document that outlines the naming conventions, including syntax, structure and examples. Devise a process for evolving the standard: incorporating feedback and addressing any gaps you discover after the fact.
There are five main best practices that you can use as part of your OTel attribute naming conventions to get the most out of your observability data:
Semantic names help ensure efficient root-cause analysis.
With descriptive attribute names, you can look at a resource and have all the context you need to know what it is, what it includes and what it relates to. For an excellent explanation of the existing semantic conventions, visit the official spec, where you can learn the general and system attributes and find conventions organized by signal or operation type (such as HTTP or database), including technology-specific conventions.
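As a minimal Python sketch of the difference, here is the same HTTP request described with ad hoc names versus the stable OpenTelemetry HTTP semantic convention names (`http.request.method`, `http.response.status_code`, `server.address`, `url.path`); the ad hoc keys and the `describe` helper are illustrative, not part of any spec:

```python
# Attributes a hand-rolled instrumentation might emit (ad hoc, hard to correlate):
ad_hoc_attributes = {
    "method": "GET",
    "code": 200,
    "host": "shop.example.com",
}

# The same data named per the OpenTelemetry HTTP semantic conventions,
# so every service and tool can find and join on the same keys:
semantic_attributes = {
    "http.request.method": "GET",
    "http.response.status_code": 200,
    "server.address": "shop.example.com",
    "url.path": "/checkout",
}

def describe(attrs: dict) -> str:
    """Render attributes as a span would carry them, sorted for stable output."""
    return ", ".join(f"{k}={v}" for k, v in sorted(attrs.items()))

print(describe(semantic_attributes))
```

A query tool that understands the semantic names can join this span with any other service’s HTTP spans; the ad hoc names would require per-team translation first.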
The practice of creating a library of known attributes helps you catalog the data you care about, and their documentation creates a record of the data that is important to your customers.
When multiple teams will be sharing attributes, it is important to standardize them to avoid discrepancies. Discrepancies in attribute naming conventions across teams can make correlating data difficult or outright impossible. For example, if the backend team names latency as `latency`, but the frontend team names it `duration`, queries to compare or aggregate latency across services won’t work properly. Standardized attributes enable teams to leverage shared resources (think dashboards or alerts), and allow you to draw insights across multiple systems and services.
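One common way to enforce this is a small shared module of attribute-name constants that every team imports. The sketch below assumes hypothetical names (`Attr`, `app.request.duration_ms`); the point is that the backend and frontend resolve the `latency` vs. `duration` split by sharing one canonical key:

```python
# A tiny shared "attribute registry" module both teams import, so the
# backend and frontend agree on one name for the same measurement.
# All attribute keys here are illustrative, not official OTel conventions.

class Attr:
    # One canonical key for request latency -- no more `latency` vs `duration`.
    REQUEST_DURATION_MS = "app.request.duration_ms"
    MEMBERSHIP_LEVEL = "app.membership.level"

def record_duration(span_attributes: dict, millis: float) -> None:
    """Both teams call this helper instead of inventing their own key."""
    span_attributes[Attr.REQUEST_DURATION_MS] = millis

backend_span: dict = {}
frontend_span: dict = {}
record_duration(backend_span, 12.5)
record_duration(frontend_span, 48.0)

# Because the key is shared, cross-service aggregation just works:
avg_ms = (backend_span[Attr.REQUEST_DURATION_MS]
          + frontend_span[Attr.REQUEST_DURATION_MS]) / 2
print(avg_ms)  # 30.25
```

Keeping the constants in one reviewed module also gives you a natural place to document each attribute and a single diff to review when the standard changes.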
Occasionally you might need to create a new attribute for a specific aspect of your company or application. Before you do though, it’s a good idea to consult the OpenTelemetry Attributes Registry to be absolutely sure one doesn’t already exist for what you need. Once you confirm there isn’t one that matches what you need, you can create a new one. It’s important to follow the tips in the OTel Attribute Naming guide, especially regarding the use of prefixes.
Prefixes in attribute names help distinguish your custom attribute names from the standard names and from names chosen by other projects, vendors or companies you work with. If a custom attribute accidentally shares a name with another attribute, it can lead to incorrect conclusions and decisions, faulty dashboards and alerts, and make it challenging to track the flow or state of transactions accurately.
To avoid conflicts with other projects, vendors or companies, it is wise to consider using a prefix based on your company’s domain name, in reverse, like `io.chronosphere.myapp`.
Even if you are absolutely sure the name will only ever be used inside your application and your company, prefixes are still essential for preventing collisions. Consider using a prefix associated with your app or project, like `bluebook.widget_count`.
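Both prefixing styles can be captured in a pair of trivial helpers. This is a sketch, reusing the article’s example prefixes (`io.chronosphere`, `bluebook`); the helper names and the `tenant_id` attribute are hypothetical:

```python
COMPANY_PREFIX = "io.chronosphere"   # reverse-DNS prefix from the article
APP_PREFIX = "bluebook"              # app-scoped prefix from the article

def company_attr(name: str) -> str:
    """Qualify an attribute with the company-wide reverse-domain prefix."""
    return f"{COMPANY_PREFIX}.{name}"

def app_attr(name: str) -> str:
    """Qualify an attribute that only ever lives inside one application."""
    return f"{APP_PREFIX}.{name}"

print(company_attr("myapp.tenant_id"))  # io.chronosphere.myapp.tenant_id
print(app_attr("widget_count"))         # bluebook.widget_count
```

Routing all custom attribute names through helpers like these makes it hard to accidentally emit an unprefixed name that could collide with a standard or third-party attribute.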
You might be tempted to piggyback on an existing prefix that belongs to OpenTelemetry or another project or vendor. Sharing prefixes can result in a name clash down the line, leaving you and your peers struggling to find ways to separate someone else’s data from your own during an incident.
When deciding what attributes to apply to your traces, remember that your application’s focus is to provide a high-quality software experience to customers. This mission is encoded into your service/application’s service level objectives (SLOs), maybe in the form of a 99.999% uptime expectation. From the SLO, you can narrow down which service level indicators (SLIs) best support or are most likely to threaten achieving SLOs. Your attributes should support your service levels.
For example, if you have latency SLOs that differ between segments of traffic, using attributes that provide segment dimensionality like ProductID, FeatureID or RegionID can help you organize alerts accordingly.
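A sketch of how such segment-aware attributes feed alerting, assuming hypothetical attribute keys (`app.region_id`, `app.product_id`) and made-up per-region latency thresholds:

```python
# Hypothetical per-segment latency SLO thresholds, in milliseconds.
SLO_MS_BY_REGION = {"us-east": 200, "eu-west": 300}
DEFAULT_SLO_MS = 250

def latency_slo_violated(span_attributes: dict, duration_ms: float) -> bool:
    """Check a span's duration against the SLO for its traffic segment."""
    region = span_attributes.get("app.region_id")
    return duration_ms > SLO_MS_BY_REGION.get(region, DEFAULT_SLO_MS)

span = {"app.product_id": "p-42", "app.region_id": "eu-west"}
print(latency_slo_violated(span, 250.0))  # False: eu-west allows up to 300 ms
print(latency_slo_violated(span, 350.0))  # True: over the eu-west threshold
```

Without the `app.region_id` attribute on the span, the alerting layer would have no dimension to segment on and every request would be held to a single global threshold.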
Think of attributes as the root source of pattern-matching in a distributed system. If you want to investigate relationships across and between categories, attributes are the vehicle for sorting and comparing.
Incrementally experiment with different attributes and see what shakes out. Let’s consider an example.
Are your premium customers contacting support about an invoice error? Didn’t the Order service deploy a new build a few minutes ago? Correlating attributes such as `service.version` and `membership.level` against an error metric for `service.name:order` could help identify whether the elevated error rates for premium members are highly correlated with the new version of the order service.
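That correlation reduces to a simple group-by over error spans. In this Python sketch, `service.name` and `service.version` are real OTel resource attributes, while `membership.level` is the custom attribute from the example; the span data itself is fabricated for illustration:

```python
from collections import Counter

# Hypothetical error spans, represented as attribute dictionaries.
error_spans = [
    {"service.name": "order", "service.version": "2.1.0", "membership.level": "premium"},
    {"service.name": "order", "service.version": "2.1.0", "membership.level": "premium"},
    {"service.name": "order", "service.version": "2.0.9", "membership.level": "basic"},
    {"service.name": "cart",  "service.version": "1.4.2", "membership.level": "premium"},
]

# Correlate: count order-service errors by (version, membership level).
errors_by_segment = Counter(
    (span["service.version"], span["membership.level"])
    for span in error_spans
    if span["service.name"] == "order"
)
print(errors_by_segment.most_common(1))  # [(('2.1.0', 'premium'), 2)]
```

The same grouping only works in a real query engine because every service emits the same three attribute names; rename any one of them in one service and the correlation silently breaks.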
A great deal of careful consideration has been put into the development of the standard attributes for OpenTelemetry, and this list is constantly evolving. Although there are more categories than can be mentioned here, it can be useful to explore what exists when building your internal naming standards, and call out what would be useful to teams when investigating regressions. Here are a few examples from the registry:
There is one special kind of span data, the span event log, that often gets overlooked. Span events are very similar to logs, and they are a great place to put contextual information that could be useful when troubleshooting a problem with a transaction.
When thinking about what might go in a span event log, first scrub any payload of private user data. Then add events for anything notable happening within the span: a shorthand summary of what occurred, any exceptions or full error messages, and additional context.
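A minimal sketch of that scrub-then-record flow, using a plain list in place of a real span (OTel SDKs expose a similar `span.add_event(name, attributes)` call); the sensitive-key list, event name and payload are all hypothetical:

```python
import time

# Keys that must never leave the process attached to telemetry.
SENSITIVE_KEYS = {"password", "credit_card", "ssn"}

def scrub(payload: dict) -> dict:
    """Drop private user data before it is attached to a span event."""
    return {k: v for k, v in payload.items() if k not in SENSITIVE_KEYS}

def add_event(events: list, name: str, attributes: dict) -> None:
    """Record a timestamped, scrubbed event, mimicking span.add_event()."""
    events.append({"name": name, "time": time.time(), "attributes": scrub(attributes)})

span_events: list = []
add_event(span_events, "payment.declined", {
    "summary": "card issuer returned decline code 05",
    "error.message": "DO_NOT_HONOR",
    "credit_card": "4111-1111-1111-1111",  # stripped by scrub() before recording
})
print(span_events[0]["attributes"])
```

Doing the scrubbing at the point of recording, rather than downstream in a collector, means sensitive values never enter the telemetry pipeline at all.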
We’ve been focusing on the “do’s” of attributes, but here is a closer look at some attribute pitfalls to avoid:
There are many more useful insights and recommendations in the OpenTelemetry documentation, so it’s a good idea to check the latest spec when working on your attribute standards.
Trace data collection is a necessary part of observability. But it requires processes in place to ensure the data is useful, accessible and insightful. Naming conventions take upfront work, but by embracing these best practices — from ensuring semantic clarity and maintaining a unified library to understanding data, aligning with service levels and anticipating new use cases — your team can elevate the utility of your telemetry.
This approach doesn’t just streamline troubleshooting, it helps you build an effective culture of observability within your organization. The result of this work is a rich OTel data set full of accessible insights, enabling smarter, quicker decision-making.
Interested in how you can make the most of your OTel data? Contact us for a demo.