There are five main best practices that you can use as part of your OTel attribute naming conventions to get the most out of your observability data:
1. Use semantic and descriptive attributes
Semantic names help ensure efficient root-cause analysis.
- Make sure your attributes are clear, descriptive and apply to the entirety of the resource they describe. Names like `http.status_code` and `db.system` are easy to identify and provide immediate insights into the nature of a problem, whether it’s in the database or a web service.
- Non-semantic names like `attribute`, `info`, or `session_data` are too generic and lead to confusion when analyzing telemetry data later on.
- Example: `app.service.version`
- Define namespaces for your attributes. This is especially important when multiple service teams have their own standard attributes.
- Example: `app.component.name`
- Keep attribute names short and sweet
- Set error attributes on error spans
With descriptive attribute names, you can look at a resource and have all the context you need to know what it is, what it includes, and what it relates to. For an excellent explanation of the existing semantic conventions, visit the official spec, where you can find the General and System attributes as well as conventions organized by signal or operation type (like HTTP or Database), including technology-specific conventions.
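To make the contrast concrete, here is a minimal sketch in plain Python dictionaries. The `http.status_code` and `db.system` keys follow the OpenTelemetry semantic conventions mentioned above; `app.service.version` is the namespaced custom attribute from the example, and the values are made up for illustration.

```python
# Too generic: tells you nothing when it shows up on a dashboard.
vague_attributes = {
    "info": "500",
    "session_data": "postgresql",
}

# Semantic: each key identifies the system and the measurement.
semantic_attributes = {
    "http.status_code": 500,        # standard HTTP convention
    "db.system": "postgresql",      # standard database convention
    "app.service.version": "2.4.1", # namespaced custom attribute
}

def describe(attrs):
    """Render attributes as key=value pairs for a log line or span."""
    return " ".join(f"{k}={v}" for k, v in sorted(attrs.items()))

print(describe(semantic_attributes))
```

Seeing `db.system=postgresql` in an error trace immediately points you at the database layer, whereas `session_data=postgresql` forces you to go ask whoever named it.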
2. Use a shared library
The practice of creating a library of known attributes helps you catalog the data you care about, and documenting that library creates a record of the data that matters to your customers.
When multiple teams will be sharing attributes, it is important to standardize them to avoid discrepancies. Discrepancies in attribute naming conventions across teams can make correlating data difficult or outright impossible. For example, if the backend team names latency as `latency`, but the frontend team names it `duration`, queries to compare or aggregate latency across services won’t work properly. Standardized attributes enable teams to leverage shared resources (think dashboards or alerts), and allow you to draw insights across multiple systems and services.
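One lightweight way to standardize is a shared module of attribute-key constants that every team imports instead of typing raw strings. This is a sketch under that assumption; the module and key names here are illustrative, not part of any OTel standard.

```python
# shared_attributes.py -- single source of truth for attribute keys,
# so the backend's "latency" and the frontend's "duration" can never diverge.
ATTR_REQUEST_DURATION = "app.request.duration_ms"
ATTR_SERVICE_VERSION = "app.service.version"

def build_attributes(duration_ms, version):
    """Both frontend and backend build attributes through this helper."""
    return {
        ATTR_REQUEST_DURATION: duration_ms,
        ATTR_SERVICE_VERSION: version,
    }

# Two services now emit the same keys, so a cross-service query
# on app.request.duration_ms aggregates both.
backend = build_attributes(12.5, "2.4.1")
frontend = build_attributes(48.0, "1.9.0")
assert set(backend) == set(frontend)
```

Because the key lives in one place, renaming it is a one-line change plus a redeploy, rather than an archaeology project across every team's instrumentation.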
3. Create custom attributes
Occasionally you might need to create a new attribute for a specific aspect of your company or application. Before you do, though, it’s a good idea to consult the OpenTelemetry Attributes Registry to be sure one doesn’t already exist for what you need. Once you confirm there isn’t a match, you can create a new one. It’s important to follow the tips in the OTel Attribute Naming guide, especially regarding the use of prefixes.
Prefixes in attribute names help distinguish your custom attributes from the standard names and from names chosen by other projects, vendors, or companies you work with. If a custom attribute accidentally shares a name with another attribute, it can lead to incorrect conclusions and decisions, faulty dashboards and alerts, and make it difficult to track the flow or state of transactions accurately.
To avoid conflicts with other projects, vendors or companies, it is wise to consider using a prefix based on your company’s domain name, in reverse, like `io.chronosphere.myapp`.
If you are absolutely sure the name will never be used outside the confines of your application and only inside your company, prefixes are still essential for preventing collisions. Consider using a prefix name associated with your app or project, like `bluebook.widget_count`.
You might be tempted to piggyback on an existing prefix that belongs to OpenTelemetry or another project or vendor. Sharing prefixes can result in a name clash down the line, leaving you and your peers struggling to find ways to separate someone else’s data from your own during an incident.
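A simple guard against both problems is a naming check in your instrumentation library or CI. This is a hedged sketch that reuses the `io.chronosphere` reverse-domain prefix from the text; the list of reserved OTel namespaces is a small illustrative subset, not exhaustive.

```python
# Reverse-domain prefix for this company's custom attributes (from the text).
CUSTOM_PREFIX = "io.chronosphere."

# Illustrative subset of namespaces owned by OpenTelemetry conventions.
RESERVED_NAMESPACES = ("http.", "db.", "net.", "messaging.")

def is_valid_custom_name(name: str) -> bool:
    """A custom attribute must carry our prefix and must never
    squat on a namespace owned by OpenTelemetry or another project."""
    if any(name.startswith(ns) for ns in RESERVED_NAMESPACES):
        return False
    return name.startswith(CUSTOM_PREFIX)

assert is_valid_custom_name("io.chronosphere.myapp.widget_count")
assert not is_valid_custom_name("http.widget_count")  # squats on OTel's prefix
assert not is_valid_custom_name("widget_count")       # no prefix at all
```

Running a check like this in code review keeps name clashes from ever reaching production telemetry.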
4. Focus on service levels
When deciding what attributes to apply to your traces, remember that your application’s purpose is to provide a high-quality software experience to customers. This mission is encoded in your service’s service level objectives (SLOs), perhaps in the form of a 99.999% uptime expectation. From the SLO, you can narrow down which service level indicators (SLIs) best support, or are most likely to threaten, those SLOs. Your attributes should support your service levels.
For example, if you have latency SLOs that differ between segments of traffic, using attributes that provide segment dimensionality like `ProductID`, `FeatureID` or `RegionID` can help you organize alerts accordingly.
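The slicing above can be sketched in a few lines. This is an illustrative example, not a real SLI pipeline: spans are plain dicts, the `region.id` attribute key and the 200 ms threshold are assumptions for the sake of the demo.

```python
from collections import defaultdict

# Toy span data carrying a segment attribute (values are made up).
spans = [
    {"region.id": "us-east", "duration_ms": 120},
    {"region.id": "us-east", "duration_ms": 340},
    {"region.id": "eu-west", "duration_ms": 90},
]

THRESHOLD_MS = 200  # illustrative per-segment latency target

def sli_by_segment(spans, segment_key):
    """Fraction of requests under the latency threshold, per segment value."""
    good = defaultdict(int)
    total = defaultdict(int)
    for span in spans:
        seg = span[segment_key]
        total[seg] += 1
        if span["duration_ms"] <= THRESHOLD_MS:
            good[seg] += 1
    return {seg: good[seg] / total[seg] for seg in total}

print(sli_by_segment(spans, "region.id"))
```

Because the segment lives in an attribute, the same query can alert on `us-east` missing its latency SLO while `eu-west` stays green.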
5. Think about new use cases
Think of attributes as the root source of pattern-matching in a distributed system. If you want to investigate relationships across and between categories, attributes are the vehicle for sorting and comparing.
Incrementally experiment with different attributes and see what shakes out. Let’s consider an example.
Are your premium customers contacting support about an invoice error? Didn’t the Order service deploy a new build a few minutes ago? Correlating attributes such as `service.version` and `membership.level` against an error metric for `service.name:order` could help determine whether the elevated error rates for premium members are correlated with the new version of the Order service.
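That correlation can be sketched as a simple group-by over error events. Assume error events are plain dicts carrying the attributes named above; the data values are fabricated for illustration.

```python
from collections import Counter

# Fabricated error events from the Order service scenario above.
errors = [
    {"service.name": "order", "service.version": "2.4.1", "membership.level": "premium"},
    {"service.name": "order", "service.version": "2.4.1", "membership.level": "premium"},
    {"service.name": "order", "service.version": "2.4.0", "membership.level": "basic"},
]

def order_errors_by(events, *keys):
    """Count Order-service errors grouped by the given attribute keys."""
    return Counter(
        tuple(e[k] for k in keys)
        for e in events
        if e["service.name"] == "order"
    )

breakdown = order_errors_by(errors, "service.version", "membership.level")
# A dominant ("2.4.1", "premium") bucket points at the new build.
print(breakdown.most_common(1))
```

In a real system the grouping happens in your observability backend’s query language, but it only works if every service emits the same attribute keys, which is exactly why the naming conventions above matter.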