The mission of OpenTelemetry is to “enable effective observability by making high-quality, portable telemetry ubiquitous” and covers data creation, collection, management and export.
Metadata plays a key role in making observability data meaningful by offering developers more dimensions to explore. The more dimensions that describe your telemetry, the deeper questions you can get answered.
Metadata needs clear rules about its structure and contents, and those rules need to be followed consistently. Otherwise, it will be frustrating and challenging for developers to correlate behavior across applications, cloud providers, and signal types.
Establishing effective adoption for OTel attributes
To implement effective and useful OTel attributes, it’s crucial to involve all affected teams early in the process. Consider conducting workshops to align everyone on the positive outcomes of a clear and consistent naming standard across all layers of the stack. Consistency creates clarity, which is crucial during incident response and debugging. Get buy-in from software and systems architects by illustrating the benefits of a naming standard, and focus on areas that are unique to your company and applications.
Then draft a detailed document that outlines the naming conventions, including syntax, structure, and examples. Devise a process for modifying the standard, improving it through feedback and addressing any gaps that you find after the fact.
In the case of OpenTelemetry, metadata comes in the form of key-value pairs called attributes that describe the entity producing data (Resource) or the data itself as defined by the Semantic Conventions. Think of attributes as the foundational support for telemetry allowing you to organize your data for effective comparison and analysis across components in your system.
These open standards have been adopted across the industry from cloud providers, vendors, and projects alike so the telemetry emitted from existing instrumentation libraries and integrations will already follow these semantic conventions. However, this out of the box instrumentation is intended to cover common use cases across the industry, not context that is specific to your application or company.
Adopting and extending this standardization across your company will result in rich observability data that is connected across telemetry types and system boundaries, enabling more effective and efficient investigations and analysis.
Example
Let’s say a particular blog page got a sudden surge of traffic. We would want to look at quantitative data measuring how many requests came in over time in order to compare with historical data. Is this just a popular post that made it on a forum or is this number anomalously suspicious perhaps indicating a DoS attack? This comparison would help answer that question.
Adding qualitative information to describe these same requests can reveal more information — like adding the source of traffic via referrer from HTTP headers. That extra dimension allows you to get a sense of where these requests are coming from and makes that same data set of web requests more valuable.
This works because the structure and naming of HTTP headers are standardized across the industry: every web request adheres to them.
3 Ways to get more from OTel Attributes
Here’s how to help your organization get the most benefit from observability data by standardizing metadata:
1. Understand Semantic Conventions
Semantic Conventions are the common naming scheme for metadata that is standardized across the OpenTelemetry ecosystem. The specification defines key-value pairs called Attributes which can either describe traits specific to a signal like HTTP attributes for spans or describe the entity emitting the telemetry, also referred to as a Resource.
Think of semantic conventions as a style guide for your telemetry which has a few rules you are required to follow, many rules that are recommended, and some that are more of a suggestion.
Here is an overview of the guidelines for Attribute Naming in the spec:
Must Be
- Valid Unicode sequence, specifically limited to these Unicode code points (basic Latin characters, Numeric, Underscore, Dot)
- Unique – even if a particular attribute name is retired or renamed, the old one can never be used again
Should Be
- All lowercase
- Use namespaces to avoid name clashes
- Delimited by a dot character
- Use snake case for multi word components (example: http.response.status_code)
- Singular if attribute represents a single entity (example: host.name, container.id)
- Plural when attribute represents multiple entities with value type array (example: container.image.tags:[“v1.27.1”, “3.5.7-0”])
- Names should not clash with namespaces (example: cannot add service.instance as attribute if service.instance.id is already an attribute name)
Allowed
- Nested namespaces (example: telemetry.sdk.name, telemetry.sdk.language, telemetry.sdk.version are names nested under the telemetry namespace)
- Abbreviations if widely recognized and common (example: K8s for Kubernetes, URL for uniform resource locator, IP for internet protocol)
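As a rough illustration, the naming guidelines above could be checked automatically before a new attribute is accepted. The sketch below uses only the Python standard library; the function name `check_attribute_name` and the regular expression are assumptions made for this example (an approximation of the guidelines, not the official specification grammar).

```python
import re
from collections.abc import Iterable

# Assumed approximation of the guidelines: each dot-delimited component starts
# with a lowercase letter and continues with lowercase letters, digits, or underscores.
_COMPONENT = r"[a-z][a-z0-9_]*"
# Require at least two components so every name lives inside a namespace.
_NAME_PATTERN = re.compile(rf"{_COMPONENT}(\.{_COMPONENT})+")


def check_attribute_name(name: str, existing_names: Iterable[str] = ()) -> list[str]:
    """Return the guideline violations for a proposed name (empty list if none)."""
    problems = []
    if name != name.lower():
        problems.append("should be all lowercase")
    if not _NAME_PATTERN.fullmatch(name.lower()):
        problems.append("should be namespaced, dot-delimited, snake_case")
    for other in existing_names:
        # A name must not double as a namespace of another name, or vice versa.
        if other.startswith(name + ".") or name.startswith(other + "."):
            problems.append(f"clashes with existing attribute {other!r}")
    return problems


print(check_attribute_name("http.response.status_code"))
print(check_attribute_name("service.instance", ["service.instance.id"]))
```

A check like this only covers the mechanical rules; whether a name is meaningful and unique across your organization still needs human review.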
2. Adding custom attributes
OpenTelemetry offers a thriving ecosystem of instrumentation libraries and integrations that produce telemetry enriched with standardized metadata. However, there will be times it is necessary to introduce new attributes that are specific to your system or organization.
These custom attributes offer additional dimensions to categorize, filter, and monitor your telemetry by, such as product_id, team_owner, or plan_tier. Since these fields are only relevant within the context of your organization they need to be manually instrumented by your developers.
The main requirement to keep in mind when adding custom attributes is that names must be unique. An attribute name clash in your telemetry risks introducing errors in dashboards and alerts, leading to incorrect conclusions during investigations, or causing inaccuracies in tracking the flow or status of transactions.
Here are the do’s and don’ts for adding a custom attribute
Custom Attributes Do’s
- Do decide on an appropriate prefix. It is recommended to prefix with your company’s reverse domain name (io.chronosphere.attribute_key) or with the specific application name (bluebook.widget_count)
- Do check the Attribute Registry to ensure your proposed name or namespace isn’t already in use
- Do ensure the name follows Attribute Naming guidelines
- Do consider submitting a proposal to add a new attribute name or namespace if it is broadly applicable to the industry (ex: GenAI and LLMs and CI/CD latest additions)
Custom Attributes Don’ts
- Don’t use the otel.* namespace – this is reserved for OTel specific concepts, like otel.scope.name
- Don’t use existing semantic convention namespaces for custom ones
- Don’t reuse attribute keys – duplicate key names can cause collisions and result in overwriting data
- Don’t add an attribute if its value would be unset or empty – these are unhelpful, consume storage without providing value, and can skew totals when analyzing
- Don’t create an attribute unless you know how you’re going to use it
- Don’t put sensitive information in attributes!
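Several of these don’ts can be enforced in code before attributes are attached to telemetry. The following is a minimal sketch, not a complete safeguard: the function `clean_custom_attributes`, the short list of semantic convention prefixes, and the deny-list of sensitive key fragments are all hypothetical and would need to be adapted to your own registry and security policy.

```python
# Reserved for OTel-specific concepts such as otel.scope.name.
RESERVED_PREFIXES = ("otel.",)
# Illustrative subset of existing semantic convention namespaces; not exhaustive.
SEMCONV_PREFIXES = ("service.", "http.", "host.", "container.", "telemetry.")
# Hypothetical deny-list of key fragments that hint at sensitive data.
SENSITIVE_FRAGMENTS = ("password", "token", "secret")


def clean_custom_attributes(attributes: dict) -> dict:
    """Reject reserved namespaces, then drop sensitive-looking and empty attributes."""
    cleaned = {}
    for key, value in attributes.items():
        if key.startswith(RESERVED_PREFIXES + SEMCONV_PREFIXES):
            raise ValueError(f"{key!r} uses a reserved or existing namespace")
        if any(fragment in key for fragment in SENSITIVE_FRAGMENTS):
            continue  # never ship values that may be sensitive
        if value is None or value == "" or value == []:
            continue  # unset or empty values add storage cost without insight
        cleaned[key] = value
    return cleaned


print(clean_custom_attributes({
    "io.chronosphere.team": "billing",
    "io.chronosphere.dept": "",
    "bluebook.widget_count": 0,
}))
```

Note that a legitimate value of 0 is kept: only missing or empty values are dropped.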
As an example, let’s say your organization is adopting FinOps, and, in order to properly allocate costs, there needs to be a way to associate a given application and its resource usage with the owning team and department.
It would be tempting to add service.owner.team and service.owner.dept as attributes on container metrics for CPU, memory and disk utilization. The problem is there is already an existing service namespace for Resources so you can either think of a new namespace that is distinct or submit these attributes for consideration to the official Resource Semantic Conventions.
Another option is creating namespaces that are unique to a particular application, like my_application_name.owner.team and my_application_name.owner.dept. This would eliminate name collisions with OTel attribute namespaces, but is too narrowly specific. It would require the FinOps team to somehow keep an up-to-date source of application names in order to be able to query and report on this.
Taking a step back, teams and departments are constructs of your company and the most suitable option is to create a corresponding company namespace. OTel suggests reversing the domain, so for chronosphere.io -> io.chronosphere. While logically teams and departments represent different levels of ownership, the purpose of namespaces is to avoid name collisions, not to reflect entity hierarchies. This means adding a nested namespace like io.chronosphere.owner is not necessary.
This leaves us with adding io.chronosphere.team and io.chronosphere.dept, which will let the FinOps team quickly and easily query for application resource usage correlated with owning team and department!
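The reverse-domain approach from this example can be sketched in a few lines of Python. The helper names `company_namespace` and `ownership_attributes`, and the team and department values, are hypothetical and chosen only for illustration.

```python
def company_namespace(domain: str) -> str:
    """Reverse a domain name to get a namespace, e.g. chronosphere.io -> io.chronosphere."""
    return ".".join(reversed(domain.lower().split(".")))


def ownership_attributes(domain: str, team: str, dept: str) -> dict:
    """Build flat ownership attributes under the company namespace."""
    namespace = company_namespace(domain)
    # No nested "owner" namespace: namespaces exist to avoid collisions,
    # not to mirror the organizational hierarchy.
    return {f"{namespace}.team": team, f"{namespace}.dept": dept}


print(ownership_attributes("chronosphere.io", team="billing", dept="finance"))
```

A dictionary like this could then be merged into the resource attributes of each application, for example by passing it to `Resource.create(...)` in the OpenTelemetry Python SDK.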
The big takeaway is that attribute names must be unique across the OpenTelemetry universe.
3. Develop A Data Dictionary
OpenTelemetry takes care of namespacing to avoid name collisions for out of the box instrumentation from SDKs and integrations, but implementing and enforcing standardization across custom attributes is the responsibility of your company. Without alignment on custom attributes and an enforcement mechanism, there could be attribute name sprawl in addition to name collisions.
The first step to take is developing a data dictionary to standardize custom attributes across the company, and making sure to collaborate with everyone who relies on this telemetry from technical support, security, operations, product, and engineering. This will help identify a comprehensive set of attributes everyone can benefit from.
Afterward these attributes can be codified in a shared library, making it easy for teams to augment their existing instrumentation and stay in compliance — even as fields get renamed or added.
See this example in which a company offered 3 different tiers of plans with varying levels of performance assurances:
- Gold Tier – queries return in 4 seconds or less
- Silver Tier – queries return in 10 seconds or less
- Bronze Tier – queries return in 1 minute or less
In order to monitor and ensure your system is delivering on these agreements there needs to be a way to facet request latency by plan tier. Telling teams to add plan tier as a tag without providing guidelines or conventions will result in name sprawl – where the mobile team uses com.my_company.plan while the desktop team uses com.my_company.plan.tier and the authentication team uses com.my_company.plan_tier.
All teams technically met the requirement to add plan tier as an attribute, but without standardization, there is a high burden on anyone investigating high latency or setting up monitors to find and know about all the permutations of the plan tier attribute.
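One way to avoid this kind of sprawl is to publish the agreed key in a small shared module that every team imports, rather than having each team type the name by hand. The sketch below assumes, purely for illustration, that com.my_company.plan_tier was chosen as the canonical key; the constant and function names are hypothetical.

```python
# Single source of truth for the attribute key (assumed canonical choice).
PLAN_TIER_KEY = "com.my_company.plan_tier"

# Latency targets per tier in seconds, taken from the tiers listed above.
PLAN_TIER_LATENCY_TARGET_SECONDS = {"gold": 4, "silver": 10, "bronze": 60}


def plan_tier_attribute(tier: str) -> dict:
    """Return the standardized plan tier attribute, rejecting unknown tiers."""
    normalized = tier.strip().lower()
    if normalized not in PLAN_TIER_LATENCY_TARGET_SECONDS:
        raise ValueError(f"unknown plan tier: {tier!r}")
    return {PLAN_TIER_KEY: normalized}


def meets_target(tier: str, latency_seconds: float) -> bool:
    """Check a query latency against the target for the given tier."""
    return latency_seconds <= PLAN_TIER_LATENCY_TARGET_SECONDS[tier]


print(plan_tier_attribute(" Gold "))
print(meets_target("silver", 10))
```

With the key and allowed values defined once, anyone faceting latency by plan tier only needs to know a single attribute name.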
Adopting Attribute Standards
It’s a given that the out-of-the-box attributes produced by OpenTelemetry libraries and integrations will follow the Semantic Conventions. However, in order to fully realize the benefits of standardized metadata, your organization will need to define and manage semantic conventions internally. Here are a few phases to introducing standards in your organization:
Speak to the Pain
This initiative may not seem exciting on the surface so communicating how this work will solve pain points and result in more effective and efficient investigations for everyone is important to secure buy-in.
In cloud native systems, it isn’t enough to know your way around the data coming from services your team owns. With interdependent and distributed microservices you need to be comfortable investigating areas you don’t own whether they are upstream, downstream, underlying infrastructure, or even third party integrations.
It can be frustrating and difficult to get answers to your questions when looking at other teams’ data if they use synonymous attributes or do not have fields that you expect to be in place. Great metadata can be the difference between an aha moment or hitting a dead end in an investigation.
The role of internal standards is to ensure that no matter what part of the system you’re looking at there will be consistently named attributes that are relevant and meaningful to your work.
Inventory Existing Attributes
Take a look across existing telemetry and identify all of the custom attributes in use today — find out where they’re coming from and highlight any redundancies or attributes that can be deprecated.
Share this document out widely and openly before calling any meetings to sync or align. This will give folks time to read through on their own and develop proposals for changes.
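A first draft of such an inventory could be generated from a sample of exported telemetry. The sketch below assumes the sample has already been reduced to (service name, attributes) pairs; that input shape, the function name `inventory_custom_attributes`, and the short list of standard prefixes are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Illustrative subset of standard namespaces to exclude; not exhaustive.
STANDARD_PREFIXES = ("http.", "service.", "host.", "container.", "telemetry.")


def inventory_custom_attributes(records):
    """Count custom attribute keys and record which services emit them.

    records: iterable of (service_name, attributes_dict) pairs.
    Returns (key, count, services) tuples, most frequently used first.
    """
    usage = Counter()
    sources = defaultdict(set)
    for service, attributes in records:
        for key in attributes:
            if key.startswith(STANDARD_PREFIXES):
                continue  # already covered by the semantic conventions
            usage[key] += 1
            sources[key].add(service)
    # Sort by descending count, then by name, so the output is deterministic.
    ordered = sorted(usage, key=lambda key: (-usage[key], key))
    return [(key, usage[key], sorted(sources[key])) for key in ordered]


sample = [
    ("mobile", {"http.request.method": "GET", "com.my_company.plan": "gold"}),
    ("desktop", {"com.my_company.plan.tier": "gold"}),
    ("mobile", {"com.my_company.plan": "silver"}),
]
for row in inventory_custom_attributes(sample):
    print(row)
```

Near-duplicate keys that appear side by side in the output are good candidates for consolidation or deprecation.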
Collaboration & Feedback
After sharing out the attribute inventory, work in collaboration across the company from engineering to support to security and operations to get feedback about which attributes to keep as-is, attributes to rename, attributes to delete, and any new ones worth adding.
Align on Attributes
At this point everyone is aware of how they will benefit from standardizing metadata, they have reviewed the inventory of existing attributes and offered suggestions and feedback.
It is time to assemble a proposal for the first version of internal attribute standards! Draft a design document outlining the attributes with the name, description, expected data type, and example values along with the work needed to implement.
The approved version of that proposal and any amendments will serve as your data dictionary, as mentioned in the above section.
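To make the data dictionary enforceable, its entries can also be kept in a machine-readable form. The following sketch shows one possible shape using a Python dataclass with the fields named above (name, description, expected data type, example values); the class name, the two entries, and the `validate` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDefinition:
    name: str
    description: str
    value_type: type
    examples: tuple


# Hypothetical entries; a real dictionary would come from the approved proposal.
DATA_DICTIONARY = {
    definition.name: definition
    for definition in (
        AttributeDefinition("io.chronosphere.team", "Owning team", str, ("billing",)),
        AttributeDefinition("io.chronosphere.dept", "Owning department", str, ("finance",)),
    )
}


def validate(attributes: dict) -> list[str]:
    """Return one message per attribute that is unknown or has the wrong type."""
    errors = []
    for key, value in attributes.items():
        definition = DATA_DICTIONARY.get(key)
        if definition is None:
            errors.append(f"{key}: not in the data dictionary")
        elif not isinstance(value, definition.value_type):
            expected = definition.value_type.__name__
            errors.append(f"{key}: expected {expected}, got {type(value).__name__}")
    return errors


print(validate({"io.chronosphere.team": 7, "io.chronosphere.squad": "payments"}))
```

A check like this could run in code review or CI so that new instrumentation stays aligned with the approved dictionary.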
Establish Governance
These attributes are not etched in stone. As your company and system change, the telemetry and metadata will need to change too. It is therefore important to establish governance procedures upfront. This means defining processes to add, amend, or remove attributes as well as regularly reviewing the standards to ensure metadata consistency and relevancy.
Conclusion
Observability is all about being able to ask and get answers about your system by leveraging the strengths of each type of telemetry and the power of descriptive metadata.
Thanks to open standards that are proposed by OpenTelemetry and adopted across the industry, conventions for attributes describing common components and use cases are covered. Extending these standards starts with familiarizing yourself with the role of semantic conventions and with the guidelines for naming custom attributes. From there, share the benefits of standardization with your team, department, and beyond.
After adopting and implementing these standards, everyone will benefit from rich observability data that is connected across telemetry types and consistent across team and system boundaries. This makes it easier to effectively and efficiently correlate and analyze what exactly is going on in your complex system.