Distributed tracing and observability
In software development, observability allows us to understand a system from the outside by asking questions about the system without knowing its inner workings. Furthermore, it allows us to troubleshoot quickly and helps answer the question, “Why is this happening?”
For us to ask (and answer) those questions, the application must be instrumented. That is, the application code must emit signals such as traces, metrics, and logs, which will contain the answers we seek. In this post, we will focus specifically on traces.
Distributed tracing involves tracking the flow of requests through a distributed system and collecting telemetry data such as traces and spans to monitor the system’s performance and behavior. Distributed tracing helps identify performance bottlenecks, optimize resource utilization, and troubleshoot issues in distributed systems.
Many platforms are used to monitor and analyze trace data and help engineers spot problems, including Chronosphere, Datadog, and the open-source Jaeger. Today, though, we will be using AWS X-Ray, a less commonly used platform but a convenient one for demonstration purposes since so many developers have AWS accounts.
To collect and route the traces to X-Ray we’ll be using Fluent Bit, a widely used open-source data collection agent, processor, and forwarder. Fluent Bit is most commonly used for logging, but it is also capable of handling traces and metrics, making it an ideal single-agent choice for any type of telemetry data.
In this post, we’ll guide you through the process of sending distributed traces to AWS X-Ray using Fluent Bit.
Prerequisites
- Docker and Docker Compose: Installed on your local machine.
- An AWS account
- AWS CLI: A tool for managing AWS services. Install it by following the official AWS CLI installation guide. After installation, configure it with your credentials and default region by running aws configure. For detailed instructions, refer to the AWS CLI configuration guide.
- Familiarity with Fluent Bit concepts such as inputs, outputs, parsers, and filters. If you’re not familiar with these concepts, please refer to the official documentation.
Distributed tracing workflow
Instrumented applications emit trace data that is collected and processed by a centralized agent, which then sends the data to a backend for storage and analysis
Generating trace data
In a microservices architecture, applications are instrumented using specific libraries to send trace data in a particular format supported by the storage engine.
OpenTelemetry (OTel) has become the standard for working with telemetry data. It is an open-source observability framework that provides a common set of APIs, libraries, and tools for collecting and transmitting telemetry data such as traces, logs, and metrics from applications in distributed systems.
We will be using a Python application (built with the Flask framework) that we've instrumented with the OpenTelemetry SDK to generate trace data in the OpenTelemetry Protocol (OTLP) format.
We will configure Fluent Bit to receive the emitted trace data using the OpenTelemetry input plugin.
Note: For simplicity and demonstration purposes, we will be using a single service capable of generating a hierarchical distributed trace. But in a practical scenario, there would be multiple services instrumented to generate trace data.
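To make the idea of a hierarchical trace concrete, here is a minimal sketch of the data involved: three spans that share one trace ID, each child pointing at its parent through parentSpanId. The dictionaries follow the shape of the OTLP JSON encoding, but this is an illustration only. The sample application uses the OpenTelemetry SDK rather than hand-built payloads, and the names `make_span`, `build_trace`, `demo-service`, and the span names are hypothetical.

```python
import json
import secrets
import time


def make_span(trace_id, name, parent_span_id=None):
    """Return one span as a dict shaped like the OTLP JSON encoding."""
    now_ns = time.time_ns()
    span = {
        "traceId": trace_id,             # 16 bytes, hex-encoded (32 characters)
        "spanId": secrets.token_hex(8),  # 8 bytes, hex-encoded (16 characters)
        "name": name,
        "kind": 1,                       # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(now_ns),
        "endTimeUnixNano": str(now_ns + 1_000_000),  # pretend the work took 1 ms
    }
    if parent_span_id is not None:
        span["parentSpanId"] = parent_span_id
    return span


def build_trace(service_name="demo-service"):
    """Build a three-level trace: root -> child -> grandchild."""
    trace_id = secrets.token_hex(16)  # every span in a trace shares this ID
    root = make_span(trace_id, "handle-request")
    child = make_span(trace_id, "query-database", root["spanId"])
    grandchild = make_span(trace_id, "serialize-rows", child["spanId"])
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "illustration"},
                "spans": [root, child, grandchild],
            }],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_trace(), indent=2))
```

A tracing backend rebuilds the call tree from exactly these two links: the shared trace ID groups the spans, and the parent span IDs order them.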
Storing trace data in AWS X-Ray
AWS X-Ray accepts trace requests in the form of segment documents, which can be sent using two primary protocols:
- AWS X-Ray API (HTTP): You can send segment documents directly to the AWS X-Ray API using the PutTraceSegments API. This is done using HTTP/1.1.
- Direct UDP: You can send segment documents to the AWS X-Ray daemon, which runs alongside the application, over UDP. The X-Ray daemon buffers segments in a queue and uploads them to X-Ray in batches.
Unfortunately, AWS X-Ray uses a trace ID format that is not standards-compliant. Since Fluent Bit does not support the custom X-Ray API format, it cannot send trace data directly to AWS X-Ray. To overcome this, we will use the AWS Distro for OpenTelemetry (ADOT), which supports OTLP input and can be used with the Fluent Bit OpenTelemetry output plugin. ADOT automatically converts the compliant trace ID to the format required by AWS X-Ray.
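The ID conversion can be sketched in a few lines. An OpenTelemetry trace ID is 32 hexadecimal characters, while an X-Ray trace ID has the form 1-{8 hex digits}-{24 hex digits}, where the 8-digit part is read as the request time in epoch seconds. The function below illustrates that mapping only; it is not ADOT's actual code, and the name `otel_to_xray_trace_id` is hypothetical.

```python
import re

# An OpenTelemetry (W3C) trace ID: exactly 32 lowercase hex characters.
_OTEL_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def otel_to_xray_trace_id(otel_trace_id: str) -> str:
    """Reshape a 32-character hex trace ID into the X-Ray layout.

    X-Ray layout: version "1", then the first 8 hex digits (interpreted by
    X-Ray as epoch seconds), then the remaining 24 hex digits.
    """
    trace_id = otel_trace_id.lower()
    if not _OTEL_TRACE_ID.fullmatch(trace_id):
        raise ValueError(f"not a 32-character hex trace ID: {otel_trace_id!r}")
    return f"1-{trace_id[:8]}-{trace_id[8:]}"


if __name__ == "__main__":
    print(otel_to_xray_trace_id("4efaaf4d1e8720b39541901950019ee5"))
```

Because X-Ray reads the first segment as a timestamp, a purely random OpenTelemetry trace ID may not map to a meaningful time, which is one reason the conversion is best left to a component built for it.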
Our architecture looks like this:
Fluent Bit both receives and submits OTLP but the data must be converted to the bespoke format required by AWS X-Ray
Configuring Fluent Bit
Here’s the Fluent Bit configuration that enables the architecture depicted above:
[SERVICE]
    flush 1
    log_level info

[INPUT]
    name opentelemetry
    host 0.0.0.0
    port 3000
    successful_response_code 200

[OUTPUT]
    Name opentelemetry
    Match *
    Host aws-adot
    Port 4318
    traces_uri /v1/traces
    tls off
    tls.verify off
    add_label app fluent-bit
Breaking down the configuration above, we define one input section and one output section:
INPUT section
- name opentelemetry: Specifies the input plugin to use, which in this case is opentelemetry. This plugin is designed to receive telemetry data (metrics, logs, and traces) following the OpenTelemetry format.
- host 0.0.0.0: This binds the input listener to all available IP addresses on the machine, making it accessible from other machines.
- port 3000: Defines the port on which Fluent Bit will listen for incoming data.
- successful_response_code 200: This is the HTTP response code that Fluent Bit will send back to the sender to indicate that the data was received successfully. A value of 200 corresponds to HTTP OK, meaning the request has succeeded.
OUTPUT section
- Name opentelemetry: Specifies the output plugin to use. This indicates Fluent Bit will forward the processed data to another service or tool supporting OpenTelemetry data.
- Match *: This pattern matches all incoming data. In Fluent Bit, the Match directive is used to filter which data is sent to a particular output based on the tag associated with the data. The asterisk * is a wildcard that matches all tags.
- Host aws-adot: Specifies the destination host to which the data will be forwarded.
- Port 4318: Defines the port on the destination host where the OpenTelemetry collector or service is listening.
- traces_uri /v1/traces: Sets the specific URI endpoint where trace data should be sent. This is part of the OpenTelemetry specification for sending trace data.
- tls off: Indicates that TLS (Transport Layer Security) will not be used for this connection, meaning data will be sent in plaintext.
- tls.verify off: Disables certificate validation. It has no effect here because TLS is off.
- add_label app fluent-bit: This adds a label to the data being sent out. Labels are key-value pairs. Here, app is the key, and fluent-bit is the value.
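The routing rule behind Match can be sketched as a small function: a record is delivered to an output when its tag fits the output's pattern, with * standing for any run of characters. This is an approximation written for illustration, not Fluent Bit's own matcher (which is implemented in C), and the function name `tag_matches` and the example tags are hypothetical.

```python
import re


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if `tag` fits `pattern`, where '*' matches any characters."""
    # Escape everything literally, then let each '*' stand for "anything".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, tag) is not None


if __name__ == "__main__":
    for pattern, tag in [("*", "traces"), ("app.*", "app.web"), ("app.*", "db.main")]:
        print(f"Match {pattern!r} vs tag {tag!r}: {tag_matches(pattern, tag)}")
```

With Match * as in the configuration above, every record reaches the OpenTelemetry output regardless of its tag. A narrower pattern would let you send different data to different outputs.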
With our INPUT and OUTPUT configuration explained, let’s implement it in practice.
Create Fluent Bit configuration file
Create a file called fluent-bit.conf with the following contents:
[SERVICE]
    flush 1
    log_level info

[INPUT]
    name opentelemetry
    host 0.0.0.0
    port 3000
    successful_response_code 200

[OUTPUT]
    Name opentelemetry
    Match *
    Host aws-adot
    Port 4318
    traces_uri /v1/traces
    tls off
    tls.verify off
    add_label app fluent-bit
Create OTel configuration
Create a file called otel.yaml with the following contents. Be sure to replace the placeholder <put-your-aws-region> with your AWS region.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  awsxray:
    region: <put-your-aws-region>
service:
  pipelines:
    traces:
      receivers:
        - otlp
      exporters:
        - awsxray
This configuration defines how the AWS Distro for OpenTelemetry (ADOT) Collector operates. It specifies the collection (receivers) of telemetry data via OpenTelemetry Protocol (OTLP) over gRPC and HTTP, and the export (exporters) of trace data to AWS X-Ray:
- Receivers: Configures the ADOT Collector to receive telemetry data.
  - OTLP Receiver: Accepts data over two protocols:
    - gRPC: Listens on 0.0.0.0:4317 for incoming gRPC connections.
    - HTTP: Listens on 0.0.0.0:4318 for incoming HTTP connections.
- Exporters: Defines how and where processed data is sent.
  - AWS X-Ray Exporter: Configured to send trace data to the AWS X-Ray service in the specified AWS region.
- Service:
  - Pipelines: Organizes the flow of data from receivers to exporters.
    - Traces Pipeline: Specific to trace data, it uses the otlp receiver to collect data and the awsxray exporter to send the data to AWS X-Ray.
This configuration sets up the ADOT Collector to collect telemetry data using OTLP over both gRPC and HTTP and to export trace data to AWS X-Ray for analysis and visualization.
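A pipeline can only reference receivers and exporters that are defined in the top-level sections, so a typo in a component name is a common source of startup errors. The sketch below checks that relationship on a plain Python dictionary mirroring otel.yaml. It is a simplified illustration, not the collector's own validation; the function name `undefined_components` is hypothetical, and the region value is only a stand-in for your own.

```python
import copy

# Mirrors the structure of otel.yaml; the region is an illustrative stand-in.
CONFIG = {
    "receivers": {
        "otlp": {"protocols": {
            "grpc": {"endpoint": "0.0.0.0:4317"},
            "http": {"endpoint": "0.0.0.0:4318"},
        }},
    },
    "exporters": {"awsxray": {"region": "us-east-1"}},
    "service": {"pipelines": {
        "traces": {"receivers": ["otlp"], "exporters": ["awsxray"]},
    }},
}


def undefined_components(config):
    """List pipeline references that have no matching top-level definition."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "exporters"):
            defined = config.get(kind, {})
            for name in pipeline.get(kind, []):
                if name not in defined:
                    problems.append(
                        f"pipeline '{pipeline_name}' references "
                        f"undefined {kind[:-1]} '{name}'"
                    )
    return problems


if __name__ == "__main__":
    print(undefined_components(CONFIG))  # the tutorial's config is consistent

    broken = copy.deepcopy(CONFIG)
    broken["service"]["pipelines"]["traces"]["exporters"].append("jaeger")
    print(undefined_components(broken))
```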
Create Docker Compose configuration
Create a file called docker-compose.yml with the following contents and replace these two values, <put-your-aws-access-keys-id> and <put-your-aws-secret-access-key>, with your AWS credentials. Also change AWS_REGION (ap-south-1 in this example) if it differs from the region you set in otel.yaml.
version: '3.8'
services:
  aws-adot:
    image: public.ecr.aws/aws-observability/aws-otel-collector:latest
    container_name: aws-adot
    ports:
      - "4317:4317" # gRPC port
      - "4318:4318" # HTTP port
      - "55679:55679"
    volumes:
      - "./otel.yaml:/otel.yaml"
    environment:
      - AWS_REGION=ap-south-1
      - AWS_ACCESS_KEY_ID=<put-your-aws-access-keys-id>
      - AWS_SECRET_ACCESS_KEY=<put-your-aws-secret-access-key>
    command: ["--config", "/otel.yaml"]
    restart: "no"
  fluent-bit:
    image: cr.fluentbit.io/fluent/fluent-bit:2.2
    container_name: fluent-bit
    ports:
      - "3000:3000"
    volumes:
      - "./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf"
    restart: "no"
  trace-generator:
    image: sharadregoti/trace-generator:v0.1.0
    container_name: trace-generator
    ports:
      - "5000:5000"
    environment:
      - OTEL_HOST_ADDR=fluent-bit:3000
    restart: "no"
This docker-compose.yml file defines a multi-container setup with three services: aws-adot, fluent-bit, and trace-generator.
- aws-adot:
  - Uses the public.ecr.aws/aws-observability/aws-otel-collector:latest image.
  - Exposes ports 4317 (gRPC) and 4318 (HTTP).
  - Mounts a local otel.yaml configuration file into the container.
  - Configures AWS credentials and region through environment variables.
  - Specifies a command to use the mounted config file.
- fluent-bit:
  - Based on the cr.fluentbit.io/fluent/fluent-bit:2.2 image.
  - Exposes port 3000, where the OpenTelemetry input plugin receives trace data.
  - Mounts a local fluent-bit.conf configuration file into the container.
- trace-generator:
  - Uses the sharadregoti/trace-generator:v0.1.0 image.
  - Exposes port 5000.
  - Uses an environment variable to specify the fluent-bit service as the destination for trace data.
Start Docker containers
docker-compose up
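Before sending any requests, it can help to confirm that the published ports are reachable from your machine. The following is a minimal sketch using only the Python standard library. It checks that something accepts TCP connections on each port, not that the service behind it is healthy. The helper name `port_is_open` and the labels in `PORTS` are hypothetical; the port numbers come from docker-compose.yml.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Ports published to the host by docker-compose.yml.
PORTS = {
    "fluent-bit (OTLP input)": 3000,
    "aws-adot (OTLP over HTTP)": 4318,
    "trace-generator": 5000,
}

if __name__ == "__main__":
    for name, port in PORTS.items():
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{name:28} localhost:{port} {state}")
```

If a port shows as closed, check the container logs in the terminal where docker-compose up is running.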
Generate traces by hitting the sample app
Open a new terminal and run either of the following curl commands to generate a trace:
curl -X GET http://localhost:5000/generate-hierarchical
or
curl -X GET http://localhost:5000/generate
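A single request produces a single trace, which can be easy to miss in the console. If you would like a handful of traces to look at, the sketch below calls both endpoints several times using only the Python standard library. The function names `hit` and `generate_traces` and the default of five rounds are hypothetical choices; the URLs are the ones used in the curl commands above.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:5000"
ENDPOINTS = ("/generate", "/generate-hierarchical")


def hit(url, timeout=5.0):
    """Send a GET request; return the HTTP status, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None      # connection refused, timeout, DNS failure, ...


def generate_traces(rounds=5, base_url=BASE_URL):
    """Call every endpoint `rounds` times and collect (path, status) pairs."""
    results = []
    for _ in range(rounds):
        for path in ENDPOINTS:
            results.append((path, hit(base_url + path)))
    return results


if __name__ == "__main__":
    for path, status in generate_traces():
        print(f"GET {path} -> {status}")
```

A status of None means the sample app could not be reached, which usually indicates the containers are not running.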
Go to the AWS console and open AWS X-Ray
Clean up
Execute the following to shut everything down:
# Press ctrl + c in the terminal instance where containers are running in foreground
docker-compose down
Conclusion
In this post, we’ve walked through the essentials of setting up distributed tracing with AWS X-Ray and Fluent Bit, demonstrating how to seamlessly integrate trace data collection and forwarding in a microservices environment. By leveraging Docker, AWS X-Ray, and Fluent Bit, developers can achieve a robust observability framework that is both scalable and easy to implement.
Learn more
To learn more about Fluent Bit, visit the project website or visit Fluent Bit Academy where you will find hours of on-demand training videos covering best practices and how-to’s on advanced processing, routing, and all things Fluent Bit. Here’s a sample of what you can find there:
- Getting Started with Fluent Bit and OpenSearch
- Getting Started with Fluent Bit and OpenTelemetry
- Parsing 101 with Fluent Bit
- Advanced Routing with Fluent Bit v3
We also invite you to join the vibrant Fluent community. Visit the project’s GitHub repository to learn how to become a contributor. Or join the Fluent Slack where you will find thousands of fellow Fluent Bit and Fluentd users helping one another with issues and discussing the projects’ roadmaps.
About Fluent Bit and Chronosphere
With Chronosphere’s acquisition of Calyptia in 2024, Chronosphere became the primary corporate sponsor of Fluent Bit. Eduardo Silva — the original creator of Fluent Bit and co-founder of Calyptia — leads a team of Chronosphere engineers dedicated full-time to the project, ensuring its continuous development and improvement.
Fluent Bit is a graduated project of the Cloud Native Computing Foundation (CNCF) under the umbrella of Fluentd, alongside other foundational technologies such as Kubernetes and Prometheus. Chronosphere is also a silver-level sponsor of the CNCF.