Learn how to use open standards and open source tools to collect telemetry data from any source and route it to any destination.
Sudhanshu Prajapati is a developer advocate and software developer with an interest in observability.
On: Apr 5, 2023
An e-commerce company planned a flash sale, and the sudden spike in traffic led to website performance issues. The engineering team couldn’t pinpoint the actual root cause and realized that they needed a better way to monitor their applications. After some research, the team decided to implement an observability pipeline.
An observability pipeline, also called a telemetry pipeline, is a system that collects, processes, and analyzes data from various sources, including logs, metrics, and traces, to provide insights into a distributed system’s performance and behavior.
With an observability pipeline in place, the company could monitor its applications in real time, detect anomalies, and troubleshoot issues faster. Further, by integrating the pipeline with its incident management system, the company would improve its systems’ reliability and availability.
While it looked good on paper, the team still had multiple questions: Which tools should we use? How do we avoid vendor lock-in? How do we handle logs, metrics, and traces in a single pipeline? These are common questions that come up while implementing an observability pipeline.
Multiple observability tools and stacks are available. Like everything else, there’s no one-size-fits-all solution. In this post, we’ll discuss implementing an agile observability pipeline using OpenTelemetry and Fluent Bit.
OpenTelemetry is an open-source observability framework that provides a standardized way to collect and transmit telemetry data, such as traces, logs, and metrics, from applications and infrastructure. This makes it easier to monitor and troubleshoot your systems and gain insights into their behavior.
The OpenTelemetry project was formed by merging the OpenCensus and OpenTracing projects. It provides a common set of APIs, libraries, and tools for collecting and analyzing telemetry data in distributed systems.
OpenTelemetry has multiple components; the major ones are the language-specific APIs and SDKs, the OpenTelemetry Protocol (OTLP), and the OpenTelemetry Collector.
For more information about OpenTelemetry components, check out the docs.
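To make the "standardized way to transmit telemetry" concrete, here is a minimal sketch of the JSON shape that OTLP/HTTP uses for a single trace span. The field names (resourceSpans, scopeSpans, traceId, spanId) follow the OTLP JSON encoding; the span contents themselves are a made-up example, built with only the Python standard library:

```python
import json
import os
import time

# One trace span in OTLP/HTTP JSON form. The structure
# (resourceSpans -> scopeSpans -> spans) comes from the OTLP spec;
# the span values are illustrative.
now = time.time_ns()
span = {
    "traceId": os.urandom(16).hex(),   # 16 random bytes, hex-encoded
    "spanId": os.urandom(8).hex(),     # 8 random bytes, hex-encoded
    "name": "GET /generate",
    "kind": 2,                          # SPAN_KIND_SERVER
    "startTimeUnixNano": str(now),
    "endTimeUnixNano": str(now + 5_000_000),  # 5 ms later
}
payload = {"resourceSpans": [{"scopeSpans": [{"spans": [span]}]}]}

print(json.dumps(payload, indent=2))
```

Every OTLP-compatible tool in the pipeline, from Fluent Bit to the Collector to Jaeger, speaks this same wire format, which is what makes the components interchangeable.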
Components of our applications and infrastructure generate different types of logs, including application logs and system logs. Since multiple data sources generate tons of data, any observability pipeline solution must be capable of dealing with different data sources and formats, provide flexible routing, and be reliable and secure. That’s where Fluent Bit comes in.
Fluent Bit is an open-source, lightweight, and vendor-neutral telemetry pipeline agent for logs, metrics, and traces. It is generally used to collect, process, and route data to backends like Elasticsearch, Kafka, or other systems, and it can work with logs, metrics, traces, and any other form of input data. Because it is lightweight, it can run on edge devices and embedded devices as well as in cloud services.
Fluent Bit's pipeline is built from multiple components, including inputs, parsers, filters, buffering, routing, and outputs.
Read more about the components of Fluent Bit.
Fluent Bit can help enable observability pipelines by providing a simple and efficient way to collect and forward telemetry data. Fluent Bit can be configured to collect data from various sources, including logs, metrics, and traces, and can forward data to various backend systems, allowing developers and operators to gain a comprehensive insight into the behavior and performance of complex systems.
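Applications can make life easy for a tail-based pipeline by emitting one JSON object per line, which Fluent Bit's tail input and JSON parser can consume directly. The sketch below uses only the Python standard library; the field names and file path are illustrative stand-ins, not part of the demo repository:

```python
import json
import tempfile
import time

def log_line(level, message, **fields):
    """Render one structured log record as a single JSON line,
    the shape a tail input plus JSON parser expects."""
    record = {"ts": time.time(), "level": level, "message": message}
    record.update(fields)
    return json.dumps(record)

# Append a record to the file the tail input watches.
# The path here is a placeholder; point it at your [INPUT] Path.
log_path = tempfile.gettempdir() + "/demo-app.log"
with open(log_path, "a") as f:
    f.write(log_line("info", "order created", order_id=42) + "\n")
```

Structured lines like these let Fluent Bit's filters match and enrich individual fields instead of regex-parsing free-form text.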
Some of the key features that make Fluent Bit a popular choice for enabling an observability pipeline are its small footprint and low resource usage, its large catalog of input and output plugins, built-in parsing and filtering, and its vendor neutrality.
Let us see how we can build an observability pipeline using OpenTelemetry and Fluent Bit for monitoring and analyzing a microservices-based application. Here, the application is instrumented to send metrics, traces, and logs to Fluent Bit, which forwards them to the OTel Collector and finally to Jaeger and Prometheus.
For this demo, you will need Docker and Docker Compose installed. If you don't have them already, you can follow the official Docker Compose installation documentation, which has well-articulated steps. To follow the configuration steps, you can clone the already configured repository available here.
Fluent Bit uses a configuration file to specify its inputs, filters, and outputs. In this case, we will use the tail input plugin to collect logs from a file and the OpenTelemetry output plugin to forward the logs to the OpenTelemetry collector.
For Traces and Metrics
Similarly, we will use HTTP, with the opentelemetry input plugin listening for metrics and traces at /v1/metrics and /v1/traces, respectively.
Here’s an example configuration file for Fluent Bit:
[SERVICE]
    flush        1
    log_level    info

[INPUT]
    name         tail
    path         /var/log.log
    tag          demo-app

[FILTER]
    name         record_modifier
    match        demo-app
    record       hostname ${HOSTNAME}

[INPUT]
    name         opentelemetry
    host         0.0.0.0
    port         3000
    successful_response_code 200

[OUTPUT]
    name         stdout
    match        *

[OUTPUT]
    name                 opentelemetry
    match                *
    host                 collector
    port                 3030
    metrics_uri          /v1/metrics
    logs_uri             /v1/logs
    traces_uri           /v1/traces
    log_response_payload true
    tls                  off
    tls.verify           off
    # add user-defined labels
    add_label            app fluent-bit
    add_label            color blue
This configuration file specifies that Fluent Bit should read logs from the defined path, tag them as demo-app, and forward them to the OpenTelemetry Collector at http://collector:3030/v1/logs. The process is similar for traces and metrics: instead of the logs endpoint, they are forwarded to /v1/traces and /v1/metrics, as defined in the output plugin configuration.
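To sanity-check the pipeline, you can hand-deliver a record to the opentelemetry input defined above (listening on port 3000). This sketch uses only the Python standard library; the helper name and payload contents are illustrative, and the payload shape follows the OTLP/HTTP JSON convention:

```python
import json
import urllib.request

def build_log_request(body_text, host="localhost", port=3000):
    """Build an OTLP/HTTP POST request targeting Fluent Bit's
    opentelemetry input (path follows the OTLP convention)."""
    payload = {
        "resourceLogs": [{
            "scopeLogs": [{
                "logRecords": [{"body": {"stringValue": body_text}}]
            }]
        }]
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/logs",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_log_request("hello from python")
print(req.full_url)  # http://localhost:3000/v1/logs
# To actually send it (with the demo stack running):
#   urllib.request.urlopen(req)
```

A successful delivery should return the 200 status set by successful_response_code, and the record should appear on the stdout output as well.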
OpenTelemetry Collector is a vendor-agnostic agent that can receive, process, and export telemetry data from a variety of sources. Here’s how to set up OpenTelemetry Collector:
Once you have configured Fluent Bit, you can use it to forward the telemetry data to an OpenTelemetry Collector. The OpenTelemetry Collector is responsible for collecting and processing telemetry data from different sources and forwarding it to a backend system such as Jaeger or Prometheus. Here is our otel-collector configuration YAML; you can read more about receivers, exporters, and processors here.
receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:3030"
Once Fluent Bit is forwarding telemetry data to the OpenTelemetry Collector, we can export that data to visualization tools such as Jaeger and Prometheus to view and analyze it.
Jaeger is a distributed tracing system that can be used to visualize and analyze the performance of microservices-based distributed systems. To visualize traces in Jaeger, you will need to configure the OpenTelemetry Collector to forward trace data to Jaeger. You can then use the Jaeger UI to view and analyze the traces.
exporters:
  otlp:
    endpoint: "jaeger:4317"
    # disable tls
    tls:
      insecure: true
  logging:
  prometheus:
    endpoint: "0.0.0.0:8889"
Prometheus is a time-series database and monitoring system that can be used to visualize and analyze metrics data. To visualize metrics in Prometheus, we will need to configure the OpenTelemetry Collector to forward metrics data to Prometheus. Lastly, we need to configure Prometheus to scrape data from the exporter endpoint.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: demo
    scrape_interval: 5s
    static_configs:
      - targets: ['collector:8889']
The final OTel Collector YAML will look like this:
receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:3030"

exporters:
  otlp:
    endpoint: "jaeger:4317"
    # disable tls
    tls:
      insecure: true
  logging:
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [logging]
    traces:
      receivers: [otlp]
      exporters: [logging, otlp]
    metrics:
      receivers: [otlp]
      exporters: [logging, prometheus]
To start local instances of the services, run the following command in the cloned repository:
$ docker-compose up --build
Example output:
(venv) ➜ fluent-bit-otel git:(master) ✗ docker compose up --build
[+] Building 2.1s (12/12) FINISHED
=> [internal] load build definition from dockerfile 0.0s
=> => transferring dockerfile: 279B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/python:3.8-slim-buster 2.0s
=> [auth] library/python:pull token for registry-1.docker.io 0.0s
=> [1/6] FROM docker.io/library/python:[email protected]:f2199258d29ec06b8bcd3ddcf93615cdc8210d18a942a56b1a488136074123f3 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 64B 0.0s
=> CACHED [2/6] WORKDIR /app 0.0s
=> CACHED [3/6] COPY requirements.txt . 0.0s
=> CACHED [4/6] RUN pip install -r requirements.txt 0.0s
=> CACHED [5/6] RUN apt-get update && apt-get install -y curl 0.0s
=> CACHED [6/6] COPY app.py . 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:613c0c77e65b1e512ba269a3745072e85e4f55bae7054d34e25d7f549a9b0bf7 0.0s
=> => naming to docker.io/library/fluent-bit-otel-app 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 6/5
⠿ Network fluent-bit-otel_default Created 0.1s
⠿ Container fluent-bit-otel-prometheus-1 Created 0.0s
⠿ Container fluent-bit-otel-jaeger-1 Created 0.1s
⠿ Container fluent-bit-otel-collector-1 Created 0.0s
⠿ Container fluent-bit-otel-fluentbit-1 Created 0.0s
⠿ Container fluent-bit-otel-app-1 Created 0.0s
To generate traces, run the following command in the terminal:
curl -X GET http://localhost:5000/generate
In the above screenshot, we can see the logs are generated when we hit the generate traces endpoint, and Fluent Bit collects them and forwards them to the OTel Collector.
OpenTelemetry and Fluent Bit integration can be used in a wide range of use cases, including centralized logging, application performance monitoring, and distributed tracing. Let’s dive deeper into each of these use cases.
Centralized logging involves collecting, aggregating, and analyzing logs from multiple sources in a central location. This allows you to monitor the behavior and performance of your entire system in one place, making it easier to detect and diagnose issues.
By integrating Fluent Bit with OpenTelemetry, you can collect and process logs from multiple sources, and forward them to a centralized logging system such as Elasticsearch or Splunk. This enables you to quickly search and analyze logs from multiple sources, and identify patterns and trends that can help you optimize performance and troubleshoot issues.
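As a sketch of the centralized-logging case, routing the same tagged logs to Elasticsearch is a matter of adding another output section alongside the existing ones; the host, port, and index values below are placeholders to adapt to your environment:

```
[OUTPUT]
    name   es
    match  demo-app
    host   elasticsearch
    port   9200
    index  demo-logs
```

Because Fluent Bit routes on tags, the same demo-app records can flow to Elasticsearch, stdout, and the OTel Collector simultaneously without touching the inputs.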
Application performance monitoring (APM) involves collecting and analyzing telemetry data such as logs, metrics, and traces to monitor the performance of your applications. APM tools can provide insights into the behavior and performance of your applications, allowing you to identify performance bottlenecks, optimize resource utilization, and troubleshoot issues.
By integrating Fluent Bit with OpenTelemetry, you can collect and process telemetry data from multiple sources, and forward it to an APM tool such as Datadog or New Relic. This enables you to monitor the performance of your applications in real-time and identify and diagnose issues quickly.
Distributed tracing involves tracking the flow of requests through a distributed system, and collecting telemetry data such as traces and spans to monitor the performance and behavior of the system. Distributed tracing can help you identify performance bottlenecks, optimize resource utilization, and troubleshoot issues in distributed systems.
By integrating Fluent Bit with OpenTelemetry, you can collect and process telemetry data such as traces and spans and forward it to a distributed tracing system such as Jaeger or Zipkin. This enables you to track the flow of requests through your distributed system and analyze the performance and behavior of your system in real-time.
In conclusion, an OpenTelemetry observability pipeline built with Fluent Bit offers a highly efficient and flexible solution for observability needs. One of its key advantages is the breadth of plugins covering virtually every data source and destination, which lets you build a single pipeline without installing new agents and exporters and thereby simplifies the observability architecture.
Moreover, the pipeline is highly customizable, enabling users to add new sources and destinations as needed without a major overhaul of the entire system. This can be especially valuable for organizations that are constantly evolving and need to adapt their observability tools accordingly. Additionally, the all-in-one support for traces, logs, and metrics provided by Fluent Bit simplifies the management of data streams, reducing the complexity of the observability pipeline. Overall, an OpenTelemetry observability pipeline built with Fluent Bit is a versatile and efficient solution that can help organizations gain valuable insights into their applications without being locked into a single vendor or facing difficulties with scaling and complexity.
To learn more about Fluent Bit, check out Fluent Bit Academy, your destination for best practices and how-to’s on advanced processing, routing, and all things Fluent Bit. Here’s a sample of what you can find there:
Observability or telemetry pipelines enable organizations to receive data from multiple sources, enrich, transform, redact, and reduce it before routing it to its intended destinations for storage and analysis. The result is that organizations are able to control their data from collection to backend destination, reducing the complexity of managing multiple pipelines, reducing the volume of data sent to backend systems, and decreasing backend costs.
Discover why Gartner® predicts, “By 2026, 40% of log telemetry will be processed through a telemetry pipeline product, an increase from less than 10% in 2022.”
With Chronosphere’s acquisition of Calyptia in 2024, Chronosphere became the primary corporate sponsor of Fluent Bit. Eduardo Silva — the original creator of Fluent Bit and co-founder of Calyptia — leads a team of Chronosphere engineers dedicated full-time to the project, ensuring its continuous development and improvement.
Fluent Bit is a graduated project of the Cloud Native Computing Foundation (CNCF) under the umbrella of Fluentd, alongside other foundational technologies such as Kubernetes and Prometheus. Chronosphere is also a silver-level sponsor of the CNCF.
Request a demo for an in-depth walkthrough of the platform!