Fluent Bit processors vs filters: when to use one or the other?

Processors, Stream Processors, and Filters???

Fluent Bit version 2.1.2 introduced the concept of Processors (not to be confused with Stream Processors), which, like Filters, enrich or transform telemetry data. With the release of Fluent Bit v3, we introduced three key Processors, each tailored to specific data manipulation needs:

Content Modifier: manipulates metadata and content of logs and traces, similar to the modify filter.
Metric Selector: selectively includes or excludes metrics, similar to the grep filter used for logs.
SQL Processor: offers a straightforward interface for selecting log content via conditional expressions, resembling the functionality of a regex filter.
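As a rough sketch of how two of these Processors are wired up, the YAML pipeline below attaches a Content Modifier to a log input and a Metric Selector to a metrics input. It assumes Fluent Bit v3 or later and the YAML configuration format. The processor names `content_modifier` and `metrics_selector` follow the Fluent Bit documentation; the sample record, Tags, keys, and values are hypothetical.

```yaml
pipeline:
  inputs:
    # Hypothetical log source emitting a sample record.
    - name: dummy
      tag: app.logs
      dummy: '{"level": "error", "message": "disk full"}'
      processors:
        logs:
          # Content Modifier: add a key to every record (similar to the modify filter).
          - name: content_modifier
            action: insert
            key: team
            value: platform

    # Fluent Bit's own internal metrics, used here as a metrics source.
    - name: fluentbit_metrics
      tag: internal.metrics
      scrape_interval: 2
      processors:
        metrics:
          # Metric Selector: keep only metrics whose name matches /storage/.
          - name: metrics_selector
            metric_name: /storage/
            action: include

  outputs:
    - name: stdout
      match: '*'
```

Note that Processors are grouped by signal type (`logs`, `metrics`, `traces`) under the plugin they belong to.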

Many Fluent Bit users are unsure how Processors differ from Filters and whether they are the same thing as Stream Processors.

This blog aims to clarify the differences and use cases for Processors, Stream Processors, and Filters, addressing common misconceptions within the Fluent Bit community.

Are Filters and Processors the same?

While Filters and Processors in Fluent Bit offer overlapping functionalities, there are notable differences in how they operate and their performance implications:

Data Routing: Unlike Filters, Processors cannot control data routing. They are directly attached to an input/output plugin and executed sequentially.
Performance and Threading: Processors run in the same thread as the plugin they are attached to. When that plugin runs in threaded mode (a threaded input, or an output with one or more workers, including multi-threaded outputs), its Processors execute in that worker thread instead of the main event loop. This reduces contention in the main event loop and minimizes encoding/decoding steps when chaining operations. It is especially beneficial for heavy regular expressions or complex data transformations, because that work happens in the worker thread, improving efficiency and throughput.
Handling of Metrics and Trace data: Filters in Fluent Bit operate only on logs; they do not process metrics or traces. In contrast, Processors like the Content Modifier work with both logs and traces, while the Metric Selector is designed specifically for metric data.
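To make the routing distinction concrete, here is a hedged sketch (YAML format, Fluent Bit v3 assumed) in which a similar change is made once by a Processor and once by a Filter. The Processor has no `match` key because it is bound to the input it sits under and runs in list order; the Filter is a standalone pipeline stage that selects records by Tag. The Tag and key names are hypothetical.

```yaml
pipeline:
  inputs:
    - name: dummy
      tag: app.logs
      dummy: '{"message": "hello"}'
      # Processor: attached to this input, no Match rule, runs in list order.
      processors:
        logs:
          - name: content_modifier
            action: insert
            key: added_by
            value: processor

  filters:
    # Filter: a separate stage that picks records by Tag via Match.
    - name: modify
      match: 'app.*'
      add: stage filter

  outputs:
    - name: stdout
      match: 'app.*'
      format: json_lines
```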

These distinctions help in optimizing data processing pipelines in Fluent Bit. Let’s delve deeper into these differences.

Routability of Processors in Fluent Bit

In Fluent Bit, data routing is a crucial process that involves guiding data through Filters to one or multiple designated outputs. This routing process leverages two primary concepts: Tags and Match rules, which are central to managing data flow.

Key concepts in routing:

Tags: Each piece of data generated by input plugins is assigned a Tag—a human-readable identifier that indicates the data’s source.
Match Rules: Match rules are defined within the output configurations to direct data to its appropriate destination. These rules use the Tags to determine the routing path for each data stream.

Filters play a dual role in the Fluent Bit pipeline. Not only do they enrich or transform the data, but they can also modify Tags. The ability to alter Tags (using Filters such as Rewrite Tag) changes the routing behavior dynamically within the pipeline, directing data to different outputs based on the new Tags.
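A minimal sketch of Tag-based rerouting, assuming the `rewrite_tag` filter's rule format `$KEY REGEX NEW_TAG KEEP`: records whose `log` field contains `ERROR` are re-emitted under a new `errors.*` Tag, so a different output picks them up. The paths, Tags, and output choices are illustrative, not a recommended setup.

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: 'kube.*'

  filters:
    # Re-tag error lines as errors.<original tag>; "false" drops the
    # original record so it is not delivered twice.
    - name: rewrite_tag
      match: 'kube.*'
      rule: $log ^.*ERROR.*$ errors.$TAG false

  outputs:
    # Re-tagged records are routed here.
    - name: stdout
      match: 'errors.*'
    # All other records keep their original Tag and are routed here.
    - name: file
      match: 'kube.*'
      path: /tmp/fluent-bit
```

A Processor has no equivalent of this: it can change record content, but the record keeps the Tag it arrived with.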

However, unlike Filters, Processors cannot modify Tags. Their primary function is to transform the data without influencing its routing path. This distinction is crucial for understanding the limitations and applications of Processors within the Fluent Bit architecture.

How do Processors perform compared to Filters?

Fluent Bit’s architecture has evolved significantly to improve scalability and efficiency. Initially, Fluent Bit operated on a single-thread model, which was adequate for moderate loads but limited at higher scales. With the introduction of multithread support in version 1.7.0, performance improved fivefold, leveraging multiple CPU cores to enhance data processing and delivery.

Multithreading Advantages:

Scalability: Running in multiple threads allows Fluent Bit to handle significantly more data by distributing tasks across multiple CPUs.
Configuration Flexibility: Users can configure each output connector with multiple worker threads. For instance, configuring four workers for an HTTP output plugin means each worker handles data delivery independently, with a round-robin approach to balance the load.

Example configuration for multithreading:

[OUTPUT]
    name    http
    match   *
    host    192.168.3.4
    port    443
    tls     on
    format  json_lines
    workers 4

This lock-free, multithreaded implementation minimizes runtime contention and maximizes throughput. Even a small configuration change, such as setting workers to 1, moves data delivery off the main event loop onto a dedicated thread, which can already make a noticeable performance difference.
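For comparison, here is a sketch of the same HTTP output in the YAML format (Fluent Bit v3 or later assumed) with a Processor attached. Because the output runs with `workers: 4`, the attached Processor is expected to run inside those worker threads rather than in the main event loop. The `user_email` key being hashed is hypothetical.

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: 'kube.*'

  outputs:
    - name: http
      match: 'kube.*'
      host: 192.168.3.4
      port: 443
      tls: on
      format: json_lines
      workers: 4
      # Runs in the output's worker threads, not the main event loop.
      processors:
        logs:
          # Replace the value of a (hypothetical) sensitive key with a hash.
          - name: content_modifier
            action: hash
            key: user_email
```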

Threaded input plugins:

With version 2.0.2, input plugins can run on separate threads, reducing the load on the main event loop and enhancing overall system responsiveness.

[INPUT]
    name     tail
    path     /var/log/containers/*.log
    tag      kube.*
    threaded on
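The same input expressed in YAML, sketched under the assumption of Fluent Bit v3 or later: with `threaded: on`, the Processors listed under the input run in the input's own thread. The `cluster` key and its value are placeholders.

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: 'kube.*'
      threaded: on
      # These Processors run in the tail input's dedicated thread.
      processors:
        logs:
          - name: content_modifier
            action: insert
            key: cluster
            value: demo-cluster

  outputs:
    - name: stdout
      match: 'kube.*'
```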

Processing efficiency:

Processors, introduced in recent releases, represent a shift in how data is handled within Fluent Bit. Filters operate sequentially within the main event loop, which can create bottlenecks; Processors, by contrast, can execute concurrently in different threads. This design reduces the load on the main thread, allowing Fluent Bit to handle higher data volumes more efficiently.

Contention reduction:

Historically, multiple Filters within a single thread could slow down data processing, as shown in the following architecture diagram.

Processors alleviate this by handling operations in separate threads, thus avoiding the bottlenecks associated with sequential Filter execution.
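One way to apply this, sketched under the assumption that your Fluent Bit release lets you run an existing filter (here `grep`) as a processor: move a regex-heavy step out of the top-level `filters` section and into the `processors` block of a threaded input. The commented-out section shows the filter-based equivalent, which would run in the main event loop. The path, Tags, and pattern are illustrative.

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: 'kube.*'
      threaded: on
      processors:
        logs:
          # grep used as a processor: the regex runs in the input's thread.
          - name: grep
            regex: log (ERROR|WARN)

  # Filter-based equivalent (runs in the main event loop):
  # filters:
  #   - name: grep
  #     match: 'kube.*'
  #     regex: log (ERROR|WARN)

  outputs:
    - name: stdout
      match: 'kube.*'
```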

In summary, Processors retain most of the functionality of Filters while offering improved performance through better thread utilization and reduced contention.

When to use a Filter or a Processor

In Fluent Bit, the decision to use Filters or Processors depends on the specific data handling requirements and the need for performance optimization. Consider using Processors for high-volume data streams or when concurrent processing across multiple CPUs can significantly reduce bottlenecks. Processors excel in scenarios where enhanced performance and reduced data latency are critical. However, Filters are still essential when dynamic routing capabilities or complex data manipulation (such as throttling input data, redacting sensitive data) are needed.
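A sketch of a mixed pipeline along these lines (YAML, Fluent Bit v3 assumed): lightweight enrichment is done by a Processor in the input's thread, while rate limiting stays in the `throttle` filter. The throttle values and the `env` key are illustrative, not tuning recommendations.

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      tag: 'kube.*'
      threaded: on
      processors:
        logs:
          # Simple per-record enrichment: a good fit for a Processor.
          - name: content_modifier
            action: upsert
            key: env
            value: production

  filters:
    # Rate limiting across the stream: kept as a Filter.
    - name: throttle
      match: 'kube.*'
      rate: 800
      window: 300
      interval: 1s

  outputs:
    - name: stdout
      match: 'kube.*'
```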

In conclusion, while Processors offer a significant new approach to data processing that can improve performance and efficiency, Filters remain valuable for scenarios requiring complex data routing and data manipulation.

Now that we have a better understanding of the differences between Processors and Filters, let’s look at Stream Processors and Processors.

Stream Processors and Processors

Stream Processors in Fluent Bit leverage a streaming SQL engine to perform complex data analysis and transformations in real-time. They excel in scenarios requiring continuous data stream querying, such as complex event processing (CEP), real-time analytics, and time-based data analysis. This capability is useful in monitoring applications, where timely data processing can lead to immediate actionable insights.

On the other hand, the newer Processors are designed for efficient data manipulation without the analytical capabilities of Stream Processors. Processors are best suited for simpler, high-performance data transformations directly within the pipeline. They operate without the additional overhead of a streaming engine, making them ideal for lightweight data enrichment and modification tasks that do not require the sophisticated querying capabilities of Stream Processors.
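To illustrate the lighter end of this spectrum, here is a sketch using the SQL Processor. It is meant for selecting fields and filtering records as they pass through the plugin, rather than maintaining time windows or aggregations the way a Stream Processor query can. This assumes Fluent Bit v3 or later in YAML; the record fields are hypothetical, and the exact `WHERE` expressions supported may vary by version.

```yaml
pipeline:
  inputs:
    - name: dummy
      tag: http.access
      dummy: '{"method": "GET", "path": "/api/v1/items", "status": 500, "latency_ms": 87}'
      processors:
        logs:
          # Per-record selection and filtering: no windows, no aggregation.
          - name: sql
            query: "SELECT method, path, status FROM STREAM WHERE status = 500;"

  outputs:
    - name: stdout
      match: 'http.*'
      format: json_lines
```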

When to use a Stream Processor or a Processor

Choose Stream Processors when the need for in-depth, SQL-based stream analysis and conditional processing is paramount. Opt for Processors when simple, quick data transformations are needed to prepare data for further stages in your pipeline.

Wrapping up

Filters, Processors, and Stream Processors significantly enhance Fluent Bit’s data manipulation capabilities. Each component serves a unique purpose: Filters for dynamic data routing and straightforward transformations, Processors for high-performance data modifications, and Stream Processors for complex analytical tasks. Understanding their distinct functionalities helps tailor data processing strategies to meet specific performance, analysis, and routing needs.

Keep learning

To learn more about Fluent Bit Processors, check out “Explaining the Fluent Bit processor,” an excerpt from the book Fluent Bit with Kubernetes by Phil Wilkins. You can also download a complimentary copy of the book courtesy of Chronosphere.


For more resources, visit Fluent Bit Academy, your destination for best practices and how-tos on advanced processing, routing, and all things Fluent Bit. Here’s a sample of what you can find there:

Getting started with Fluent Bit and OpenSearch
Getting started with Fluent Bit and OpenTelemetry
Advanced routing with Fluent Bit v3


About Fluent Bit and Chronosphere

With Chronosphere’s acquisition of Calyptia in 2024, Chronosphere became the primary corporate sponsor of Fluent Bit. Eduardo Silva — the original creator of Fluent Bit and co-founder of Calyptia — leads a team of Chronosphere engineers dedicated full-time to the project, ensuring its continuous development and improvement.

Fluent Bit is a graduated project of the Cloud Native Computing Foundation (CNCF) under the umbrella of Fluentd, alongside other foundational technologies such as Kubernetes and Prometheus. Chronosphere is also a silver-level sponsor of the CNCF.
