Metadata enrichment with the Fluent Bit Kubernetes filter

(with troubleshooting tips)

Learn how to enrich Kubelet logs with metadata from the K8s API server using Fluent Bit and how to avoid common misconfiguration problems.

Pat Stephens | Member of Technical Staff | Chronosphere

I have over 20 years of experience in software engineering, primarily in the defence domain at Thales. The last few years of this were focused on transformation, first to containers and then to Kubernetes. After Thales I worked on the cloud native team at Couchbase, working on their Golang-based Kubernetes operator. This included producing a custom Fluent Bit deployment for observability needs. Calyptia (founded by the Fluent maintainers) then approached me to come work with them building out their observability products. I joined Chronosphere with their acquisition of Calyptia.


As we have written previously, having access to Kubernetes metadata can enhance traceability and significantly reduce mean time to remediate (MTTR). However, the metadata you need may not be included in the logs. The Fluent Bit Kubernetes filter plugin makes it easy to enrich your logs with the metadata you need to troubleshoot issues.

When run in Kubernetes (K8s) as a daemonset, Fluent Bit can ingest Kubelet logs and enrich them with additional metadata from the Kubernetes API server. This includes any annotations or labels on the pod and information about the namespace, pod, and the container the log is from. It is simple to do and is, in fact, the default setup when deploying Fluent Bit via the Helm chart.

The documentation goes into the full details of what metadata are available and how to configure the Fluent Bit Kubernetes filter plugin to gather them. In this post, we’ll give an overview of how the filter works and provide common troubleshooting tips, particularly with issues caused by misconfiguration.

How to get Kubernetes metadata?

Let us take a step back and look at what information is required to query the K8s API server for metadata about a particular pod. We need two things:

  1. Namespace
  2. Pod name

Cunningly, the Kubelet log files on each node encode this information in their filenames by design. This enables Fluent Bit to query the K8s API server even when all it has is the log file. Given a pod log filename, we can therefore query the K8s API server for the rest of the metadata describing the pod.
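To make the lookup concrete, here is a minimal sketch of the request path for a single pod. The path follows the standard Kubernetes core API layout. The helper name pod_api_path is hypothetical and is not part of Fluent Bit.

```python
from urllib.parse import quote


def pod_api_path(namespace: str, pod_name: str) -> str:
    """Build the core API path that returns one pod's metadata.

    Hypothetical helper for illustration only.
    """
    # Percent-encode both segments so unexpected characters cannot alter the path.
    ns = quote(namespace, safe="")
    pod = quote(pod_name, safe="")
    return f"/api/v1/namespaces/{ns}/pods/{pod}"


print(pod_api_path("default", "apache-logs-annotated"))
# /api/v1/namespaces/default/pods/apache-logs-annotated
```

Both values are needed: without the namespace, the pod name alone does not identify a pod.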

Using Fluent Bit to enrich the logs

First off, we need the actual logs from the Kubelet. This is typically done by using a daemonset to ensure a Fluent Bit pod runs on every node and then mounts the Kubelet logs from the node into the pod.

Now that we have the log files themselves, we should be able to extract enough information to query the K8s API server. We do this with a default setup using the tail plugin to read the log files and inject the filename into the tag:

[INPUT]
    Name tail
    Tag kube.*
    Path /var/log/containers/*.log
    multiline.parser  docker, cri

Wildcards in the tag are handled in a special way by the tail input plugin. This configuration injects the full path and filename of the log file into the tag after the kube. prefix.

Once the kubernetes filter receives these records, it parses the tag to extract the information required. To do so, it needs the kube_tag_prefix value to strip off any redundant tag or path and leave just the log filename, which contains the two things required to query the K8s API server (namespace and pod name). Using the defaults would look like this:

[FILTER]
    Name             kubernetes
    Match            kube.*
    Kube_Tag_Prefix  kube.var.log.containers.

Fluent Bit inserts the extra metadata from the K8s API server under the top-level kubernetes key.

Using an example, we can see how this flows through the system.

Assume this is our log file:

/var/log/containers/apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

The resulting tag would be (slashes are replaced with dots):

kube.var.log.containers.apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log
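The expansion can be approximated in a few lines of Python. This is a sketch of the behaviour described in this post (slashes become dots, doubled dots collapse), not Fluent Bit's actual implementation. The function name expand_tag is hypothetical.

```python
def expand_tag(tag_pattern: str, path: str) -> str:
    """Approximate how the tail input expands '*' in a tag (illustrative only)."""
    # Path separators become dots: /var/log/x.log -> .var.log.x.log
    dotted = path.replace("/", ".")
    # Substitute the dotted path for the wildcard.
    tag = tag_pattern.replace("*", dotted, 1)
    # Collapse doubled dots, e.g. "kube." + ".var" -> "kube.var".
    while ".." in tag:
        tag = tag.replace("..", ".")
    # A tag should not start with a dot.
    return tag.lstrip(".")


log_path = (
    "/var/log/containers/apache-logs-annotated_default_apache-"
    "aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log"
)
print(expand_tag("kube.*", log_path))
```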

We then strip off the kube_tag_prefix:

apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

Now we can extract the relevant fields with a regex:

(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<container_id>[a-z0-9]{64})\.log$

Applied to our example, this gives:

  • pod_name = apache-logs-annotated
  • namespace_name = default
  • container_name = apache
  • container_id = aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6
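If you want to check a filename against this pattern yourself, the following sketch applies the same regex in Python. The only change is the named-group syntax: Python's re module writes (?P<name>...) where the pattern above uses (?<name>...). The function name parse_log_filename is hypothetical.

```python
import re

# Same pattern as above, split per field and using Python's (?P<name>...) syntax.
KUBE_FILENAME_REGEX = re.compile(
    r"(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+)"
    r"_(?P<container_name>.+)"
    r"-(?P<container_id>[a-z0-9]{64})\.log$"
)


def parse_log_filename(filename: str):
    """Return the named fields as a dict, or None if the filename does not match."""
    match = KUBE_FILENAME_REGEX.search(filename)
    return match.groupdict() if match else None


fields = parse_log_filename(
    "apache-logs-annotated_default_apache-"
    "aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log"
)
print(fields)
```

A result of None is a quick sign that the string left after prefix stripping is not a valid pod log filename.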

The Fluent Bit Kubernetes filter extracts information from the log filename in order to query the K8s API server and retrieve metadata, which is then added to each log record.

Troubleshooting misconfiguration woes

While the Fluent Bit Kubernetes filter takes care of the hard part of extracting the K8s metadata from the API server and adding them to the logs, when users experience difficulty, it is usually the result of misconfiguration. There are a few common errors that we frequently see in community channels:

  1. Mismatched configuration of tags and prefix
  2. Invalid RBAC/unauthorised
  3. Dangling symlinks for pod logs
  4. Caching affecting dynamic labels
  5. Incorrect parsers

Let’s discuss how to identify these issues and correct them.

Mismatched tag and tag prefix

The most common problems occur when the default tag is changed for the tail input plugin or when a different path is used for the logs. When this happens, the kube_tag_prefix must also be changed to ensure it strips everything off except the filename.

The kubernetes filter will otherwise end up with a garbage filename that it either complains about immediately, or it injects invalid data into the request to the K8s API server. In either case, the filter will not enrich the log record as it has no additional data to add.

Typically, you will see a warning message in the log if the tag is obviously wrong. With log_level debug, you can also see the requests to the K8s API server with the invalid pod name or namespace, plus the response indicating there is no such pod.

$ kubectl logs fluent-bit-cs6sg
…
[2023/11/30 10:08:14] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2023/11/30 10:08:14] [debug] [http_client] not using http_proxy for header
[2023/11/30 10:08:14] [debug] [http_client] server kubernetes.default.svc:443 will close connection #60
[2023/11/30 10:08:14] [debug] [filter:kubernetes:kubernetes.0] Request (ns=default, pod=s.fluent-bit-cs6sg) http_do=0, HTTP Status: 404
[2023/11/30 10:08:14] [debug] [filter:kubernetes:kubernetes.0] HTTP response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"s.fluent-bit-cs6sg\" not found","reason":"NotFound","details":{"name":"s.fluent-bit-cs6sg","kind":"pods"},"code":404}

This example was created using a configuration file like the one below for the official Helm chart. We added two characters (my) to the default tag but left kube_tag_prefix at its default, so the prefix no longer lines up with the tag. You can see the result in the error details above: the pod name should be fluent-bit-cs6sg but is s.fluent-bit-cs6sg. No such pod exists, so the API server reports a failure. Without log_level debug, you simply get no metadata.

config:
  service: |
    [SERVICE]
        Daemon Off
        Log_Level debug

  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag mykube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

  filters: |
    [FILTER]
        Name kubernetes
        Match *
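The stray s. in the 404 above can be reproduced with a short sketch. It assumes the filter drops as many characters as the prefix is long without checking that the tag actually starts with it, which is consistent with the log output. The function name and the all-zero container ID are placeholders.

```python
DEFAULT_PREFIX = "kube.var.log.containers."


def strip_prefix_by_length(tag: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Drop len(prefix) characters from the front of the tag.

    Assumption: no check is made that the tag starts with the prefix.
    """
    return tag[len(prefix):]


container_id = "0" * 64  # placeholder, not a real container ID
filename = f"fluent-bit-cs6sg_default_fluent-bit-{container_id}.log"

good = strip_prefix_by_length("kube.var.log.containers." + filename)
bad = strip_prefix_by_length("mykube.var.log.containers." + filename)
fixed = strip_prefix_by_length(
    "mykube.var.log.containers." + filename,
    prefix="mykube.var.log.containers.",
)

# The pod name is everything before the first underscore.
print(good.split("_")[0])   # fluent-bit-cs6sg
print(bad.split("_")[0])    # s.fluent-bit-cs6sg
print(fixed.split("_")[0])  # fluent-bit-cs6sg
```

The last call shows the remedy: either keep the default tag, or set kube_tag_prefix so that it matches the tag you actually generate.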

Unexpected tags

Using wildcards in the tail input plugin can trip you up sometimes: the * wildcard is replaced by the full path of the file but with any special characters (e.g. /) replaced with dots (.).

Beware of modifying the default kube.* tag in this case and, as I try to stress as much as possible, use the stdout output to see the actual tags you are getting if you have any issues. As an example, consider the following tail configuration:

[INPUT]
    Name tail
    Path /var/log/containers/*.log

Now, you will get tags that look like this depending on what you configure:

  1. Tag kube.* ⇒ kube.var.log.containers.<filename>
  2. Tag kube_* ⇒ kube_.var.log.containers.<filename>

In the second case, notice that we have an underscore followed by a dot. In the first case, there is no double dot because the input plugin collapses it automatically. This can mean your filters do not match later on, which causes confusing problems. The first step is always to verify with the trusty stdout output.
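To see why the second tag slips past a later filter, here is a rough check using Python's fnmatch as a stand-in for Fluent Bit's Match wildcard. That substitution is an assumption made for illustration; Fluent Bit uses its own matcher.

```python
from fnmatch import fnmatchcase


def tag_matches(tag: str, match_pattern: str) -> bool:
    """Approximate a filter's Match rule with shell-style wildcards."""
    return fnmatchcase(tag, match_pattern)


# <filename> is shortened to app.log for readability.
tags = [
    "kube.var.log.containers.app.log",   # from Tag kube.*
    "kube_.var.log.containers.app.log",  # from Tag kube_*
]
for tag in tags:
    print(tag, tag_matches(tag, "kube.*"))
```

Only the first tag starts with kube. so a filter with Match kube.* never sees records carrying the second one.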

Invalid RBAC

The Fluent Bit pod must have the relevant roles added to its service account that allow it to query the K8s API for the information it needs. Unfortunately, this error is typically just reported as a connectivity warning to the K8s API server, so it can be easily missed.

To troubleshoot this issue, use log_level debug to see the response from the K8s API server. The response states which permission is missing, which makes the cause obvious. In the example below, the default service account is not allowed to get pods in the default namespace.

[2022/12/08 15:53:38] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/12/08 15:53:38] [debug] [filter:kubernetes:kubernetes.0] Send out request to API Server for pods information
[2022/12/08 15:53:38] [debug] [http_client] not using http_proxy for header
[2022/12/08 15:53:38] [debug] [http_client] server kubernetes.default.svc:443 will close connection #23
[2022/12/08 15:53:38] [debug] [filter:kubernetes:kubernetes.0] Request (ns=default, pod=calyptia-cluster-logging-316c-dcr7d) http_do=0, HTTP Status: 403
[2022/12/08 15:53:38] [debug] [filter:kubernetes:kubernetes.0] HTTP response
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"calyptia-cluster-logging-316c-dcr7d\" is forbidden: User \"system:serviceaccount:default:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"name":"calyptia-cluster-logging-316c-dcr7d","kind":"pods"},"code":403}

[2022/12/08 15:53:38] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD calyptia-cluster-logging-316c-dcr7d

Without log_level debug, all you get from the example above is the warning message:

[2022/12/08 15:53:38] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/12/08 15:53:38] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD calyptia-cluster-logging-316c-dcr7d
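When reading debug output, the JSON body printed after "HTTP response" is a Kubernetes Status object, and its code and reason fields tell the two failure modes in this post apart. The sketch below is a hypothetical triage helper and the hint wording is my own.

```python
import json


def explain_api_failure(body: str) -> str:
    """Summarise a Kubernetes Status response copied from the debug log."""
    status = json.loads(body)
    code = status.get("code")
    reason = status.get("reason", "Unknown")
    if code == 403:
        hint = "check the RBAC rules bound to the Fluent Bit service account"
    elif code == 404:
        hint = "check that the tag and kube_tag_prefix yield the real pod name"
    else:
        hint = "read the message field for details"
    return f"{code} {reason}: {hint}"


# Abbreviated version of the 403 response shown above.
forbidden = '{"kind":"Status","status":"Failure","reason":"Forbidden","code":403}'
print(explain_api_failure(forbidden))
```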

Dangling symlinks

Kubernetes has evolved over the years, and new container runtimes have also come along. As a result, the filename requirements for Kubelet logs may be handled using a symlink from a correctly named pod log file to the actual container log file created by the container runtime. When mounting the pod logs into your container, ensure they are not dangling links and that their destination is also correctly mounted.
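One way to check for dangling links from inside the Fluent Bit pod, or any container with the same mounts, is a small script like the sketch below. The function name is hypothetical, and the directory you pass in depends on how your logs are mounted.

```python
import os


def find_dangling_symlinks(directory: str) -> list:
    """Return the symlinks in a directory whose targets cannot be resolved."""
    dangling = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        # islink() is true for any symlink. exists() follows the link,
        # so it is false when the target is missing or not mounted.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(path)
    return dangling
```

Calling find_dangling_symlinks("/var/log/containers") in the pod should return an empty list. Any path it returns points at a target that was not mounted into the container.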

Caching

Fluent Bit caches the response from the K8s API server to prevent rate limiting or overloading the server. As a result, if annotations or labels are applied or removed dynamically, then those changes will not be seen until the next time the cache is refreshed. A simple test is just to roll/delete the pod so a fresh one is deployed and check if it picks up the changes.
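The effect is the same as with any read-through cache. The sketch below is a generic time-based cache, not Fluent Bit's implementation, and its eviction policy is an assumption. It only illustrates why a label change stays invisible until the cached entry is replaced.

```python
import time


class MetadataCache:
    """Generic read-through cache with a time-to-live (illustrative only)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # function that queries the real source
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the demo can control time
        self._entries = {}       # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # served from cache, possibly stale
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


# Demo with a fake API server and a fake clock.
labels = {"version": "v1"}
fake_now = [0.0]
cache = MetadataCache(lambda key: dict(labels), ttl_seconds=60,
                      clock=lambda: fake_now[0])

print(cache.get(("default", "apache")))  # {'version': 'v1'}
labels["version"] = "v2"                 # label changed on the pod
fake_now[0] = 30.0
print(cache.get(("default", "apache")))  # {'version': 'v1'} (stale)
fake_now[0] = 61.0
print(cache.get(("default", "apache")))  # {'version': 'v2'}
```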

Log file parsing

Another common misconfiguration is using custom container runtime parsers in the tail input. This is generally a legacy issue: previously, there were no built-in CRI or docker multiline parsers. The current recommendation is to always configure the tail input using the provided parsers, as per the documentation:

[INPUT]
    name              tail
    path              /var/log/containers/*.log
    multiline.parser  docker, cri

Do not use your own CRI or docker parsers, as they must cope with merging partial lines (identified with a P instead of an F).

The parsers for the tail plugin are not applied sequentially but are mutually exclusive, with the first one matching being applied. The goal is to handle multiline logs created by the Kubelet itself. Later, you can have another filter to handle multiline parsing of the application logs themselves after they have been reconstructed here.
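To show what merging partial lines involves, here is a simplified sketch for CRI-formatted lines, where each line is a timestamp, a stream name, a P or F flag, and the message. It is for illustration only and is not a replacement for the built-in parsers. The function name and sample lines are made up.

```python
def merge_cri_lines(lines):
    """Join CRI partial (P) lines with the full (F) line that ends them.

    Returns a list of (stream, message) tuples. Simplified for illustration.
    """
    merged = []
    pending = {}  # stream -> message fragments collected so far
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-formatted line
        stream, flag = parts[1], parts[2]
        message = parts[3] if len(parts) > 3 else ""
        pending.setdefault(stream, []).append(message)
        if flag == "F":
            # F marks the final fragment, so emit the joined record.
            merged.append((stream, "".join(pending.pop(stream))))
    return merged


sample = [
    "2023-11-30T10:08:14.000000000Z stdout P a very long line that was ",
    "2023-11-30T10:08:14.000000001Z stdout F split by the runtime",
    "2023-11-30T10:08:15.000000000Z stderr F short line",
]
for stream, message in merge_cri_lines(sample):
    print(stream, message)
```

Even this small version has to track state per stream, which is part of why hand-rolled regex parsers tend to get it wrong.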

What’s Next?

To learn more about Fluent Bit, we recommend joining the Fluent Community Slack channel where you will find thousands of other Fluent Bit users. Engage with experts, ask questions, and share best practices. Many of the troubleshooting tips in this blog were originally surfaced in the Slack channel.

We also invite you to download a free copy of Fluent Bit with Kubernetes by Phil Wilkins. This practical guide to monitoring cloud native and traditional environments with Fluent Bit covers the basics of collecting app logs, filtering, routing, enriching, and transforming logs, metrics, and traces.
