The number of tools for observability is overwhelming. We sort through the noise and discuss the major open source projects used in observability efforts.
Erik is a Customer Journey Strategist at Chronosphere. His own personal journey has taken him from working in higher education to working in tech startups. Data relaxes him, as do cooking and books. He currently resides in North Carolina, where he recently returned after a long time away. He hopes to prove that fellow North Carolinian Thomas Wolfe was wrong about going home again.
On: Jan 2, 2024
Recently we wrote about why the evolution of observability is naturally migrating toward open source. In that post, we also mentioned that there has been an explosion of open source tools that solve various problems of the complex puzzle that is observability.
In this post, we’ll help you sort through some of the noise as we discuss the major open source observability tools.
We give particular emphasis to those projects that are part of the Cloud Native Computing Foundation (CNCF) or at least where the project’s sponsoring company is a CNCF member. The CNCF has established itself as the standard bearer of open source and is home to the Kubernetes project, which has transformed cloud native computing.
The CNCF categorizes its projects based on maturity level, with three progressive levels: sandbox, incubating, and graduated.
We limit this list to projects that have achieved incubating or graduated status, i.e. those known to be used successfully in production. Users should feel confident in any project that has achieved graduated status. Your risk tolerance should inform your adoption of incubating projects. Learn more about CNCF project maturity levels.
Prometheus is a monitoring and alerting system written in Go that collects metrics data and stores it in a time series database. It includes a powerful query language called PromQL (Prometheus Query Language) that lets users select and aggregate time series data in real time. Prometheus is commonly used in conjunction with Grafana (see below) for visualizing the data. Alerting is handled through Prometheus Alertmanager.
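To make "select and aggregate" concrete, here is a minimal Python sketch of what a PromQL expression such as `sum by (job) (http_requests_total{status="500"})` does conceptually. The metric name, labels, and sample values are invented for illustration, and the code models only the idea, not how Prometheus evaluates queries internally.

```python
from collections import defaultdict

# Each time series is identified by a metric name plus a set of labels.
# These series and their current values are made up for illustration.
SERIES = [
    ({"__name__": "http_requests_total", "job": "api", "instance": "a", "status": "200"}, 1200.0),
    ({"__name__": "http_requests_total", "job": "api", "instance": "a", "status": "500"}, 7.0),
    ({"__name__": "http_requests_total", "job": "web", "instance": "a", "status": "500"}, 3.0),
    ({"__name__": "http_requests_total", "job": "web", "instance": "b", "status": "500"}, 2.0),
]

def select(series, matchers):
    """Keep series whose labels equal every matcher, like {status="500"}."""
    return [
        (labels, value)
        for labels, value in series
        if all(labels.get(name) == wanted for name, wanted in matchers.items())
    ]

def sum_by(series, label):
    """Group the selected series by one label and add up their values."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)

errors = select(SERIES, {"__name__": "http_requests_total", "status": "500"})
errors_by_job = sum_by(errors, "job")
print(errors_by_job)  # {'api': 7.0, 'web': 5.0}
```

In practice a query over a counter like this would usually be wrapped in `rate()` over a time window; the sketch leaves the time dimension out to keep the selection and aggregation steps visible.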
Prometheus was the second CNCF-hosted project after Kubernetes, so it has been battle-tested in production environments for years. It is a standalone open source project and is maintained independently of any company.
CNCF Status: Graduated
License: Apache 2.0
GitHub: https://github.com/prometheus/prometheus
Fluentd is a data collector written in a combination of Ruby and C that is used primarily for collecting logs from sources and sending them to desired destinations. It utilizes a plugin architecture for integrations with sources and destinations and currently has over 800 plugins available. No matter how obscure your endpoint, there is probably a Fluentd plugin for it. It also offers plugins for filtering or parsing the data before delivery.
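The source, filter, and destination plugin model can be sketched in a few lines of Python. This is only an illustration of the pattern: the function names, the log format, and the record fields are hypothetical, and real Fluentd plugins are written in Ruby against Fluentd's own plugin API.

```python
import re

# Hypothetical log format for this sketch: "<LEVEL> <message>"
LINE_PATTERN = re.compile(r"^(?P<level>[A-Z]+) (?P<message>.*)$")

def tail_source(lines):
    """Input stage: emit one raw record per log line."""
    for line in lines:
        yield {"raw": line}

def parse_filter(records):
    """Parser stage: turn raw text into structured fields, skipping bad lines."""
    for record in records:
        match = LINE_PATTERN.match(record["raw"])
        if match:
            yield match.groupdict()

def drop_debug_filter(records):
    """Filter stage: discard noisy records before delivery."""
    for record in records:
        if record["level"] != "DEBUG":
            yield record

def memory_output(records, destination):
    """Output stage: deliver records to a destination (a plain list here)."""
    for record in records:
        destination.append(record)

raw_lines = ["INFO service started", "DEBUG cache warm", "ERROR upstream timeout"]
delivered = []
memory_output(drop_debug_filter(parse_filter(tail_source(raw_lines))), delivered)
print(delivered)
```

Because each stage only consumes and yields records, swapping the list destination for a different output, or adding another filter, does not require changes to the other stages. That is the property a plugin architecture is after.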
Fluentd is another longtime project, originally released in 2011 and joining CNCF shortly after Prometheus in 2016.
CNCF Status: Graduated
License: Apache 2.0
GitHub: https://github.com/fluent/fluentd
Fluent Bit was originally intended to be an alternative to Fluentd. Written in C, with a much smaller memory footprint and lower CPU utilization, it was created for use in containerized and embedded environments. However, in the last few years, its scope has expanded significantly. In addition to logs, it can also handle metrics and trace data, making it a single telemetry pipeline agent capable of handling all three traditional pillars of observability. It also recently added eBPF capability through an integration with Aquasec’s Tracee tool, making it compatible with the next generation of observability data to be mined for insights.
Like Fluentd, Fluent Bit utilizes a plugin architecture for integrations with sources and destinations as well as parsing and filtering. It lacks the breadth of its older sibling’s integrations, but the current plugins cover the vast majority of production use cases. It also supports creating plugins in Go and WASM, making it much easier to develop custom plugins should the need arise. Fluent Bit’s ability to filter and parse data mid-stream exceeds Fluentd’s capabilities, and custom filters can also be written in Lua. As of Fluent Bit v2.0, it provides native support for the OpenTelemetry Protocol.
Note: Fluent Bit is sponsored and maintained by Chronosphere.
CNCF Status: Graduated (under the umbrella of Fluentd)
License: Apache 2.0
GitHub: https://github.com/fluent/fluent-bit
Jaeger is the final CNCF graduated project on our list. It allows developers to monitor and troubleshoot transactions in distributed systems by visualizing the chain of events in these microservice interactions. Jaeger connects data from different components to create a complete end-to-end trace.
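Conceptually, assembling an end-to-end trace means grouping spans that share a trace ID and linking each span to its parent. The Python sketch below illustrates that idea with invented span data and field names; it is not Jaeger's data model or API.

```python
from collections import defaultdict

# Invented spans, as if reported independently by three services.
SPANS = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "frontend", "duration_ms": 120},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "checkout", "duration_ms": 80},
    {"trace_id": "t1", "span_id": "c", "parent_id": "b", "service": "payments", "duration_ms": 45},
    {"trace_id": "t2", "span_id": "d", "parent_id": None, "service": "frontend", "duration_ms": 10},
]

def build_trace(spans, trace_id):
    """Return the lines of an indented call tree for one trace."""
    # Index the spans of this trace by the ID of their parent span.
    children = defaultdict(list)
    for span in spans:
        if span["trace_id"] == trace_id:
            children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Depth-first walk from the root span (the one with no parent).
        for span in children.get(parent_id, []):
            indent = "  " * depth
            lines.append(f"{indent}{span['service']} ({span['duration_ms']} ms)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines

trace_lines = build_trace(SPANS, "t1")
print("\n".join(trace_lines))
```

The resulting tree shows at a glance which downstream service accounts for most of the time spent in a request, which is the kind of question a trace view is meant to answer.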
Jaeger was created in 2015 by engineers at Uber to meet their needs for tracing on Uber microservices and was donated to CNCF in 2017. It achieved graduated status in 2019, the seventh project to do so. As of May 2022, it provides native support for the OpenTelemetry Protocol.
CNCF Status: Graduated
License: Apache 2.0
GitHub: https://github.com/jaegertracing/jaeger
OpenTelemetry has taken the open source world by storm. Formed from the merger of the OpenCensus and OpenTracing projects in 2019, it is now the second-highest velocity project in the CNCF ecosystem, closely trailing Kubernetes. Its popularity is understandable given its goal of unifying tracing, metrics, and logging telemetry standards. It is, essentially, bringing law and order to what has been a wild west.
More than just standards, though, OpenTelemetry has evolved to become a collection of tools, APIs, and SDKs. Although it is still considered a CNCF incubating project, it has become so widely embraced that even commercial applications that have traditionally benefited from closed architectures are now loudly proclaiming their integrations with OTel. At this juncture, it seems safe to say that regardless of what your observability stack entails—open source, commercial products, or a mixture—it should be compatible with OpenTelemetry.
Note: OpenTelemetry applied for CNCF graduated status in March 2024. It is widely expected that the application will be approved.
CNCF Status: Incubating
License: Apache 2.0
GitHub: https://github.com/open-telemetry
Chaos Mesh and Litmus are both CNCF projects at the incubating stage that provide chaos engineering platforms. A relatively new discipline, chaos engineering tries to break systems through controlled experiments using random and unpredictable behavior in order to collect information about the failure. Both Chaos Mesh and Litmus were admitted to the CNCF in 2020.
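A controlled experiment of this kind can be illustrated with a small, self-contained Python sketch: a hypothetical dependency fails at a configured rate, and the experiment records how often a simple retry policy still succeeds. Everything here (function names, failure rate, retry count) is an assumption made for illustration; Chaos Mesh and Litmus inject faults into real Kubernetes workloads rather than into in-process function calls.

```python
import random

def flaky_call(rng, failure_rate):
    """Hypothetical dependency that fails at the injected failure rate."""
    if rng.random() < failure_rate:
        raise ConnectionError("injected fault")
    return "ok"

def call_with_retries(rng, failure_rate, attempts):
    """System under test: retry the dependency a fixed number of times."""
    for _ in range(attempts):
        try:
            return flaky_call(rng, failure_rate)
        except ConnectionError:
            continue
    return None  # every attempt failed

def run_experiment(failure_rate, attempts, trials=1000, seed=42):
    """Inject faults with a seeded RNG so the experiment is repeatable."""
    rng = random.Random(seed)
    successes = sum(
        call_with_retries(rng, failure_rate, attempts) == "ok"
        for _ in range(trials)
    )
    return successes / trials

no_retry = run_experiment(failure_rate=0.3, attempts=1)
with_retry = run_experiment(failure_rate=0.3, attempts=3)
print(f"success without retries: {no_retry:.2f}, with retries: {with_retry:.2f}")
```

Seeding the random number generator keeps the "random" faults reproducible, so a surprising result can be rerun and investigated rather than dismissed as noise.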
CNCF Status: Incubating
License: Apache 2.0
GitHub (Chaos Mesh): https://github.com/chaos-mesh/chaos-mesh
GitHub (Litmus): https://github.com/litmuschaos/litmus
Thanos and Cortex both seek to make Prometheus highly available and horizontally scalable with long-term storage. The two projects clearly share many components with Prometheus, but they take a fundamentally different approach to how these pieces are joined together. Both are CNCF projects at the incubation stage.
CNCF Status: Incubating
License: Apache 2.0
GitHub (Thanos): https://github.com/thanos-io/thanos
GitHub (Cortex): https://github.com/cortexproject/cortex
OpenSearch is a fork of the very popular Elasticsearch search and analytics suite. It was created in 2021 after Elastic (the parent company of Elasticsearch) changed the project’s license from Apache 2.0 to a dual license under the Elastic License (their own creation) and the Server Side Public License (SSPL), a move intended to make it difficult for cloud companies to sell managed versions of Elasticsearch. The change was directly targeted at AWS and followed a similar move by MongoDB a few years earlier. AWS fired back that the change meant that Elasticsearch was no longer truly open source and announced the fork that eventually became OpenSearch.
Update: On August 29, 2024, Elastic announced they would be adding AGPL as a licensing option, making Elasticsearch truly open source again.
As with Elasticsearch, many enterprises already use OpenSearch to store and analyze their business, operational, and security data, so when they adopt an observability program, many of the tools they need are already in place.
OpenSearch is not a CNCF project, although AWS is a platinum member of the CNCF.
CNCF Status: N/A
License: Apache 2.0
GitHub: https://github.com/opensearch-project/OpenSearch
Created in 2014, Grafana is a powerful data visualization platform. It is commonly paired with Prometheus and allows users to create dashboards from which to monitor system performance. It utilizes a plugin system to integrate with data sources such as Prometheus or dozens of other options, open source and commercial. It began as a solution specifically for visualizing metrics, but as with so many other projects that originally focused on one pillar of the observability trio, it now supports logs, metrics, and traces.
Grafana is not a CNCF project; however, its parent company, Grafana Labs, is a platinum member of the CNCF, like AWS. Grafana is also the only project on our list that is not available under the Apache 2.0 license, instead using the more restrictive GNU Affero General Public License (AGPL) v3. The change came in 2021, shortly after Elastic announced its decision to abandon Apache 2.0. In announcing the move, Grafana Labs said that while they wanted more protection than Apache 2.0 offered, they wanted to remain with an Open Source Initiative (OSI) approved license and felt that AGPLv3 offered a good compromise.
CNCF Status: N/A
License: AGPL 3.0
GitHub: https://github.com/grafana/grafana
Whether you are going with only open source solutions, commercial ones, or a hybrid approach, creating an observability program is a difficult task, and there is no one right solution for everyone. Hopefully, this guide has helped you to identify the major open source solutions available to you.
With Chronosphere’s acquisition of Calyptia in 2024, Chronosphere became the primary corporate sponsor of Fluent Bit. Eduardo Silva — the original creator of Fluent Bit and co-founder of Calyptia — leads a team of Chronosphere engineers dedicated full-time to the project, ensuring its continuous development and improvement.
Chronosphere Telemetry Pipeline streamlines log collection, aggregation, transformation, and routing from any source to any destination and also provides the ability to manage Fluent Bit agents as fleets. This gives companies that are dealing with high costs and complexity the ability to control their data and scale with their growing business needs.
Chronosphere Observability Platform is the only observability platform built for control. Recognized as a leader by major analyst firms, Chronosphere empowers customers to focus on the data and insights that matter by reducing data complexity, optimizing costs, and remediating faster.
Talk to us today to learn how Chronosphere can help you control your observability efforts.