Red Hat OpenShift observability
Red Hat OpenShift is an industry leader in providing a consistent hybrid cloud foundation for building and scaling containerized applications with Kubernetes. It builds on open source standards and components to accelerate your application development and the delivery of cloud native workloads.
In the latest version of OpenShift Container Platform, released earlier this year, we saw how Red Hat continues to innovate and expand the power of hybrid cloud, container workload standardization, and support for open source standards. To a keen observer of the evolution of OpenShift Container Platform over the years, it should be apparent that the challenge of cloud native observability at scale is left to the user to solve.
The basis of any Red Hat product is always founded in open source projects and standards, and OpenShift Container Platform is no exception — you see this when digging into the observability options Red Hat provides. However, while the basic premise has always been to provide the open source components and standard protocols, the task of integration at scale is left to the user.
In this blog I walk through how to solve the challenge of achieving cloud native observability at scale by:
- Exploring the observability options available to a Red Hat OpenShift user for the most common types of telemetry data
- Looking at the observability UI experience OpenShift offers
- And learning how integrating with Chronosphere’s observability platform is the path to controlling your telemetry data at scale
Metrics
The use of metrics as telemetry data for the core Red Hat OpenShift platform is tied to the Cluster Monitoring Operator. This gives access to Prometheus metrics collection and its Alertmanager component, and generates basic metrics coverage for the UI dashboards and alerts. Metrics telemetry data is stored using Thanos as the backend.
As a developer using Red Hat OpenShift, you are provided with a simple dashboard to view a few metrics, alerts, and events focused on your project workloads. These dashboards do not provide visibility into any programmatic instrumentation that developers might be implementing to monitor their actual application code.
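For example, a developer might expose custom application metrics with the Prometheus Python client. The sketch below is illustrative only: the metric names, labels, and port are assumptions, not anything OpenShift provides out of the box.

```python
# A minimal sketch of application-level Prometheus instrumentation.
# Metric names, labels, and the port are illustrative assumptions.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "app_requests_total", "Total requests handled", ["endpoint"]
)
LATENCY = Histogram(
    "app_request_duration_seconds", "Request latency in seconds", ["endpoint"]
)

def handle_request(endpoint: str) -> None:
    """Simulate handling a request and record its metrics."""
    with LATENCY.labels(endpoint=endpoint).time():
        time.sleep(random.uniform(0.01, 0.1))
    REQUESTS.labels(endpoint=endpoint).inc()

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for a Prometheus scrape
    while True:
        handle_request("/checkout")
```

Exposing metrics like these is only half the job; getting them scraped, stored long term, and visualized at scale is where the integration work begins.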
Distributed tracing
There are currently three options for installing a distributed tracing architecture within Red Hat OpenShift: a distributed tracing platform using Tempo, a Red Hat build of OpenTelemetry, and a deprecated Jaeger-based tracing platform.
By using the open source OpenTelemetry Protocol (OTLP) and providing an OpenTelemetry Collector, Red Hat OpenShift can export tracing data to any third-party solution that supports OTLP.
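Because OTLP and the collector are standard OpenTelemetry components, application code needs nothing OpenShift-specific to participate. Here is a minimal sketch using the OpenTelemetry Python SDK; the collector endpoint and service name are assumptions for illustration.

```python
# A minimal sketch of exporting spans over OTLP to a collector.
# The endpoint and service name are illustrative assumptions.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(
    resource=Resource.create({"service.name": "checkout-service"})
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True)
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Any OTLP-capable backend can receive what the collector exports.
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.items", 3)
```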
Also noteworthy: distributed tracing in Red Hat OpenShift makes use of the CNCF Sandbox project Perses for dashboard visualization of traces across distributed workloads.
Logging
The logging setup is a very narrow and opinionated solution. It comes with documentation that starts immediately with, “Only the configuration options described in this documentation are supported for logging.”
It offers Loki log aggregation with the intention that the user forwards all logging onwards. Red Hat explicitly states that its logging is neither a highly scalable solution nor Security Information and Event Management (SIEM) compliant, and that it does not support historical, secure, or long-term log retention.
Users are encouraged to find external solutions for their logging needs. According to the documentation: “…Red Hat OpenShift configuration of Loki has short-term storage, and is optimized for very recent queries. For long-term storage or queries over a long time period, users should look to log stores external to their cluster.”
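A common pattern that keeps those external options open is to emit structured JSON logs to stdout and let whichever node-level collector you run forward them to the store of your choice. The sketch below assumes nothing about OpenShift's logging stack; the field names are illustrative.

```python
# A minimal sketch of structured JSON logging to stdout, the typical
# cloud native pattern: a node-level collector picks up container
# stdout and forwards it to whatever log store you choose.
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

log = logging.getLogger("checkout")
log.info("order accepted")
```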
Events
The only events handled by Red Hat OpenShift are the standard Kubernetes events. This leaves users to look elsewhere when they need to track events across their cloud native solutions.
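For reference, those core events are the same ones you can read with the official Kubernetes Python client; the namespace in this short sketch is an illustrative assumption.

```python
# A minimal sketch of reading core Kubernetes events, the same event
# stream the OpenShift console surfaces. The namespace is illustrative.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

v1 = client.CoreV1Api()
for event in v1.list_namespaced_event(namespace="my-app").items:
    print(event.last_timestamp, event.reason, event.message)
```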
Chronosphere: Taking control of observability at scale
To augment and expand upon the core observability telemetry and visualization offered within Red Hat OpenShift, you can gain unparalleled visibility and control with Chronosphere Observability Platform. Purpose-built for containerized infrastructure and applications, Chronosphere empowers you to optimize performance, simplify operations, and scale confidently without compromising on efficiency or cost.
Chronosphere supports all major Kubernetes distributions, including, but not limited to, Red Hat OpenShift, self-managed Kubernetes, Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), and Rancher Kubernetes Engine (RKE).
The challenges
Kubernetes environments at scale produce an unprecedented volume of telemetry data. The Chronosphere Control Plane provides a suite of tools to identify and eliminate low-value telemetry, reducing noise and preventing data overload.
While Kubernetes is a constantly changing environment for observability solutions, Chronosphere Observability Platform uses Chronosphere Lens to automatically detect and monitor ephemeral workloads. Integrated change events provide context on how workloads change over time. Also, by working with open standards, Chronosphere Observability Platform captures and processes telemetry data in real time to prevent blind spots, even as workloads fluctuate.
Dynamic relationships between services add complexity when trying to track and segment microservice workloads. Using the open source OTLP tracing standard, Chronosphere Observability Platform is able to prioritize sampling based on the importance of each trace to the business and centrally manage dynamic head and tail sampling, ensuring that only high-value traces are captured. Tooling in the platform, such as Differential Diagnosis (DDx), provides a queryless, intuitive interface for investigating service relationships and identifying the root causes of performance issues. Finally, service dependency visualization highlights the relationships between microservices to simplify troubleshooting and reduce the time to resolution.
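Head sampling itself is a standard OpenTelemetry concept. The generic sketch below shows a ratio-based head sampler in the OpenTelemetry Python SDK; it only illustrates the underlying idea, not Chronosphere's centrally managed mechanism.

```python
# A generic sketch of ratio-based head sampling with the OpenTelemetry SDK.
# This illustrates the concept only; Chronosphere manages head and tail
# sampling centrally rather than in application code.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Keep roughly 10% of new traces at the head; child spans follow the
# parent's sampling decision so traces stay complete.
sampler = ParentBased(root=TraceIdRatioBased(0.1))
trace.set_tracer_provider(TracerProvider(sampler=sampler))
```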
The last challenge we all face is cost optimization within our observability solutions. Chronosphere Observability Platform leverages its Control Plane to reduce telemetry costs by enabling organizations to easily identify and eliminate redundant or irrelevant metrics and refine data collection strategies. Dynamic sampling for tracing reduces storage and processing costs for trace data by capturing only the most valuable spans at both the head and tail of requests. The goal is to provide transparent cost management with detailed visibility into telemetry resource consumption, empowering teams to align observability costs with business goals.
If you are looking for cloud native scale observability for your Red Hat OpenShift environments, then check out Chronosphere Observability Platform, which was purpose-built to address these challenges. Chronosphere simplifies Kubernetes observability at scale while enabling performance optimization and cost efficiency.
Supporting open standards
Chronosphere Observability Platform can integrate with Red Hat OpenShift through its commitment to supporting the various open source standards and projects from the Cloud Native Computing Foundation (CNCF).
Metrics ingestion, querying, alerting, and visualization are 100% compatible with open source standards using the Prometheus, OpenTelemetry, and StatsD protocols. For insights into how these open source projects work, explore the free online observability workshop collection. Distributed tracing ingestion is 100% compatible with OpenTelemetry, Jaeger, and Zipkin.
Finally, you are able to ingest trace, metric, and log data via the native OpenTelemetry Collector or the Chronosphere Collector.
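To round out the protocol examples above, here is a hedged sketch of emitting StatsD metrics with the statsd Python package; the host, port, and metric names are placeholders for whatever StatsD-compatible endpoint your collector exposes, not a specific Chronosphere address.

```python
# A hedged sketch of emitting StatsD metrics with the `statsd` package.
# Host, port, and metric names are placeholders, not Chronosphere specifics.
import statsd

client = statsd.StatsClient(host="localhost", port=8125, prefix="checkout")

client.incr("orders.accepted")          # counter
client.timing("orders.latency_ms", 42)  # timer, in milliseconds
client.gauge("queue.depth", 7)          # gauge
```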
Observability at scale—including AI workloads
Chronosphere Observability Platform empowers AI companies to control observability costs and complexity in high-volume, unpredictable environments. Chronosphere supports all telemetry types (metrics, events, logs, traces) from various sources at a scale necessary for AI workloads—with the ability to process over 2B data points per second. It can scale seamlessly with AI workload demands and handle massive data volumes from training workloads and unpredictable spikes from inference operations.
You can resolve issues faster to maintain AI service quality and empower developers of all experience levels to quickly identify the source of service issues without deep system knowledge or complex query writing. Chronosphere Observability Platform surfaces potential problem areas through a simple point-and-click investigation process, eliminating reliance on system experts.
Control observability data volumes and costs: with Chronosphere you are able to identify and keep the data your team actually uses, eliminating waste and preventing you from paying for data that provides little value to your team.