Empowering developers
Developers have always needed insight into how their code behaves in production. In the past, gaining such visibility was a simple matter of turning to an APM overview dashboard and analyzing the available data.
Cloud native systems, however, complicate this simplistic approach.
In the past, a single application might run on a single server and generate a single log file. In the cloud native world, an application consists of multiple instances of microservices, each of which, in turn, generates voluminous observability telemetry – not just log files, but metrics and traces as well.
Suddenly, the massive quantity of observability data becomes a logistical nightmare, overwhelming both operators and developers and running up the cloud bill.
Compounding these challenges is the fact that in the cloud native world, multiple development teams work in parallel, building interdependent microservices that are subject to frequent changes. With every deployment, new build IDs are generated, significantly increasing the telemetry cardinality, while the volume of telemetry data scales accordingly. This further exacerbates the challenges of tracking and analyzing application behavior across a sprawling microservices environment.
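To make the cardinality point concrete, here is a rough back-of-the-envelope sketch in Python. The label names, services, and counts are hypothetical, and the calculation assumes the worst case in which every combination of label values actually appears; real systems usually see fewer combinations.

```python
def series_count(label_values):
    """Worst-case number of distinct time series: the product of the
    number of possible values for each label."""
    count = 1
    for values in label_values.values():
        count *= len(values)
    return count

# Hypothetical labels attached to a single metric.
labels = {
    "service": ["checkout", "cart", "search"],
    "instance": [f"pod-{i}" for i in range(10)],
    "status": ["2xx", "4xx", "5xx"],
}
base = series_count(labels)  # 3 * 10 * 3

# Assume 20 deployments, each stamping a new build ID onto the metric.
labels_with_build = dict(labels, build_id=[f"build-{i}" for i in range(20)])
with_builds = series_count(labels_with_build)  # base * 20

print(base, with_builds)
```

Under these assumptions, one extra label that changes on every deployment multiplies the potential number of series by the number of deployments, which is why build IDs are singled out above.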
Empowering developers to instrument, control, and leverage the right level of observability data they need to do their job – no more and no less – is a critical capability for any cloud native development effort that seeks to deliver quality software without unnecessarily running up the cloud bill.
Understanding developer observability
Providing the logs, metrics, and traces that make up modern observability telemetry is an important part of any cloud native deployment. Such data gives developers the information they need to ensure that the software in production has the performance and security characteristics that the business requires.
Throughout the software development lifecycle, developers need observability that focuses on the application level and pulls in associated infrastructure or dependency information as needed. IDE integrations for developer observability tooling should connect local testing of instrumentation changes to a platform that ties infrastructure metrics to application telemetry. When developers are able to view their application’s metrics, logs, and traces as they write code, issues and misconfigurations can be caught and fixed early and easily.
In addition, developers may require different data at different times, depending upon the situation at hand. One moment they may need insight into database operations, and the next, visibility into Kafka stream behaviors, for example.
Observability requirements may vary from moment to moment. Any observability tool designed for developers, therefore, must give them as much fine-grained control as possible.
Giving developers situational awareness
Developers certainly need visibility into the production behavior of the code they’re writing – but in a cloud native environment, such code never runs in isolation.
Applications typically consist of multiple microservices working in concert: dynamic sets of interdependent microservices that multiple development teams may be coding simultaneously.
Such applications also interact with a broad landscape of infrastructure components, from databases to serverless functions to queues and other integrations, all running on Kubernetes.
For developers to ensure that the bits they are coding work properly within this dynamic whole, they must continuously integrate their code into the broader production landscape.
Modern coding practices require that developers have ongoing situational awareness of their code and how it works with everything else. After all, that’s why we call it CI – continuous integration – rather than, say, continuous development.
Developer observability tooling must provide this situational awareness. It’s not sufficient for developers to have insight into the performance of their code; they must also be aware of all the interdependencies with other microservices as well as the underlying cloud native infrastructure.
The ‘Goldilocks’ of developer observability
Developers require observability data at their fingertips to provide insight into how their code is running within the broader production landscape.
Too much telemetry, however, not only runs up the cloud bills – it also threatens to swamp the developer with extraneous information. Too much information can obscure those data points that are the most valuable.
Too little observability data is equally problematic. A natural reaction to too much information is to tune some of it out. Constraining information in this way, however, risks the possibility that developers might miss important information they need to make the right judgment calls about the code they are writing.
The challenge, therefore, is to increase the signal-to-noise ratio. Leverage tools like Chronosphere running on Google Cloud to reduce the quantity of unhelpful information while emphasizing the most useful data. The partnership between Chronosphere and Google Cloud puts developers in control of their data by enabling them to optimize the data based on specific needs and context.
The goal, then, is to achieve the ‘Goldilocks’ point where the amount of observability data is just right.
This point, however, is not fixed. It can change from moment to moment with the ongoing variability of the applications in production.
To ensure that developers continue to get the right amount of observability data, they must have control over that data on a moment-by-moment basis. Whereas operational observability requirements are more comprehensive, developers need to have their hands on the wheel, iteratively tweaking the information they’re receiving to reach the ideal Goldilocks point of just the information they need at the time – no more and no less.
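As a loose illustration of what such moment-by-moment control might look like, the Python sketch below narrows a stream of telemetry events to the components a developer currently cares about and drops labels they consider noise. The TelemetryFilter class, the event fields, and the component names are all hypothetical; this is not the API of Chronosphere, Google Cloud, or any other product.

```python
from dataclasses import dataclass, field

@dataclass
class TelemetryFilter:
    """Hypothetical developer-side filter; not any vendor's API."""
    keep_components: set = field(default_factory=set)
    drop_labels: set = field(default_factory=set)

    def apply(self, events):
        """Keep events from the components in focus and strip noisy labels.
        Returns new dicts; the input events are left unchanged."""
        return [
            {k: v for k, v in e.items() if k not in self.drop_labels}
            for e in events
            if e["component"] in self.keep_components
        ]

# Made-up sample events for illustration only.
events = [
    {"component": "database", "msg": "slow query", "build_id": "b-101"},
    {"component": "kafka", "msg": "consumer lag", "build_id": "b-101"},
    {"component": "cache", "msg": "evicted key", "build_id": "b-102"},
]

# One moment the developer focuses on database operations...
view = TelemetryFilter(keep_components={"database"}, drop_labels={"build_id"})
db_view = view.apply(events)

# ...and the next, on Kafka stream behavior, without touching the source data.
view.keep_components = {"kafka"}
kafka_view = view.apply(events)
```

The design choice worth noting is that the filter is cheap to change and leaves the underlying events alone, so a developer can keep adjusting the view as the situation shifts.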
The Intellyx take
Developers want to adjust the observability data they receive to do their jobs better. The right information at the right time will certainly lead to better quality software with less toil.
Developers – and their bosses – should also be thinking about optimizing costs. Too much telemetry can run up the cloud bill, and without the proper governance, such cost overruns can be dramatic.
Too little observability data, however, can also run up costs, because errors are more likely to crop up in production, leading to downtime, increased rework, and customer churn – negative consequences that adversely impact the bottom line.
Given this financial context, any developer observability tool an organization is likely to purchase will end up paying for itself by avoiding both these extremes – a result even Goldilocks would appreciate.
Copyright © Intellyx BV. Chronosphere is an Intellyx customer. Intellyx retains final editorial control of this article. No AI was used to write this article.