Mastering log data transformation at scale with Chronosphere


Logging for observability is becoming more complex. Increasingly, observability and site reliability engineers must transform their logs to meet observability and security needs. Learn how Chronosphere Telemetry Pipeline can make those transformations turnkey with built-in processing rules and advanced automation.

 

Staff Chronosphere
33 MINS READ

Editor’s note: As of June 1, 2024, Calyptia Core is now Chronosphere Telemetry Pipeline, from the creators of Fluent Bit and Calyptia. This webinar was broadcast before this change.

Using automation for log transformation

When it comes to observability data, logs contain an incredible amount of information that developers and site reliability engineers can use. The challenge is that the amount of log data produced is difficult to manage at scale.

Anurag Gupta, Field Architect at Chronosphere, recently presented How to Transform Logs at Scale, where he walked through how Chronosphere’s Telemetry Pipeline can make log data transformations turnkey with built-in processing rules and advanced automation.

If you don’t have time to watch the full webinar, be sure to check out the transcript below to learn how Chronosphere Telemetry Pipeline provides you the ability to:

  1. Leverage more than 20 built-in processing rules to simplify transformations.
  2. Use log sampling, deduplication, and aggregation to reduce noise and save costs.
  3. Achieve security requirements with redaction on basic and complex message structures.
  4. Bring complex business logic with Lua scripting and visualized testing.
  5. See real-world examples of processing Amazon Web Services (Cloud), Windows (Operating System), and Okta System (SaaS) logs.

Why log data transformation?

Anurag Gupta: Everyone, really appreciate you joining. I’m really excited to talk today about [log] processing. We’ve done one of these webinars in the past both at the Fluent Bit layer for open source. We’ve done a little bit for Calyptia [Telemetry Pipeline], and we’ve released quite a few features, so I’m really excited to show you that. Today, we’ll focus on logs. We have an upcoming webinar that’ll focus a little bit on traces and metrics, but I’m happy to give a little bit of a preview on what that looks like. Let me go ahead and share my screen. So welcome again to everyone. We’re gonna talk about transforming logs to meet your observability and security needs. 

Transformation of logs is a pretty hot topic. I think many enterprises are going through [and asking] where can they become more efficient? Where can they reduce noise? Where can they provide more context and more awareness to developers as they’re debugging or troubleshooting issues so that they can do those things faster? 

Today, we want to talk a lot about how we do that from a security side, and how we do that from an overall processing side. We’ll [also] talk a little bit about Calyptia [Telemetry Pipeline]. For folks who aren’t familiar with Calyptia [Telemetry Pipeline], what it is and how it deploys. What are some of the use cases? We’ll talk about that. We’ll talk about processing. Why is this even useful? We’ll get into demos; a lot of today’s gonna be live. I hope I don’t run into demo demons, but you never know. 

I love to do things as live as possible. I want to show you the real stuff. That’s what we’re gonna walk through today. Feel free to ask questions throughout. I love to answer as many questions in context as I can. Otherwise, we’ll go ahead and get started. 

A little bit about myself and the company Calyptia, which has now been acquired by Chronosphere. I’m one of the maintainers of Fluent Bit, and I’m a pretty active community member. So you’ll see me throughout the Slack channel and in some of the community meetups, and I’ve contributed to some of the books and other materials that are out there. I very much love to discuss anything and everything around telemetry pipelines and telemetry agents, OpenTelemetry, Fluent Bit, and Fluentd, and get into all of those details there.

An overview of Fluent Bit and telemetry pipelines

Anurag: So let’s talk a little bit about what this is. If you’re joining this webinar, you’ve potentially seen some of our stuff around Fluent Bit. But what is Calyptia [Telemetry Pipeline] and how does it fit in? Calyptia [Telemetry Pipeline] is built on the success of the Fluent projects, and Fluent is part of the Cloud Native Computing Foundation and sits next to Kubernetes and Prometheus. Now, Envoy is a graduated project, and Fluent Bit is one of the big projects and gets a massive amount of usage. So about 13 billion downloads in just the last two years alone, starting at under a billion a couple of years ago and now just skyrocketing in demand.

Most notably, if you’re leveraging a public cloud like AWS, Google, Microsoft, Oracle, or IBM, many of them are leveraging Fluent Bit already. So you might have it in your Kubernetes clusters. You might have it within some specific services, within some of the network stack, or within other places. So there’s a lot of this that already exists out there within Fluent Bit. And with Calyptia [Telemetry Pipeline], we really saw this desire of, “Hey, how do we help users who are on this journey of trying to bridge and have vendor neutrality, communicate with many, many different backends, and give them a bit of ease of use?”

What that looks like is, when we talk to a lot of users, they’re using a lot of proprietary agents, right? Maybe some Beats, maybe some Splunk, maybe some Datadog. You might have multiple of these. And as you point multiple of these agents at different backends, it can get very complex; [I] see a lot of Kafka sprinkled out there.

You can get locked in, and as you need to do different use cases (maybe some processing, maybe redaction, maybe some enrichment, maybe some removal), having this type of lock-in or having some of these agents is not too beneficial for us, because it is hard to unravel and hard to really get the outcomes that we need with these locked-in types of places. This is where Calyptia [Telemetry Pipeline] comes in. We’re not just something that is: “Hey, drop in Fluent Bit, and there you go.” We think of it as a full telemetry pipeline type of solution.

So [it’s] something that helps all the way at the edge, whether you’re doing the collection [or] whether you’re doing the parsing. How do you go and deploy that on old legacy systems? How do you deploy that on your cloud native environments? How do you deploy it on serverless environments? And then how do you do really heavy processing? We’ll talk about how that works, what you should look at when trying to do some of this at scale, and how we bring that together in a really nice, easy-to-use package. So that’s what Calyptia [Telemetry Pipeline] is at a very high level. I’m eager to show folks here what this all looks like. So, again, we’re gonna spend a lot of time in the demos here.

Diving into logging processing use cases

Anurag: Next up, let’s talk about where Calyptia [Telemetry Pipeline] comes in. When we talk about processing, when we talk about routing, how can this actually work in my environment? So I’ll give a really concrete example. We had some users doing a lot of stuff with Logstash and some SIEM requirements.

I’ve got some stuff going on with rsyslog, and how do I go and make sure that I am doing collection and doing it in a way that is appropriate for the organization, but also scalable, right? There are tons of times where I might not be doing any filtering, where I have five different places I’m looking at for how I’m collecting this data. And if someone, for example, leaves the organization, we might have something that’s very bespoke or very custom that we’re not gonna be able to support, maintain, and provide a good service for, for the remainder of folks. So this is again where Calyptia [Telemetry Pipeline] can come in as a telemetry pipeline: give you that central view of how I do collection, how I do routing [and] where it is going, and do some of the filtering, extraction, and redaction.

And [there’s] the high load of thousands, millions of events per second. So with that, let’s talk about what the architecture of Calyptia [Telemetry Pipeline] looks like and why that’s important to something like processing. 

Now, Calyptia [Telemetry Pipeline] is separated into two distinct components. You have a management plane, and you have a data plane. With the management plane, we’re really focused on giving you that experience of “Hey, I have my SAML or my SSO, how do I do things like meta observability? How do I go and orchestrate this across all of my environments?” Then the data plane actually is in your environment. So whether that’s in your cloud account, whether that’s on your virtual machine, whether it’s in your Kubernetes cluster, [we’re] really providing you a way to deploy that as close to where the data is being generated as possible.

That’s where we’re gonna talk a lot about processing. The other scope of that is when we talk about processing, it’s happening as close to the data as possible, right? Think about the advantages of doing processing closer to where the data is being generated. If I’m in a cloud environment and I remove a bunch of that noise, that actually reduces my egress charge, so my bandwidth limits are not as high or not as heavy. 

Second, it also reduces latency. So if I’m collecting a bunch of data and routing it to my security backend, the detections, the rules, the alerts that then go and run on top of that can operate faster. It’s just the latency of not having to fire this data out of my environment, back into my environment, buffer it, and then fire it over. It’s just really sticking to that single place.

And the last bit is security. When we think of all of the data that flows around, if something, let’s say for example a username or user ID, accidentally fires up into another service. The upside and the downside of having things like Amazon S3 is you’ve got five nines of durability, probably more, right? If you have that, great, it’s backed up in so many different places, and we don’t want that to be backed up because we need to get rid of it.

In a similar fashion, if you can do the processing, the removal, and the redaction of that before it gets sent to the particular backend, then we can ensure that we’re not giving away the secrets or giving away some of the personal data that could put us into penalties with specific data sovereignty requirements or something like that.

The different types of log data processing

Anurag: So let’s talk a little bit about what type of processing is useful. Why do we even need this? What can [log data] processing give us? I like to separate it into five super easy-to-understand buckets. First is schematization. There’s a ton of unstructured stuff out there. There’s text files. We’ll create a few in the demos here. If we can parse this stuff, if we can give you a little bit of structure on top, it makes it so much easier when you’re searching, when you’re looking for something, when you’re debugging, diagnosing, correlating, or having a hostname that matches across all your logs. So schema and formatting. We’ll talk a little bit about what that looks like.

We talked about security: removing sensitive information, masking, and redaction. If we don’t want that to exist in our backend environment, how do we get rid of it as things stream through? Then removal: excluding noisy logs. We’ll talk a little bit about some strategies there. It’s not just yes or no, it’s not a binary type of thing. There are some really fun things you can do with reducing the noisiness in your logs. You can get rid of fields, you can get rid of particular characters. Punctuation can account for 2-3% of a log file. We’ll talk a little bit about what that looks like in context. For enrichment, if I am firing a log, this might seem counterintuitive, but if I had a hostname, maybe it makes the way that I search that log much faster. Maybe it makes the correlations that much better. GeoIP, AWS metadata, and Kubernetes metadata are great examples of good enrichment.

And the last one, which I think is actually pretty unique to Fluent Bit and Calyptia [Telemetry Pipeline], is processing for routing. When we think of processing, we are very fundamentally thinking of, “I have record A coming in. I do some transformations on it, and then I fire it off to place B.” Well, the reality is, you can do processing and say, “Well, actually, this should be sent to [place B] like this, it should be sent to [place C] like that, it should be sent to [place D] like that.” 

Especially in our modern world of multiple backends, maybe ClickHouse, maybe OpenSearch, maybe Elastic, and Splunk. I might wanna do different things per backend. I might wanna keep full fidelity for something like ClickHouse with schematization, and maybe for Splunk, I just leave it as is and do some schema on read when I do a search query. So you have that flexibility [to say] how do we do the routing? What do we wanna keep? Where do we want to keep it? And we’ll talk about how you can do some of that. 

Now, the last bit, which I think is interesting before I jump into the demos, is how can we do processing at scale? I’ll talk about this at a high level: one, what I think are just good practices; and two, how do we do this within Calyptia [Telemetry Pipeline] itself?

So the first is just good practice for processing: keep things in a semi-stateless world if we can. What I mean by that is we’re not necessarily like a database, where if you have transactions going between two different places, they have to be completely in sync. If we treat almost every stream as something that can be split into multiple parts, and then take each of those parts and process them independently, it can give us speed, it can give us scale.

I don’t wanna take away from all our demo time. I think that’s why most folks have joined. So let’s jump straight into it, and I’ll talk a little bit about this as we demo. I’m gonna switch my screen here to the Calyptia [Telemetry Pipeline] UI. So this is the Calyptia [Telemetry Pipeline] console. We have the ability to manage our edge pieces, our pipelines. And this is a pipeline that I’ve deployed on top of a virtual machine. And within this pipeline, we can create something pretty simple. We’ll do something like syslog in, standard out, and we can choose the service type.

I don’t know if I got cut off in the recording, but I talked about how all of our pipelines are automatically load balanced. And that’s good because then we can do processing in multiple different places instead of [it all being] singularized. I’ll go ahead and create something very quick here. For sources, we support push- and pull-based sources. I’ll choose the parser.

This [parser] is adding some schematization to all the incoming data. Click Save, and then we’ll add another destination. So from a destination side, I’m just gonna choose standard out. But to talk a little bit through what these options are: we support multiple observability backends, data backends, SIEM backends, and of course some of the big cloud provider services and open source formats. So let me jump into standard out and we’ll do something like save and deploy.
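
For readers who want to picture what such a pipeline looks like outside the UI, here is a rough open-source Fluent Bit sketch of the same shape: a syslog source, a parser for schematization, and standard out as the destination. The port and tag below are illustrative choices, not values from the demo.

```
[SERVICE]
    Flush     1
    Log_Level info

[INPUT]
    # Push-based source: listen for syslog messages over TCP
    Name    syslog
    Mode    tcp
    Listen  0.0.0.0
    Port    5140
    # Schematize each message as it arrives
    Parser  syslog-rfc3164
    Tag     demo.syslog

[OUTPUT]
    # Print the structured records to standard out
    Name   stdout
    Match  demo.*
```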

This pipeline is now going and creating. What you might have noticed (I’ll click back into the pipeline, into the editor) is that there were two places for processing. So processing is here [and] allows you to take in any arbitrary log or text-based piece. And within this interface, [you can] quickly test it out, try it out, see how things can work, and identify what type of rule I actually wanna run. Now, the nice thing about this is it runs completely in your browser.

Basic log processing

So let’s start with the most basic type of processing. I’m gonna expand the window here. And the first one we’ll start with is enrichment. Within this, we have a bunch of built-in actions, things like add, aggregate (which we’ll walk through), block, copy, pass, parse, and split. I’m just gonna do something really simple where I’m gonna add a key. So we’ll say add a region key, and maybe here we’ll say us as the region. So I could do that. Of course, you can add comments so other people who are looking at your processing can understand what’s going on. And for these 10 records, you know, I’ve now increased the total size of my records by 82%. So it’s pretty hefty if I do this for every single message coming through, with this being the payload, and I’m still sending 10 events.
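
In open-source Fluent Bit terms (which Telemetry Pipeline builds on), this kind of add-a-key enrichment is roughly a modify filter; the key and value below simply mirror the demo:

```
[FILTER]
    Name   modify
    Match  demo.*
    # Enrich every record with a static region key
    Add    region us
```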

Allow or disallow specific log fields

Now, another piece of processing can be to allow or disallow specific fields. So let’s say, for example, we don’t want the region field to be there (kind of funny because we just added it), but we can say, hey, let’s go and block this specific key. We’re gonna block the region key. We don’t want that actually to exist. And we can then go remove that, and we’re back to exactly where we were before. So within this processing interface, I can turn these things on and off, see what the effect might be if I turn on one versus the other. In this case, if I don’t even have that field to begin with and I try to block it, no problem. It will run without throwing any type of error. Now, let’s do some other fun things.
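
The block-a-key action has a similarly small open-source analogue, and as in the demo, removing a key that was never there is simply a no-op:

```
[FILTER]
    Name   modify
    Match  demo.*
    # Drop the region key; records that never had it pass through unchanged
    Remove region
```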

Maybe, for example, we want to do a hash of that particular log key. You can say let’s do a log hash. We can choose the specific type that we wanna do; I’ll just keep it at 256. And here, great. We have a log hash adding a big hash key on top of the entire record, so we can see, okay, this is what that looks like. And obviously this is a pretty hefty additional percentage on top of it. So we can do things like that as well. Now, let’s go find some more fun logs. And to do that, I’m gonna bring in a set of sample logs. These are just live samples you could find on the web, but I really wanna show, you know, how we can do things as live as possible.

Common logging formats

Anurag: So first we’re gonna talk through some of the common types of log formats that we have. Now, one interesting thing: I’m gonna turn off all the processing rules and just keep things as is. You’ll notice that on the raw side, I have kept everything just arbitrary, without the log key. And second, when I do send it through, I do have this log key. What happened?

Well, the fun thing is when Calyptia [Telemetry Pipeline] receives data that isn’t in any type of schema, JSON schema or anything like that, we’ll auto-append this log key. It doesn’t mean that when we send it to the backend it’s gonna have this key. You can send things raw, you can keep things exactly as they were, uninterrupted. But it does help us when we start to do some of these processing transformations.

So say, for example, we wanna do parsing, we wanna do redaction, right? Instead of just saying, “okay, fire away at the entire message,” we can use this key to say, “only look at the log key when you do this particular transformation.”
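
As a sketch of what “only look at the log key” means in Fluent Bit terms, the parser filter can be pointed at a single key while preserving the rest of the record; the tag and parser name here are assumptions for illustration:

```
[FILTER]
    Name         parser
    Match        app.*
    # Parse only the auto-appended log key as JSON ...
    Key_Name     log
    Parser       json
    # ... and keep the other top-level keys on the record
    Reserve_Data On
```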

Let’s say for example I just grabbed some nice Azure activity logs. I’m gonna switch to JSON here from raw data to JSON. We’ll copy this and I’ll just paste it and just refresh my screen real quick into here. Go back to JSON, delete that and paste. Perfect. We’ll do a little bit of a pretty print. 

And as we are doing the pretty print, now we can start to go and look at very specific pieces of that nested schema. As I was saying before, if you send it to Calyptia [Telemetry Pipeline] raw, it remains raw as is. If you send it in with some schema we get the full schema and we can start to do some operations on top of this.


Reducing log data noise with processing

Anurag: To reduce noise, let’s say the backend that we have only expects five keys. And those five keys could potentially be event name, scope, [and] action. I’m just putting a pipe in the regex, very specifically. We’ll click Apply and we’ll run that. You can even see, hey, we’ve got a pretty nice reduction on that log by just saying, “Hey, we’re gonna allow only these keys.” So that can be very useful when you have a lot of duplication within a particular log and you just need to very quickly say, “I only need these three fields. I don’t need 50 of them.” This, for example, is an Azure activity log. Okta logs are pretty similar in their JSON; we can grab an example of that.
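
A hedged open-source sketch of the same allow-only-these-keys idea uses the record_modifier filter; the key names below are assumed stand-ins for the fields mentioned in the demo:

```
[FILTER]
    Name          record_modifier
    Match         azure.*
    # Keep only the fields the backend expects; every other key is dropped
    Allowlist_key eventName
    Allowlist_key scope
    Allowlist_key action
```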

This is a really nice way to go about doing some of that processing. Now I’m just gonna undo [those changes] real quick and we’ll look at some of the ways to potentially save [noise] here. So let’s look at this ID field, for example. This is pretty noisy, and maybe the only thing we care about [are] the things after this Microsoft network. Now, the path that’s there shows every single piece of the Azure resource, and we only care about the subscription ID. And maybe we only care about what the actual resource is. But just for the sake of showcasing this, let me copy something from this scope, and I’ll use another action here called Search and Replace. We’ll choose that key, we’ll say scope, we’ll choose that regex, and for the replacement here, I’m just gonna do a percentage sign, which means just replace it with nothing.

So if I run that and we come back to what we did with scope, you can see the scope now has been reduced, and I think we saved a bit of bytes there as well, about a percent. So that’s a nice, quick, easy way to save us a little bit of space there. There are certainly ways to reduce this even further. Some other methods: let’s say, for example, these schema pieces. If I’m not gonna use those in my log searches, if I’m not gonna use them in my day-to-day, let’s just block those keys. And maybe we’ll keep it sort of just as schemas.microsoft.com; any field that contains that, we’ll go and block that as well. Okay, maybe we didn’t block this schemas.org, and we didn’t block some of these nested fields under the claims.
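
The exact Search and Replace rule from the demo isn’t reproduced here, but the same trimming can be sketched with the Lua scripting support mentioned at the start, for example as a script attached to a Lua processing rule (or Fluent Bit’s lua filter). The path pattern is illustrative, not the real Azure scope:

```lua
-- trim_scope.lua: strip the long Azure resource path prefix from the scope key
function cb_trim_scope(tag, timestamp, record)
    if record["scope"] ~= nil then
        -- Lua pattern (not PCRE): drop everything up to and including "/providers/"
        record["scope"] = string.gsub(record["scope"], "^/subscriptions/.-/providers/", "")
    end
    -- return code 1 means the record was modified and should be kept
    return 1, timestamp, record
end
```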

We can even do something like, “Hey, we’re gonna do this under the claims field underneath that.” Now we’re saving a really good amount; we’re really cleaning up this log. I’m doing this pretty fast, pretty efficiently, [and] not really spending the time doing a deployment, checking, deployment, check, deployment, check. And that can really help you when you need to save some time here as well. Now, these are some simple processing rules. We talked about taking some basic enrichments. We talked about some of the reductions, but let’s talk about some of the greater patterns. And let’s do it over here.

We’re doing it on one record, let’s do it over hundreds of records. So let me go grab a quick NGINX example now, and I’ll even prompt any audience participation. If you have a fun log that you want us to try to process, sure, we’re happy to do it. 


How log data transformation and processing impacts telemetry pipelines

Anurag: [We’ve got a] really good question in the chat. Can a completed set of processing rules be downloaded as a configuration file for a manual client deployment? Yes. You can take all of these processing rules that can be exported and they can be imported into other pipelines.

They can be reused. You can even use templates here throughout the entire system. So let’s do another fun thing here. I’ll just get rid of the block records so we can do parsing on top of all of these. I think one thing that’s really important as well is that we also wanna understand how processing impacts our pipeline. Yes, everything is persisted, so it can go and do some of the persistence, but how does it actually impact my pipeline from, like, a CPU side? So we enabled this thing called Enable Profiling, and it runs on top [of everything]. If we go all the way to the bottom, you’ll see it will print out in milliseconds how long this took to process.

So we did [about] 400 calls. The total time was about 5.4 milliseconds to process all of these with the parser, and the average time was about 0.01 milliseconds, so on the order of 10 microseconds per call, if you will. So this is great when we wanna learn, like, “Hey, do we have an accidental regex or parsing that’s doing something a bit crazy?” and we’re not sure if that’s actually gonna be efficient for us. So I’ll disable that for now. We’ll just have our regular piece here, and I’m going to do some aggregations. I don’t really care about the entirety of these logs, but I don’t want to delete them. I just want to get a summary of [however many hundreds of logs]. So to do this, I’ll add another action.

Next, I’m going to delete that log key. It’s not really needed. Then I’m going to take the parsed data and flatten it, so everything sits on the top-level record. So Add New Action, go to Flatten. Here’s Flatten Sub Record, and the key I’m gonna use is parsed. You can see I’m not adding comments to all of them. I should; that’s good maintainability for these things. And as you share them, as someone asks, “Hey, can we take these processing rules and give them to other folks?” Yes. And we should add comments there so they know why we did what we did. So [now] I’m flattening it.

Next I’m gonna use this Aggregate Records function. Now, Aggregate Records runs on a time window. Many of these different processing rules could run on top of time windows, so when we look at deduplication, aggregation, sampling, we look at a time window of all those events. 

If I have a thousand events per second, I might wanna aggregate on a thousand events, or on five seconds for 5,000 events. In this case, when we do this in the browser, we treat everything as a time window of just one second; even if it contains the time, [using one second makes] life a little easier. And then for select keys, we have to use JSON. I know this is not the most ideal way to do this, and SQL would be way better; that’s in progress. But if I still wanna go and look at how this thing works, I wanna select the code key. So I wanna know how many 200s, how many 404s, how many 500s. The next thing I’m going to do is save the count, and we’re going to do the count of code. We can start with that for now; that’s the function that we’re gonna run; and boom.

So the way that we do it, we support count, max, sum, and min, these kinds of basic levels of compute. And here, let me go select the code piece. I want to get a new field called count, and then I want to use the function called count on that as an aggregation. And I’m running that across all 450 events; that’s why my time window says 450 in this case. You can see it does 450 and then there’s an extra, so I’m just gonna do 455 to handle all of them. So [we have] about 398 200s, 17 404s, nine 500s, and 26 301s.
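
The UI drives this aggregation for you; for orientation, open-source Fluent Bit expresses a similar idea through its stream processor, where a windowed GROUP BY emits one summary record per status code. The tag names and the five-second window below are assumptions, not the demo’s settings:

```
# stream_processor.conf, referenced from the [SERVICE] section via Streams_File
[STREAM_TASK]
    Name  count_by_code
    # Every 5 seconds, emit one record per HTTP status code with its count,
    # tagged 'agg.codes' so the summaries can be routed separately from the raw stream
    Exec  CREATE STREAM code_counts WITH (tag='agg.codes') AS SELECT code, COUNT(*) FROM TAG:'nginx.*' WINDOW TUMBLING (5 SECOND) GROUP BY code;
```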

Routing to observability backends

Anurag: Now why is this interesting for us? Because of what we can do from a processing side. Let’s just click Apply real quick. You’ll see that this thing lights up with the processing rule that’s out there. You can test it. We could, for example, route this to another backend. Let’s say S3; call [it] archive, [and] choose GZIP compression.

Now let’s pretend this is a costly backend. As we’re sending data to this costly backend, we may wanna take this processing rule and just send the costly backend our summaries of every 455 events. And for Amazon S3, we wanna send everything raw. Now, the place that I put this processing rule in today is on the input side, but we can also customize the processing on the output for the costly backend and the output for S3.

This is how we do routing within Calyptia [Telemetry Pipeline], and that’s very unique. Routing, typically, I think folks think of it as: I have a stream of, let’s just use zeros and ones, and I take that stream of zeros and ones, and maybe I want the zeros to go here and I want the ones to go there. The most basic routing. This is what we implemented in Fluentd 13 years ago: tag-based routing, zeros go to this endpoint, ones go to this endpoint. Kind of a first in, first out.

Now there starts to become really unique use cases where [it is] “I actually want the zeros to go here. I want ones to go here, but I want a combination of zeros and ones to also go here if they match certain criteria.” And that starts to become pretty complex. So if we want to do routing, that becomes more and more complex. The typical way that you go and tackle this — we did this with Fluentd back in the day — is you copy the stream, say, “Okay, I’m gonna take all the zeros and ones, and then I’m gonna copy the stream into another zeros and ones, and I’ll copy the stream into another zeros and ones. And then from that copy, I can then route it effectively. So I have 50% there, 30% there, 60% there.”

It doesn’t add up to a hundred because it’s like a Venn diagram of where these things get routed to. And I might wanna keep some here, and I wanna keep some there. The nice thing with Calyptia [Telemetry Pipeline] and the underlying Fluent Bit engine is we learned from Fluentd way back when and said, “Okay, we’re gonna keep a single copy of that data, and the output plugins effectively subscribe to that one copy of data.” So the costly backend subscribes to that copy of data. Amazon S3 subscribes to that copy of data. You get one single stream, and then you’re doing processing as you subscribe. So [we have] standard output: it subscribes to that data, it goes and runs, maybe it does the aggregation I was talking about, and it performs that right as it sends the data.

And then the Amazon S3 output just keeps everything as is for archival purposes. And that’s very, very efficient. It’s great because, if you think about it, if I’m sending data to five backends, it can be really costly just from a performance and processing side if I have to copy five streams of data. Say I have 3 TB coming in and five different destinations I’m routing to; that’s 15 TB that I effectively have to copy for all the processing. So instead we keep the 3 TB the same. We use pointers to subscribe to that data and do some of the transformation. And that’s really what makes processing very, very powerful with this type of architecture with Calyptia [Telemetry Pipeline]. Not to mention this is then multiplexed between multiple replicas, multiple processes that are handling all that data volume.
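
A rough sketch of that single-copy, many-subscribers idea in open-source Fluent Bit terms: two outputs match the same pipeline without the data being duplicated upstream, one archiving the raw stream to S3 and one (standing in for the costly backend) receiving only the aggregated summaries from the earlier stream-processor sketch. The bucket name, region, and tags are placeholders:

```
[OUTPUT]
    # Archive the raw stream to S3, compressed, exactly as it was collected
    Name        s3
    Match       nginx.*
    # Placeholder bucket and region
    bucket      my-archive-bucket
    region      us-east-1
    compression gzip

[OUTPUT]
    # Stand-in for the costly backend: it only subscribes to the summary stream
    Name   stdout
    Match  agg.codes
```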

Applying aggregation rules

Anurag: So next, let’s go back to our aggregation rule that was running here, right? We can then run calculations on top of that. So I was talking about, let’s say, averages, and maybe I want to generate some metrics around this, and maybe the metric I want to generate is around size. Now, size right now is captured as a string [and] parsed as a string, so I’m gonna actually change this from a string into a number. To do that, I can add another action. We’ll do Parse Number on the key called size, and we’ll use a numeric base of 10. I’m gonna put it before the aggregation too, so then we can use it. Excellent. You can see the color changed and, you know, at least the quotes are gone too. So great.
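
In open-source Fluent Bit, the closest equivalent to this Parse Number step is declaring field types in the parser itself, so size never arrives as a string. The regex below is a deliberately trimmed, illustrative pattern rather than a full NGINX access-log parser:

```
[PARSER]
    Name    nginx_access_typed
    Format  regex
    # Illustrative pattern: assumes the line ends with the status code and response size
    Regex   ^.* (?<code>\d{3}) (?<size>\d+)$
    # Cast the captured strings to integers so they can be summed and averaged
    Types   code:integer size:integer
```

Typing the fields at parse time means later aggregation steps can treat size as numeric without a separate conversion action.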

We can now do some operations on this. I’m gonna go back to my aggregation side. In this, we’re gonna create a new field, and we’re gonna call it average size. And this is gonna be made up of the operation of average, taking the average from the field size. I think I got that right. Let’s see. And obviously it didn’t do anything because I didn’t have it turned on. Turn it on, and it’s perfect. Okay, so now we’ve got the average size. For the 200s, it’s almost 5,000; for the 404s, this much; 500s, this much; and 301s, this much. So we are now separating and doing calculations on top of each of those. It’s really great for us if we wanna do a couple of these to rebuild how this thing looks, and then do some of these aggregations and maybe send it up as a metric, right? Instead of having to send this up as a raw log.

We could maybe do some profiling on that. What does it actually look like here? Each of these represents a processing rule that’s running here: parsing; how much deleting is costing us; how much flattening is costing us; how much parsing the number is costing us; and even how much the aggregation is costing us. So 3 milliseconds is actually not too bad. We can reduce it and we can kind of subsidize the noise there as well.

Okay, so we talked about processing, we talked about a bit of enrichment, some redaction, some removal, some optimization with allowing specific keys, and blocking specific keys. How does this look for routing, where you can take different processing rules per specific endpoint? And what does this look like? 

The stuff that I also wanna show is how this melds with the world of other schemas. We talked about some Azure stuff, but, you know, maybe I have some OpenTelemetry raw logs that I want to do some processing [on] that’s very specific to perhaps one of these attributes. I want to remove an attribute from this type of schema. And this is pretty complex, right? There are JSON arrays which then have attributes, and then this JSON array, which then has an array of attributes. And each attribute has an array of keys.


And we’re back down to one event, but from a [certain] standpoint each of these will have our gauge, and then we have our histogram. But if you look, the counter is gone because we’ve deleted that counter attribute within this as well. 

So this is a very complex log type, right? It has arrays, it has JSON. You can see that even with some of these more complex schema types, we can still do what you might need to do by leveraging the system that exists here. So we select the first event, the second event, delete that event, and go from there. That’s a bit of processing with split, block, and join. And yeah, I think that covers most of the use cases that we have.
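
For nested OpenTelemetry-style payloads like this, the Lua scripting mentioned earlier is also a natural fit. A minimal sketch, assuming a record laid out as resource.attributes = [{key, value}, ...] (the layout and the "counter" key are assumptions about a sample payload, not the demo’s exact schema):

```lua
-- drop_attribute.lua: remove one attribute from an OTel-style nested record
function cb_drop_counter(tag, timestamp, record)
    local res = record["resource"]
    if res ~= nil and type(res["attributes"]) == "table" then
        local kept = {}
        for _, attr in ipairs(res["attributes"]) do
            -- keep every attribute except the one we want to delete
            if attr["key"] ~= "counter" then
                table.insert(kept, attr)
            end
        end
        res["attributes"] = kept
    end
    return 1, timestamp, record
end
```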

Another great place for the split is, say for example, you’re pulling an Okta system log, and I might want to split each of the individual events that come in as a giant array. So we can split that as well.

Let’s get back into the presentation just to wrap up here. Again, the nice thing here is you get automatic load balancing [and] can scale this up with replicas. I know I didn’t really show it that much, but within this, if I want five parallel processes of processing, I just do something like that, and all the processing will happen across all of those. And of course I keep all my code here as well if I wanna use something like Terraform or GitHub to actually manage this. Everything you see here is backed by the CLI as well.

And for those who are really interested in the scale, you can get things like automatic persistence, auto retries, Kubernetes native with tolerations and taints for high availability rollout, disruption budgets, customizable servicing rests, all these things as part of how we think of pipelines and how processing really needs to be done for these high scale use cases.

Additional resources

Curious to learn more about Chronosphere Telemetry Pipeline and telemetry pipelines in general? Check out the following resources:

How to transform logs at scale to meet your observability needs
