OpenTelemetry 101: Let’s instrument!


In this workshop, our Principal Developer Advocate, Paige Cruz explores out-of-the-box manual and automatic instrumentation. Check out the link to the full workshop at the end of this blog.


At last year’s KubeCon + CloudNativeCon 2023, Chronosphere’s Principal Developer Advocate Paige Cruz led a workshop on adopting OpenTelemetry by instrumenting a sample Python application with traces. 

From exploring out-of-the-box automatic instrumentation to manual instrumentation, this workshop guides engineers of all backgrounds to better understand how telemetry travels from your application to the Collector. 

If you don’t have time to sit down for the full video, check out a transcript of Paige’s workshop below.

Defining common observability terms

Paige: It’s lovely to see so much interest in OTel and tracing. It’s my favorite telemetry type and I’m excited to share it with you today … We have about five labs, but this is sort of a work-at-your-own-pace. The second thing that is important to know, other than the QR code, is there are a few pre-reqs that you may want to start downloading: Podman, Python3, and the sample application.

We start each lab with a goal. So, [the first goal] is really just making sure we all have the same understanding of common terminology. Observability loves to throw in scary academic sounding terms, and I just want to demystify those and get them understood upfront. And let’s understand where telemetry fits in the landscape.

What is observability?

So, what is observability? If you ask 10 different people, you get 10 different answers. I think it’s how effectively you can understand your system behavior from the outside, using the data it generates. 

What is monitoring?

Monitoring, on the other hand, is the continuous process of watching and tracking system health based on a predefined set of data. I think of monitoring as the smoke alarm in your house. It’s checking for smoke particles. It will alert you when it senses those in the air. It’s kind of always watching.

What is telemetry?

Telemetry is the process of recording and sending data from remote components to a backend. When we talk about software telemetry or infrastructure telemetry, that is typically metrics, logs, events and traces. And, if you’ve been to some of the OTel (OpenTelemetry) talks today, we will soon be adding another type, which is “Profile.” So, telemetry is really just about sending this sort of data from one device to either a central backend or a proxy.

What is instrumentation?

Instrumentation is the code that records and measures the behavior of an app or infrastructure component. We can really break down instrumentation into three categories. 

Auto-instrumentation

There’s auto-instrumentation, which is mostly what is marketed and is really the first step most organizations take, especially with OpenTelemetry. [This is when you add observability without changing the source code itself.]

Programmatic instrumentation

There’s programmatic instrumentation, which is where you’re manually bringing in libraries, and setting up some configuration.

Manual instrumentation

Manual instrumentation is when you’re adding those custom attributes.

What is OpenTelemetry?

And, this primer brings us to OpenTelemetry. What is it? Why do we care? OpenTelemetry is a standardized set of vendor-agnostic SDKs, APIs, and tools to ingest, transform, and send telemetry to observability backends. If you’ve been to talks, you know how real the vendor agnosticism is. There’s a lot of wonderful cooperation across all of the vendors in the observability space. And, we all can play nicely in the OTel sandbox. Unsurprisingly, OTel is part of the CNCF, and joined back in 2019.

So, one thing I think gets a little bit confused about OpenTelemetry is what it is not. It is not just a tracing tool. Although we did start with tracing … We have expanded into all of the other signals. And, interestingly enough, OTel is not a backend or storage system. It is the pipeline and the set of libraries to generate the data, to transmit them, maybe transform them, and then export them somewhere else. And, it is also not an observability UI. That is why, in this workshop, we needed to bring in a UI for tracing. In this case, I chose Jaeger…That is because it is not the purview of OTel to get into storing this data long term or visualizing it.


Installing and configuring OpenTelemetry

Paige Cruz: So, in our case, we’ll be relying on the OpenTelemetry instrumentation Flask library, which is built on the OTel middleware, and we’ll just be observing a very simple web application. The one component that we won’t be using today, but that is important to understand overall about OpenTelemetry, is the Collector: an open source proxy that receives, processes, and transforms telemetry with OTTL, the OpenTelemetry Transformation Language.

What I hope you take away from this workshop is that you can do any sort of mixing and matching from automatic to programmatic and manual. It’s not one or the other, one versus the other. You’ll actually probably need to rely on all three types to get the best visibility.

Like I said, automatic instrumentation is great because you don’t have to make code changes.

Now, we will be installing and configuring OpenTelemetry in our demo app. So, what we want to do for automatic instrumenting, is we want OTel to get set up on your machine. We want to configure the SDK, run our demo app and view trace data in the console. We’re going to start building these concepts up one by one.

We will be working with a Python Flask app. I specifically chose Python because we have a really lovely, strong set of documentation in the Python community, the Python agent, and a lot of great code examples. You can kind of fill in the blank with your favorite framework but for today we’ll be doing Python.

I imagine this is where folks may run into some issues as we get to the interactive component. And again, if you run into a snag, raise your hand and me or one of my very helpful helpers will come around and try to debug with you. But, this step I think should be pretty easy. Let’s make a project directory and `cd` into it. You can name it whatever you want, but it’s probably best if you copy and paste. Next, you’re going to want to download the demo app. This is a Python Flask app. It is very simple. It has three endpoints, nothing too fancy there. You can grab a git clone, HTTPS, choose your own adventure. I’ll leave this up just for a little bit to make sure folks have time to grab it down. 
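In a terminal, those setup steps might look like the following. The directory name is arbitrary and the repo URL is a placeholder, so substitute whichever clone URL the workshop gives you:

```shell
# Make a project directory and cd into it (name it whatever you want)
mkdir otel-workshop && cd otel-workshop

# Download the demo app -- HTTPS or SSH, choose your own adventure
git clone <sample-app-repo-url>
cd <sample-app-directory>
```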

Now, we want to explore a demo app. What are we going to be instrumenting? What do we need to get visibility into? We’re starting with no instrumentation. We don’t know anything about this app as it’s running. We have three routes that we’re going to look at today. 

  1. We have our “/” route, which will display the count of how many times you’ve loaded that page for that session. 
  2. We have “Doggo,” an endpoint that calls out to the Dog API, and fetches a random photo of a dog. 
  3. And finally, “roll-dice,” which is just going to display the result of a randomized dice roll. 
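As a sketch, the three routes might look roughly like the code below. The route handlers, session handling, and variable names are my assumptions for illustration, not the actual sample code, though the Dog API endpoint shown is the public one:

```python
import random
import urllib.request

from flask import Flask, session

app = Flask(__name__)
app.secret_key = "dev-only"  # placeholder; needed for per-session state

@app.route("/")
def index():
    # Count how many times this page has been loaded for the session
    session["hits"] = session.get("hits", 0) + 1
    return f"This webpage has been viewed {session['hits']} time(s)"

@app.route("/doggo")
def doggo():
    # Calls out to the Dog API and fetches a random photo of a dog
    with urllib.request.urlopen("https://dog.ceo/api/breeds/image/random") as resp:
        return resp.read()

@app.route("/rolldice")
def roll_dice():
    # Displays the result of a randomized dice roll
    return str(random.randint(1, 6))
```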

Nothing too gnarly in the code today, because really what we want to focus on is learning the concepts of tracing, the concepts of instrumenting with traces. So, I really wanted to slim down the complexity. So, once you’ve gotten that sample repo down, go ahead and get into that directory and we will build our first image. Again, I did all of this testing with Podman, so if you do run into a Docker problem, I wish you luck. I may try to help, but this is the way the workshops are set up and this is why I wanted to make it available after the fact. So, you can kind of hack on this at your own leisure. 

Building an image with Podman

Paige Cruz: We’ll start by building an image with podman build, and we’re going to tag it. We’re calling this app hello-otel. You’ll know you’re successful when you get a message like below.

[Successfully tagged localhost/hello-otel:auto
516c5299a32b68e7a4634ce15d1fd659eed2164ebe945ef1673f7a55630e22c8
]

It will obviously be a different ID. Once you’ve got an image, now we can run this container. Now, you’re going to run this command:

[podman run -i -p 8001:8000 -e FLASK_RUN_PORT=8000 hello-otel:auto opentelemetry-instrument \
  --traces_exporter console \
  --metrics_exporter none \
  --service_name hello-otel \
  flask run --host=0.0.0.0
] 

You can copy paste, but let me just walk through what’s going on. We’ve got port 8000 exposed in the container mapping to our local port on 8001 … So, once you get this running, there’s this little command that snuck in here, the opentelemetry-instrument. That is the component that is going to be doing our auto-instrumentation for us. If you open up the source code, you’ll see there are no OTel libraries being installed. 

We’re totally doing this from the outside with the OTel agent. We are going to export our traces to the console and turn the metrics exporter off, and then flask run is what we give to the application to start it up. Once you’ve got that running, go ahead and open up localhost:8001 and confirm that for that “/” endpoint you see this webpage has been viewed one time. And note, I am a backend engineer, not a front end. So there’s not a lot of pretty CSS or anything happening.


Auto-instrumenting via OpenTelemetry bootstrap

Paige Cruz: So, once we’ve gotten there, we can kind of confirm your setup’s working, app’s working, Podman’s working, and we’re ready to move on to the next step. If you confirm that that’s working, go ahead and just stop that container with “CTRL-C.” And now, we’re going to run interactively and use a little tool called OpenTelemetry Bootstrap. What that will do is go ahead and detect whatever installed libraries we have in the app. In this case, it’ll see that there’s the Python library for Flask, and it will go out to the registry and find relevant instrumentation packages to bring in. This is where the magic of auto-instrumentation happens.

Go ahead and run your container, map your ports, run the image that we just built, and then go ahead and make sure that you get into a shell. Once you are in that shell, we’re going to pip install both the OpenTelemetry distro and the OpenTelemetry exporter. OTLP is the OpenTelemetry Protocol, and that is what OTel uses to send traces and spans from one system to another. 

So, we need both of those things. Pip is what Python uses for package management, and once you’re in there, you can run opentelemetry-bootstrap -a install. That will go out and grab all those dependencies and those instrumented libraries, and you’ll know you’re successful ’cause you should be dropped into that container. 
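Inside the container shell, the install steps described here boil down to two commands (package names follow the OpenTelemetry Python documentation):

```shell
# Bring in the OTel distro plus the OTLP exporter
pip install opentelemetry-distro opentelemetry-exporter-otlp

# Detect installed libraries (e.g. Flask) and pull in matching
# instrumentation packages from the registry
opentelemetry-bootstrap -a install
```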

Well, I guess it depends on how you set up your Podman VM, but if you’re just doing it all vanilla from the start, you should be dropped in as root. Now, you can run the auto-instrumentation agent, and this is where we’re going to lean on opentelemetry-instrument. Again, this looks very similar to our app run command, right? But, we’re wrapping it in the opentelemetry-instrument agent there. So, we’ve changed our run command. We’ve added this up top and what we should get is some verification.

If that is working, you can go ahead and open up localhost:8001, make a couple of requests, generate some traffic, and you should see in your console, spans appearing in textual form … And that is how you know that you have successfully wrapped our Flask app in the OTel auto-instrumentation agent and are getting span data. This is success at this point in the workshop. I’m going to pause, ’cause I don’t want to get too far ahead…

So, we were interactively in our container. We need to get out of there. So, just type exit, or for some systems you may need to “CTRL-C” out. What we did is confirm that, without making any code changes, we were able to get span data automatically just by wrapping our command with the OTel instrumentation agent. 

Mixing automatic and manual instrumentation

Now, let’s go ahead and add a span attribute. In this case, we want to just see how many times the page has been loaded. Maybe that’s an interesting metric for us to track. So, we’re going to hop into your IDE or your text editor or whatever you’re using to write code you’re most comfortable with. Open up that sample application and find app.py. What we’re gonna do now is manually import the OpenTelemetry library and we’re going to modify the index method that is attached to the slash route for this app. So, the first thing you’re going to bring in is from opentelemetry import trace, and then you’re going to instantiate a tracer. So we’ve gotta make sure we’ve got something that’s tracking all of our spans.

And then, we’re going to drop into the index method … We’re going to start a new span. You should call it something meaningful and relevant to you; in this case, “load homepage” works. And then, I always just like to type it out fully, so we’re going to reference that variable as span. Some people shorten it to “sp”. The next line you’ll add is span.set_attribute. Attributes are key-value pairs. So, we’ll call this one pageload.count and then you’re going to give it the value of hits, which is how many times that page has been loaded.

So, when we’ve done that, we’re going to do this loop many, many times. We will make some code changes, we’ll rebuild our container image and then we’re going to load up our app, send some traffic to it, and then look at our trace data. This is a loop we’ll do over and over again. You’re more than welcome to write a little script, or if you’ve got your Podman VM that’s mounted to your directory, you can do this stuff on the fly. But for the ease of use, we’ll just kind of run through these commands as is. So, go ahead and rebuild your image. Same kind of command as before:

[$ podman build -t hello-otel:auto-manual -f automatic/Buildfile-auto
] 

Make sure you get that success message that things are building. Then, run our container – copy and paste this command:

[podman run -i -p 8001:8000 -e FLASK_RUN_PORT=8000 hello-otel:auto-manual opentelemetry-instrument \
  --traces_exporter console \
  --metrics_exporter none \
  --service_name hello-otel \
  flask run --host=0.0.0.0
] 

Please copy paste liberally, and get this application running. The way that we’re going to verify this manual instrumentation: we should see in our console a span pop up that specifically has pageload.count:1, or however many times you refreshed, in its attribute block. That is how you know that we’ve successfully piped through that manual attribute there. We will not be working with spans in textual form for long, I promise. This is just the easiest way for us to constrain the space and get started early on.

Continue this portion of the conversation at 24:15.

Programmatic Instrumentation

Paige Cruz: And again, this is this loop that we’ve been talking about where we’ll rebuild the container image. We’re going to run the container and then we’re going to send some traffic and validate our results. This is the loop you should get used to when you’re instrumenting. So in this case, the only thing I’ve changed is that we’re going to tag this programmatic in case you wanted to kind of head to head the manual versus the programmatic versus the auto instrumentation later. I’ve tagged them differently per lab. Make sure you get the success message that you built it and then go ahead and run.

The only thing that will have changed from last time is this programmatic instrumentation. Everything else is the same. Open up localhost:8001, confirm that the app is still running, and you can go ahead to /rolldice. Flask knows about routes: so the Flask instrumentation will make sure that it tracks all of our calls to the different routes, but it doesn’t know too much more about what we’re doing on the inside there. So you won’t see how many times this webpage has been loaded, but you will see a span for the route that has been called. 

Again, we didn’t change any of our routing code, we just imported the Flask instrumentation library, which is essentially what the auto-instrumentation agent was doing for us. Go ahead and stop that container. 

Traces in Jaeger

Now, we finally get to actually looking at these traces in Jaeger. Podman, like Kubernetes, has a concept of a pod, which is where you’re running multiple containers and they share some resources. What we’re going to do is open up the app, it’s the YAML you know and love, and just make sure you’re comfy with what we’re doing here. We’re bringing in the Jaeger all-in-one container. This is not production ready, this is specifically for local testing, just in case you were wondering. And then, the ports that you need to care about are 16686 for Jaeger’s web UI, and 4318, of course, for sending OTel data. And then we’re gonna make sure that the collector OTLP is enabled, so Jaeger knows to receive the OTel data we’re gonna be sending via OTLP.
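As a rough sketch (not the workshop's actual file), a pod spec along these lines would wire the two containers together. The pod and image names are assumptions; the ports and the COLLECTOR_OTLP_ENABLED flag follow the description above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-otel
spec:
  containers:
    - name: app
      image: localhost/hello-otel:programmatic
      ports:
        - containerPort: 8000
          hostPort: 8001          # app reachable at localhost:8001
    - name: jaeger
      # all-in-one is for local testing only, not production
      image: jaegertracing/all-in-one:latest
      env:
        - name: COLLECTOR_OTLP_ENABLED
          value: "true"           # Jaeger receives OTel data via OTLP
      ports:
        - containerPort: 16686
          hostPort: 16686         # Jaeger web UI
        - containerPort: 4318
          hostPort: 4318          # OTLP over HTTP
```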

So, your other container should be stopped. Just make sure that’s happening. You can always do podman ps, and see what’s running.

The way that you run a pod in Podman is podman play kube, passing it your pod file. You should get a success message that not one, but two containers have spun up: one for our app and one for Jaeger all-in-one. Jaeger natively supports receiving OTel data, so the config that we had to do was also pretty minimal. The stuff all plays nicely together. You’ll know you’re successful if you can open up localhost:16686, and you will see the very, very cute Jaeger mascot: this little gopher detective, he’s following the footprints, the trace. This is what success looks like for this part, and let’s go ahead and generate some traffic.

Make a few requests to localhost:8001 at /rolldice, and kind of see what you get out of the box in Jaeger. What you should see is hello-otel, or whatever you have named your service, in the dropdown. And you should see some little dots representing each request that you’ve made. So, I made 10 traces. You may have made one or two, and that’s okay. Jaeger has a lot of really beautiful ways to visualize traces. One common visualization you will see is a trace waterfall. You can go ahead and click. So if you see down here something like “hello-otel /doggo,” you can click that and be taken into the trace waterfall view.

This trace waterfall is similar to the waterfall view in Chrome DevTools. This is a visualization you turn to when you want to examine in detail one particular request that was traced. And you can get all sorts of helpful attributes: if you click on one of the spans, it should open up into a nice little table. If all of this was working for you locally, then we have completed programmatic instrumentation, and successfully sent and viewed some traces in Jaeger. Leave your pod running, because next up we’re gonna talk about the visualizations and you may want to explore that on your machine.

Continue this portion of the workshop at minute 34:43.

Manually instrumenting metadata

Paige Cruz: So, we reviewed our trace visualizations, we’ve kind of gotten a little more comfortable with the Jaeger UI and now, is the final lab on manually instrumenting metadata. This is the change we’ll make to our instrumentation loop. We’ll make some code changes, we’ll rebuild our image, we will run our container, we will generate traces by sending requests to our app and then we will load them up in Jaeger and see the results of our instrumentation.

So, we’re adding one step to our instrumentation feedback loop, but it’s still pretty manageable. If you were really into the trace comparison, there’s a deep dive blog that I’ve linked here in these slides. You can check out the Jaeger project site or while you’re here go talk to a Jaeger maintainer or talk to the Jaeger folks about what’s going on in their world. You can read about native OTLP support and if you’re really keen on bringing Jaeger to your org, it’s definitely a good idea to look at the deployment options you’ve got available to you. Because all in one is not for production. I cannot say that enough. It is just for our local testing today.

Paige Cruz: Alright, Lab 5. It is kind of a rehash of before, but again, automatic and programmatic instrumentation gets you most of the way to visibility. It is definitely better than nothing. But if you have the skills, and can teach other developers the skills, of manual instrumentation, you’ll be able to add specific metadata to your apps so that you can derive insights about your business. You will see some examples of what we will be manually instrumenting now as we begin. So, we’ll head back to our IDE, open up app.py, kind of delete whatever we had there and go ahead and reset app.py.

Then we are going to manually bring in some libraries. Again, re-initialize our tracer provider, which is what creates the tracer that accesses and modifies spans as we run our application. And we did like that Flask framework instrumentation; it was pretty nice to see traces attached to the routes we were making requests to. So we also want to bring back the Flask instrumentation library.

And again, this is the loop we know. We will build our image, make sure that we get that tagged. And in this case, the only change is we’re going to tag this as manual just so that you’ve got a separate image for each: programmatic, automatic, and manual. If you want to do some comparisons later on, you would need to remap some ports to make that work, but that’s pretty doable.

So the other change we’ll make is over in app_pod.yaml, which is our pod spec. Because we have changed our tag for the image we built, we need to make sure that our pod has that updated tag. And then comment out that command block, because we’re manually instrumenting now and we don’t need the auto-instrumentation agent to wrap our Flask command. Because we’re going to be running the OTLP span exporter and relying on Jaeger’s native OTLP ingestion to send spans over HTTP.

You can watch the rest of this workshop starting at minute 50:06 here. 

 
