Give events a chance; knowing what changed is essential to identifying and resolving problems.
In a meeting last year with a bunch of senior observability leaders from cloud native companies, I asked everyone to tell me their least favorite telemetry type: metrics, events, logs, traces, or whatever. I was pretty confident the dominant answer would be logs. Nothing against logs, but I had recently heard this group express the hot take “during an incident, if you’ve gone to the logs, you’ve already failed.”
I was wrong. To my surprise, they answered almost unanimously: events. Events were the most despised telemetry type. I followed up by asking why they disliked events so much. Again the answer was nearly unanimous: the lack of a clear definition of what events are and how to use them.
I get it. In researching events, I’ve found four or five different definitions, and no one seems to have nailed down the best way to use them in a troubleshooting workflow.
Since that meeting, our team has spent a lot of time thinking about events and how to make them a first-class telemetry citizen. The team did extensive research and then got to work building a function to track change events. We recently announced the ability to ingest events in our observability platform.
I want to step back and explore why events are so critical and how they can help.