Talking costs, developer productivity with The New Stack Makers


In this transcript of a conversation between Chronosphere CEO Martin Mao and The New Stack Makers host Heather Joslyn, learn about the latest observability trends and how Chronosphere is helping organizations maximize the return on investment from their observability tools.

On: Dec 27, 2023


As modern businesses move to cloud native architecture to support larger, more scalable computing environments, they’re facing more than just a technology change. Cloud native and multi-cloud adoption raise questions, and concerns, about costs, developer productivity, data ingestion, and the use of open source tooling.

Recently, Chronosphere CEO and Co-founder Martin Mao joined Heather Joslyn on The New Stack Makers podcast to chat about these trends, what’s happening in the observability industry, and how Chronosphere is helping organizations get the best return on investment (ROI) with their observability tools.

If you don’t have time to watch the full video, check out the chat transcript below:

Chronosphere and cloud native

Heather Joslyn: Hello everyone, and welcome to another episode of The New Stack Makers. I’m your host, Heather Joslyn, features editor of The New Stack. Today we’re gonna explore what’s new in cloud native observability. Observability is complicated in a cloud or multi-cloud environment and can get even more complicated as an organization scales. How do you help a team prioritize what matters most when they get alerts or find something that needs attention? And how are people using observability to help them shift left and take more responsibility for security at the developer stage? Joining us for this discussion is Martin Mao, CEO and Co-founder of Chronosphere. Hi Martin.

Martin Mao: Hi Heather.

Heather: Martin, can you tell our listeners just a little bit about Chronosphere and what it does?

Martin: A hundred percent. Chronosphere provides a cloud native observability platform. You can imagine it as something you use to gain visibility and insights into your infrastructure, all of your applications or microservices, your tech stack, and even the business, in real time. We focus predominantly on companies that are either already cloud native or moving to cloud native, and we have the pleasure of working with companies such as DoorDash, Snapchat, Robinhood, and Zillow. We help them gain visibility and insights into their technology stack and infrastructure, and ultimately provide a better experience for their end customers.

The cost of cloud native observability

Heather: Thanks. Let’s get right into it and briefly help our audience understand the situation. What are the biggest challenges organizations face with observability when they’re running on the cloud or on several clouds?

Martin: Yeah. You touched on a couple of them, but there are really two main challenges here. The first is cost. In the current macroeconomic climate, the cost of tools and infrastructure is really top of mind for a lot of companies out there. And it’s not just the cost itself; it’s that the cost is correlated with the increase in the volume of data that gets produced as a company moves toward a multi-cloud or cloud native environment. On average, that increase in data is about 12.4x. You can imagine that correlated with it is almost an order of magnitude increase in cost as well. That’s one really big problem: solving observability is just a lot more expensive in cloud native environments.

On the other hand, the other problem we’re dealing with is the outcomes we’re trying to achieve. The number of incidents is actually on the rise. We’re seeing companies experience roughly a 79% increase in customer-facing incidents. And development teams are spending about a quarter of their time, roughly 10 hours a week, debugging issues. So it’s a pretty bad formula: the tools we use cost a lot more, and they’re actually less effective at the job they’re supposed to do. If you think about it as an ROI equation, both sides of that equation are getting worse. And that’s a really big issue for the industry right now.

Heather: And some of the customers you mentioned are dealing with a lot of data, real-time data. And 10 hours a week debugging, that’s a lot of time not spent creating new features.

Martin: Yeah, a hundred percent. It’s a quarter of the week that’s not spent on something that moves the business forward. And again, in the current macroeconomic conditions, where there’s a lot of focus on developer productivity and doing more with less, that quarter of their time is something that really should be optimized as much as possible, because there are a lot of efficiencies to be gained just by freeing up developer time.

Observability adoption requires organizational change

Heather: Speaking of developers, what impact does [cloud or cloud native adoption] have on a company’s operations, not just on the technology, but on the teams as well?

Martin: It’s not just a technology stack change and a tooling change. When a company goes multi-cloud or adopts a cloud native architecture, there’s often an organizational change as well. We adopt more of a DevOps mentality, where developers are responsible for operating their piece of software in production. We generally organize ourselves structurally to match the architecture: if I’m in charge of this part of the system, I’m on a team that is in charge of this part of the system. So there’s a lot of organizational change that comes along with this. And one of the biggest friction points is that developers haven’t had to operate their software in production [before]; there’s been a centralized operations team for that.

And that’s just not how organizations structure themselves these days. You may have a site reliability engineering (SRE) team or a platform engineering team, but those teams are not really responsible for operating software. They’re responsible for setting up the frameworks and the tools and providing the best practices. Each developer really has to take that responsibility into their own hands. And that creates new challenges, because a lot of the tools that exist today were really built for centralized operations teams. They weren’t built for the average developer.

And the average developer doesn’t just have to write the piece of software now. They have to know how to operate it and monitor it in production. They have to know how to secure it. They have to know how to deploy it and learn everything about continuous integration/continuous delivery (CI/CD). There’s so much placed on the developer now that I don’t think they’re going to become experts in any of these things individually. That puts a lot of pressure on the software and tools we provide: they really have to be optimized and simplified for what the developer needs to do, because the developer isn’t necessarily gonna be an expert in observability or security.

Heather: A lot of what you’re describing is the shift left movement, putting more responsibility on the developer at the beginning of the process.

Martin: A hundred percent. Right? You can imagine with that more responsibility, they need help. They need help with the tools, and it’s somewhat unfair to say, “Hey, you also have to do all of this stuff on top of creating the software, and we’re not gonna give you any help there.”

Heather: So how can observability help teams shift left? How can it help them handle that responsibility?

Martin: From our perspective, in this migration to a cloud native architecture, the architecture actually gets far more complex. So not only do we have developers who have perhaps never operated software in production before, they’re now operating in a far more complex environment. I’d say one of the roles of observability is to help developers run, maintain, and operate their software in production. But we’re also finding that observability can play a really good role in helping decode the fairly advanced environment a developer is running in: actually leveraging observability tooling and data to understand, okay, “Where does my piece of software run? How does it run? What are its actual dependencies?”

That’s often something you may not know. What are the impacts of those dependencies? There’s a lot the observability tooling can do to inform the developer of all of that, ideally in a really simple way. That can help them operate their software reliably, and also help them understand and navigate the complex architecture they’re now operating, which they’ve never had to operate before.

The technology change behind rising costs

Heather: I wanna circle back a little bit to the cost issue. You mentioned that [cloud native] can be expensive. Why can it be so expensive?

Martin: This may be a good or a bad thing, but it’s not because vendors are getting greedy. Let’s put that aside. It’s not because some folks decided to charge more for this stuff. It’s actually that in this particular shift, there’s so much more observability data being produced, and that’s really driving up the cost of these systems. I don’t know if it’s an unintended consequence, but it’s definitely one of the negative consequences of moving to a cloud native architecture. Think about a pre-cloud native architecture: you’re running an application on a virtual machine (VM), and perhaps you pay a cloud provider for that VM. If you move over to a cloud native architecture, you still pay the cloud provider for that VM. You still get the same amount of compute and storage, so the cost of infrastructure is about the same.

However, now you probably have a lot of containers running on that VM, and your application is broken up into microservices. The cost of the infrastructure is roughly the same, the amount of workload you can put through that infrastructure is about the same, and you have the same amount of compute. All of those variables stay the same. In observability, however, you now need to understand and observe every container and every tiny microservice, and there are 20 of them where there was one before. It’s one of those weird things: everything else remains relatively static, yet the volume of observability data that gets produced increases around 12.4x, and that’s driving up the cost of observability tooling a lot.

That’s actually causing a lot of issues, not just because it’s more expensive, but because it has really bad unintended consequences. Developers are being forced to instrument less and to understand less about their systems because they can’t afford to do more. Or companies are only monitoring some of their environments, or parts of them. And that’s counter to what observability is really there for, right? It’s “I want more visibility, but the bill is actually preventing me from getting that visibility and from being effective.” It’s actually the cause of a pretty big problem the industry has right now.
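Martin’s point that data volume tracks the number of things you observe, not the compute you pay for, can be made concrete with a back-of-the-envelope model. The Python sketch below is purely illustrative: it assumes every monitored unit (a VM, container, or microservice) emits the same fixed set of metrics, and the counts are made-up example values, not figures from Chronosphere or from the interview.

```python
# Hypothetical illustration: observability data scales with the number of
# monitored units, while the compute you pay for can stay flat.
# All numbers are assumed example values, not measurements.
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    vms: int               # VMs paid for from the cloud provider
    monitored_units: int   # things that must be observed (VMs, containers, services)
    metrics_per_unit: int  # metrics each unit emits (assumed constant)


def active_series(d: Deployment) -> int:
    """Distinct time series, assuming every monitored unit emits the same metric set."""
    return d.monitored_units * d.metrics_per_unit


# Before: one application running on one VM.
before = Deployment("monolith on a VM", vms=1, monitored_units=1, metrics_per_unit=200)
# After: the same VM now hosts the app split into 20 containerized microservices.
after = Deployment("microservices on the same VM", vms=1, monitored_units=20, metrics_per_unit=200)

for d in (before, after):
    print(f"{d.name}: {d.vms} VM(s), {active_series(d)} series")

growth = active_series(after) / active_series(before)
print(f"compute unchanged, series grew {growth:.0f}x")
```

With these assumed numbers the series count grows 20x while the VM count stays at one. The 12.4x average Martin cites is an observed figure he reports, not something this toy model derives; real growth also depends on label cardinality, which the sketch ignores.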

Accelerating developer efficiency with Chronosphere

Heather: So we want to know what’s going on, but knowing what’s going on can be expensive, right? And as you’re getting all this information, you [need to know] how to sort through it, because, yeah, not everything is an anomaly. It’s dangerous, right?

Martin: A hundred percent. That’s the other side of the coin. Even if you can pay for the information, it’s too much raw information, especially for the end users of these tools: developers who potentially haven’t operated [them] before. So all of a sudden you take a person who isn’t very experienced at this and flood them with a lot of data. That’s just a bad combination in terms of how effective someone can be. That’s why not only does it cost so much, but the outcomes are getting worse. The return on the cost you put in is getting worse at the same time. It’s a pretty bad dynamic the industry is in right now.

Heather: What do you see as some of the things Chronosphere is doing in this area? Which of these things really need attention?

Martin: Those two parts of the equation I was talking about are really where we’re focused at Chronosphere. On the cost side, it’s really helping companies deal with the cost and, in particular, with this huge increase in data. Part of fixing that problem is just making these tools cheaper. Yes, that is part of the solution. But the other part of the solution is that there’s just too much data, and not all of it is valuable. The real problem is that it’s really hard to determine what is and isn’t valuable until something happens. When you’re producing the data, you don’t really know what’s gonna be valuable.

Because of that, we didn’t just create a product and a feature set for it. We also created a vendor-neutral framework for how teams should think about this. We call it the Observability Data Optimization Cycle. It’s a bit of a mouthful, [but] if you want information about it, you can find it on our website. Essentially, it takes a lot of the concepts of a FinOps function, a function that has gotten good at understanding financial impact and optimizing it, and applies them to observability. So the first thing you need is probably centralized governance and a way to create budgets for individual teams or departments.

Then you need a way to actually measure the cost: how much observability data is each team using? It’s not just the company [overall], but who’s using it, right? Then [you] need all of the utilization data: how is this data being used? What parts of it are useful? [Then] you assign value relative to how much you are paying for it. Ideally, you need tools that can match the value and the cost, because the answer isn’t just dropping the data on the ground; that’s actually a bad outcome. You really want to optimize all of your data so that you can get the value out of it without paying the whole cost. So we created this framework for companies to think about it and really apply a lot of the FinOps concepts to the observability space.

And then, of course, in the product, there’s a whole feature set that does all of these things for you. If you want an out-of-the-box product, we obviously have that. So that’s what we’re doing to solve the cost problem. On the other side, [we have] a few announcements coming up in the next few months that really focus on making the experience and the tooling much simpler for the everyday developer. This is where they’re being flooded with data. What they really want is access to insights, and just insights, contextualized and presented to them. That’s what a developer really wants. They don’t actually wanna wade through all of the raw data.

That’s what we’re gonna be announcing pretty soon. I don’t wanna ruin the surprise, but they’re coming out soon and are focused on that part of the problem. For Chronosphere, these are the problems in the industry we have to solve: the cost problem, and making these tools more effective. That gets the ROI back in balance for companies out there.
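The budgeting, measurement, and utilization steps Martin describes can be illustrated with a small Python sketch. This is not Chronosphere’s product, API, or published framework; the team names, the per-datapoint price, and the “never queried” heuristic are all hypothetical, chosen only to show how per-team budgets, cost attribution, and usage data fit together.

```python
# Hypothetical sketch of budgeting and cost attribution for observability data.
# NOT Chronosphere's product or API; team names, the unit price, and the
# "never queried" heuristic are illustrative assumptions.
from collections import defaultdict

PRICE_PER_MILLION_DATAPOINTS = 0.50  # assumed price in dollars

# Centralized governance: a monthly budget in dollars per team (assumed values).
budgets = {"payments": 100.0, "search": 50.0}

# Measurement and utilization:
# (team, metric name, datapoints ingested this month, times queried this month)
usage = [
    ("payments", "http_requests_total", 120_000_000, 340),
    ("payments", "debug_cache_internals", 60_000_000, 0),
    ("search", "query_latency_seconds", 150_000_000, 95),
]


def cost(datapoints: int) -> float:
    """Dollar cost of ingesting `datapoints` at the assumed unit price."""
    return datapoints / 1_000_000 * PRICE_PER_MILLION_DATAPOINTS


def report(budgets: dict[str, float], usage: list[tuple[str, str, int, int]]) -> dict:
    """Attribute spend to teams, compare it to budgets, and list unqueried metrics."""
    spend: dict[str, float] = defaultdict(float)
    never_queried: dict[str, list[str]] = defaultdict(list)
    for team, metric, datapoints, queries in usage:
        spend[team] += cost(datapoints)
        # Data nobody queried is a candidate for optimization (for example,
        # aggregating or downsampling it), not automatically for deletion.
        if queries == 0:
            never_queried[team].append(metric)
    return {
        team: {
            "spend": round(spend[team], 2),
            "budget": budget,
            "over_budget": spend[team] > budget,
            "never_queried": never_queried[team],
        }
        for team, budget in budgets.items()
    }


for team, row in report(budgets, usage).items():
    print(team, row)
```

Flagging unqueried data rather than deleting it reflects Martin’s point that simply dropping data is a bad outcome; the goal is to find where optimization recovers cost without losing value.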

Growing support for open source observability

Heather: Last fall, I interviewed your co-founder Rob Skillington about PromLens, which was donated to the Prometheus project. Is there anything new happening with Prometheus or projects like OpenTelemetry?

Martin: Yeah. There are definitely a lot of interesting things happening in the open source world. You referenced PromLens; that was part of our way of helping [aid developer productivity]. PromLens was a project we contributed to, which helped developers visually build a query to access a lot of this data, right?

So already a year ago, we were trying to make the experience better for developers in the open source world. One of the big things happening in Prometheus, which I believe will be talked about at the PromCon conference coming up in a couple of weeks, is the announcement of a project called Perses. Perses is an open source, Linux Foundation-owned visualization tier for observability data.

In the industry, the standard here for a while has been Grafana. However, the company that owns it decided to change the licensing model, and we’ve seen this with a lot of projects more recently. So [Perses] is a new project that’s vendor neutral and already owned by the Linux Foundation. It’s going to be a visualization tier, and ideally, eventually, the de facto visualization tier on top of a lot of observability data. It’s already an open source project, so I’m not ruining a surprise, but I believe it’s gonna be announced in the coming weeks. It’s a project Chronosphere has been contributing to for the past year and a half to really try to move the open source industry forward on the OpenTelemetry side of things.

That’s on the Prometheus side. For distributed trace data in particular, I see a lot of work on instrumentation and auto-instrumentation. I think the industry is all in agreement that a new standard, OpenTelemetry, is great for companies out there. It means they’re not locked into a vendor anymore; they can produce the data in one format and send it to whichever vendor and whichever tool they want. Everybody is really happy about that. Practically, however, it turns out that manually instrumenting every application with OpenTelemetry is fairly painful. So I see a lot of development and innovation on the instrumentation side of things. That’s an area we’re keeping a close eye on and working with various companies on as well. But yeah, that’s an update on some things happening in open source right now.

Happenings with Chronosphere

Heather: Is there anything else new happening at Chronosphere or do you feel [we] covered it?

Martin: At Chronosphere, we talked about the types of problem areas we’re trying to solve. As I mentioned, there are a couple of pretty big announcements coming up in the next couple of months. I don’t wanna ruin the surprise, but there are definitely a couple of big announcements we’ve been working on, really focused on the developer and making them more effective. The best way for folks to have a look at [this]: we’re gonna be at KubeCon, the Kubernetes conference, as well as re:Invent this year. So feel free to come find us at those two conferences, and we’ll show you in the product itself how we’re taking a fairly new and differentiated approach to solving the problem for developers, making the value of observability much higher than it is today, and really trying to solve this problem for cloud native and multi-cloud environments.

Looking ahead at observability industry trends

Heather: We’ve mentioned the Prometheus project and others. What trends are you seeing in the broader observability space that you’d like to draw people’s attention to?

Martin: There are three trends we’re seeing; some are related to open source, and some are related to the problems we’re seeing. The first trend: everybody’s now, all of a sudden, paying attention to the cost of observability tools. This is something we’ve known for the last four and a half years, and experienced when my co-founder and I solved this problem at Uber; we knew cost was going to be a problem. It took the rest of the industry perhaps a few more years to realize that. We see a lot of movement toward reducing observability costs through various tools and mechanisms. I would say that’s one thing that’s top of mind. If you talk to most folks focused on observability, cost is probably the number one thing on their mind right now.

So that’s one trend: trying to get more optimized there. The second trend is what we talked about just now. As we shift over to a cloud native architecture, the old way of solving the problem with application performance monitoring (APM) tools is just no longer relevant, and we need a completely different approach. This is where OpenTelemetry and distributed tracing come in handy. So we’re seeing a shift to a new set of methodologies and approaches that are much better suited to cloud native environments. I think that reflects the broader industry moving to cloud native, as opposed to staying on the same type of architecture we had before. So the second trend is that everybody’s starting to realize the old APM tools are no longer effective in these new environments, and [companies] need to adopt new approaches, and perhaps new tools, to solve these problems.

The third trend, which we’ve touched on already, is that open source standards like Prometheus and OpenTelemetry continue to become the default and gain more traction. I think that’s just great for the industry, because we’re moving away from vendor-specific proprietary instrumentation, where a company is really locked into a vendor and the vendor’s data. And I say this as a vendor myself: that’s a really bad state for a company to be in.

[Moving to] more of an open source standards world isn’t just about agreement on the standards, though that’s great; it actually means companies own the production of their own observability data, and they’re not locked into a vendor. It means you can take your data with you to whichever tool and whichever vendor, or, if you wanna build it yourself, that’s an option too. I think that freedom is something the industry unfortunately has not seen in the last few years, and this is a really positive movement as well.

So those are probably the big three trends we’re seeing. The first two are really about solving cost and effectiveness problems; the third is all about open source. We just happened to hit all three trends in our conversation so far. But that’s really where, from my perspective at least, the industry is trending.

Heather: That seems like a good place to wrap up. I’d like to thank our guest, Martin Mao, of Chronosphere for joining us today. Thank you, Martin.

Martin: Thank you for having me, Heather. I really enjoyed the conversation.

