Whether your organization is a small company just getting started in its market or a large enterprise embarking on its cloud native journey, Prometheus is a great place to start for metrics-based monitoring. As an open source toolkit, Prometheus works out of the box and can initially be run in-house by a small team. Later, as your environment grows and the inability to scale becomes a problem, it's straightforward to switch to a hosted, Prometheus-compatible observability platform that is purpose-built for cloud native.
This was the case with Tecton, which is on a mission to standardize feature management in machine learning applications. When Tecton first launched, the team used an in-house Prometheus setup to monitor its environment. As Tecton's customer base grew, so did the scale of its environment, and its open source monitoring system started breaking down. Once teams were spending more time firefighting monitoring issues than innovating, the company partnered with Prometheus-compatible Chronosphere.
Because Tecton's setup was already instrumented for open source Prometheus, the move to Chronosphere was simple.
Challenge: Tecton’s volume struggle
Even though Tecton was a relatively small startup, its monitoring system was a major source of pain, and the team was spending too much time on monitoring.
Issues with Tecton's Prometheus setup included:
- Each instance of Prometheus was completely siloed
  - It was impossible to get a global view of the entire Tecton environment
  - During incidents, the on-call engineer had to switch between isolated Prometheus deployments, which made it time-consuming to find the metrics needed to troubleshoot the issue
  - Every change to Prometheus had to be made to each deployment separately, and silencing a noisy alert required a manual change
- No long-term storage
  - Tecton could only store metrics data for about a week, after which metrics would drop out of memory
  - If a customer raised an incident that had happened more than a week earlier, there was little Tecton could do to investigate
  - On top of that, the persistent volumes Tecton used for storage weren't reliable
- The system kept breaking
  - The Prometheus setup was buggy, and the team had to drop key tasks to focus on fixing metrics problems
After building a custom way to get multi-region support for Prometheus, the Tecton team realized they had outgrown their in-house system and that it was time to make a change. "We said, this is not the right way to do it," said Trivedi, a software engineer at Tecton. "We're basically building what we know is technical debt."
Solution: The search for a highly available alternative
Tecton couldn't meet business expectations while running on a broken monitoring system. The team was spending an inordinate amount of engineering time building custom features just to make Prometheus work, and even more time putting out monitoring fires.
As it set out to find a new monitoring solution, the team defined three key criteria. It required:
- A global view across regions and customers
- Control over data retention and the ability to keep historical data
- High availability
Choosing Chronosphere: meeting customer expectations and avoiding technical debt
Tecton needed an out-of-the-box monitoring solution that wouldn't break, with a feature set that wouldn't need custom engineering work.
After comparing proofs of concept with other monitoring solutions, Tecton chose Chronosphere for two key reasons:
- Confidence in Chronosphere's commitment to support
- Cost savings compared with Tecton's existing approach
Key results
The biggest benefit of moving to Chronosphere is that Tecton got exactly what it was looking for: a full-featured monitoring solution that works out of the box and doesn't break all the time.
Chronosphere’s cloud native observability platform provided Tecton with:
- Less burdensome on-call rotations, with fewer noisy alerts and less alert fatigue
- Useful data retention, so historical metrics remain available for investigations
With Chronosphere, Tecton no longer needs to think about monitoring as the teams spin customer deployments up and down. Instead of spending hours urgently debugging the monitoring system, teams spend their time focusing on Tecton’s core product and delivering on customer commitments.
Learn how our solution makes all the difference
You can learn more about Tecton's journey to the right observability solution in the Tecton case study.