Before partnering with Chronosphere to solve its observability challenges, the Abnormal team ruled out several monitoring alternatives, including running Thanos in-house and adopting another SaaS solution such as Grafana Labs. Cost savings, in both engineering time and infrastructure, were a key driver behind Abnormal's choice of Chronosphere for observability:
Visibility and control over usage: With Chronosphere, Abnormal gained visibility into how its observability system is used, as well as control over how the system behaves when it reaches its limits. Small code changes no longer cause the metrics system to slow down or crash.
Flexibility in metric retention: With Chronosphere, Abnormal can easily adjust how its metrics are retained, choosing both the roll-up interval and the retention period. For example, Abnormal could roll up its metrics into five-minute intervals and retain them for six months.
Data aggregation: Chronosphere’s control plane allowed Abnormal to aggregate 98% of its metrics, making the solution 10x more cost-effective than alternative SaaS and self-managed options. By doing so, Abnormal aligns its metrics data with business value. “The most compelling feature Chronosphere offered is the data point aggregation. This helps us reduce the cardinality that we don’t need and only store the data that is critical to us. That was the differentiating factor that helped us save costs in the long run,” said Yoshida.
Reduced management overhead: Compared with the internal solution it ran before Chronosphere, Abnormal reduced how often engineers and admins have to work on the monitoring system, freeing them up to work on problems that drive the business. “We knew if we were to build it ourselves, we would have to fund a dedicated team. This was a non-starter because at the time we were really focused on using engineering time to tackle other issues, like expanding our customer base and continuing our growth,” said Yoshida.
Open source compatible: Abnormal needed a SaaS solution that could natively ingest Prometheus metrics so it wouldn’t have to change any of its instrumentation. “We needed to figure out a way to move quickly, which meant not having to do a lot of engineering work to ingest. We didn’t want to spend time rewriting our alerts or rewriting the actual code around our metrics,” said Yoshida.
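To make the roll-up and aggregation ideas above concrete, here is a minimal Python sketch. It is not Chronosphere's API or implementation. The `aggregate` function, the sample tuples, and the `pod` and `service` label names are all hypothetical, and it assumes each sample value is an increment that can safely be summed. It shows how dropping an unneeded label and bucketing timestamps into five-minute intervals reduces the number of series and data points stored.

```python
from collections import defaultdict


def aggregate(samples, drop_labels, interval_s=300):
    """Sum sample values after dropping labels and bucketing timestamps.

    Hypothetical helper for illustration only. Each sample is a tuple of
    (unix_timestamp_seconds, labels_dict, value), where value is assumed
    to be an increment that is safe to add up.
    """
    rolled = defaultdict(float)
    for ts, labels, value in samples:
        # Keep only the labels we care about; sorting gives a stable key.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        # Align the timestamp to the start of its roll-up interval.
        bucket = ts - ts % interval_s
        rolled[(bucket, kept)] += value
    return dict(rolled)


# Three raw data points across two pods (illustrative data).
samples = [
    (0, {"service": "api", "pod": "a"}, 1.0),
    (60, {"service": "api", "pod": "b"}, 2.0),
    (310, {"service": "api", "pod": "a"}, 4.0),
]

# Dropping the per-pod label leaves one series with two five-minute points.
result = aggregate(samples, drop_labels={"pod"})
```

In this sketch the high-cardinality `pod` label disappears from the stored data while the per-service totals are preserved, which is the general trade-off the quote describes: keep the data that matters and discard cardinality that does not.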