In 2013, Robinhood was founded to offer a commission-free investing app that made trading so easy it unleashed a new generation of investors who could buy and sell stocks without using or paying a broker. Before partnering with Chronosphere, the company ran observability in-house using Grafana Mimir. That setup was expensive, plagued by availability issues, and prone to outages that lasted for hours.
Over the past two years, Robinhood has seen the popularity of its trading platform skyrocket: membership reached tens of millions, and the platform saw an 80% increase in month-over-month usage. Its users, who execute trades throughout the day and tee them up the night before, are so passionate about the freedom to buy and sell stocks that they expect the same uptime from Robinhood as from essential utilities. Like electricity and water, the ability to trade should always be on. “When money and regulatory bodies are involved, the reliability stakes are even higher; we needed to eliminate all barriers for customers to trade on our platform,” said a Senior Staff Engineer who also founded the observability practice.
For Robinhood, the challenge became meeting the reliability and performance demands of its rapidly growing user base. The company was spending millions of dollars per year on its open-source monitoring product with a Mimir backend, yet it was still plagued by outages, especially at the “Sev1” level, the most severe level, where the production system stops operating and there is no workaround. These outages had direct consequences for customers.
The resulting downtime was a huge disruption for enthusiastic day traders. “When Mimir went down, it was several hours before it came back up,” said the Senior Staff Engineer. “We can’t win over customer trust with a system that doesn’t offer high availability, durability and performance.”
Robinhood needed a solution that was not overly complicated, was cost-effective, guaranteed at least 99.9% uptime for observability services, and loaded dashboards faster. “Availability guarantees are essential. Cost was also a factor since we had been spending several millions on our previous monitoring product,” said the Senior Staff Engineer.
Robinhood began searching for a highly available solution that could scale alongside its business and was compatible with open-source standards. The team had evaluated well-known SaaS application and infrastructure monitoring products and ruled them out over cost, reliability, performance, and vendor lock-in, especially dependence on the vendor for custom integrations.
While the company briefly considered running observability in-house on a different tech stack, it ultimately decided SaaS was the better approach for the business. “SaaS frees up engineers from the on-call onus. You’re not playing ‘whack-a-mole’ with services and with underlying structure. You’re [working] on the applications on top of your metrics, improving libraries, metrics adoption… it frees you up to focus on the bigger picture,” said the Senior Staff Engineer.
Chronosphere quickly rose to the top of the list, with several capabilities hitting Robinhood’s requirements head-on:
High availability and reliability: The Chronosphere observability platform was built from the ground up for cloud-native scale and complexity, which translates into greater reliability. Chronosphere is 5x more reliable than alternative SaaS monitoring solutions and has never missed a customer SLA, a fact that was vitally important to Robinhood given its previous availability challenges.
Fast remediation: The sooner engineering teams know there is a problem, the sooner they can start to remediate it. With Chronosphere, Robinhood reduced its mean time to detect (MTTD) issues by 4x, from 2 minutes to 30 seconds. Once engineers are alerted, they can also load dashboards and reports much faster: dashboards that previously took 15 minutes to load now load in seconds. On top of that, Chronosphere reduced “time to glass” (the time from when a data point is generated to when it is visible in dashboards and reports) from 45 seconds to 5 seconds.
Compatible with open-source standards: Unlike other SaaS monitoring offerings, Chronosphere is open-source compatible, supporting all major open-source metrics ingest protocols, dashboards, and query languages. Because Chronosphere is built on the open-source M3 metrics engine, Robinhood would not have to rely on the black-box magic of a vendor’s proprietary data format; it could avoid vendor lock-in and build on its existing Prometheus investments.
Cloud-native observability expertise: Chronosphere’s co-founders Martin Mao (CEO) and Rob Skillington (CTO) previously ran the observability team at Uber, where they experienced firsthand the challenges of running large-scale observability for cloud-native environments.
Ability to keep up with the business: In Chronosphere, Robinhood found an observability partner that could keep up with its rapidly scaling business. Longer retention periods and faster load times also let Robinhood unlock insights that were previously unavailable. “After seven days, data isn’t usable. The fact that we now see data retention in excess of two years with Chronosphere is huge for us.”