Smart infrastructure vendor Nebulon is one of those startups that turned heads when it launched its product to lighthouse customers two years ago. At the time, the global pandemic and economic concerns dominated headlines, but Nebulon’s smart infrastructure platform caught attention because it solved an emerging need: companies want the benefits of a cloud experience and cost savings while keeping their data on-premises and in their control.
When Nebulon partnered with Chronosphere instead of taking an in-house, open source observability approach, the company’s most daunting challenge—ensuring its cloud-based control plane system is highly available—became its easiest to solve.
The challenge: deliver exceptional customer experiences and highly available services
While much of today’s AWS cloud infrastructure runs on Nitro-based servers for efficiency and cost optimization, enterprises and service providers still rely on hyper-converged infrastructure for their server-based data centers, sacrificing density, at-scale management, and adequate cyber-resilience capabilities.
Part of Nebulon’s smart infrastructure solution consists of a hardware DPU device, Nebulon SPU, that connects to solid state drives in on-premises servers and transforms them into a shared storage pool with a full complement of enterprise data services including data deduplication, compression, encryption, erasure-coding, snapshots, and mirroring. With this device installed in an organization’s server fleet, storage and key server resources can be controlled from the Nebulon cloud UI so users can:
- Fleet-manage the deployment of server storage software/firmware in minutes
- Centralize automation and control at cloud-scale through a single API
- Remotely monitor fleet-wide health, physical asset inventory, and configuration
The UI is part of the heartbeat of Nebulon’s solution, a centralized SaaS control plane—Nebulon ON—that shifts the processing of infrastructure services from the server to the SPU, and traditional data center operations tasks to the cloud. As the face of the customer experience, Nebulon ON needed to be in good behavioral health and reliable. Nebulon ON answers questions such as: Are the storage and compute resources behaving properly? Are there errors or anomalies? Are the latencies appropriate? Where are surplus capacity and performance resources available to launch a new project?
Nebulon ON is “the cloud-based control plane that provides a view of device status, collects a variety of telemetry from Nebulon SPUs about performance and utilization, and authenticates devices to perform various operations,” said Nebulon’s VP of engineering and cloud lead, Michael Heyeck.
The right observability solution would be key to Nebulon’s ability to provide mission-critical information to customers about their server-storage infrastructure. “If there’s a problem, we need to quickly determine who’s the right person to look at it,” said Heyeck. “The priority is making sure our SaaS console is always available.”
Nebulon chose Chronosphere to solve control plane availability
Nebulon started using Chronosphere for observability well before announcing its cloud operating platform in June 2020. The company originally used Chronosphere during product development for device metrics. “If there’s a problem, cloud engineers were able to use metrics to figure out what went wrong based on what the time series shows,” said Heyeck. In fact, today the vast majority of Nebulon’s volume of metrics comes from devices.
However, the way Nebulon uses Chronosphere eventually expanded to include the operational side of its SaaS console as that capability came to fruition. “We’ve been on Chronosphere for the entire lifetime of our SaaS console,” said Heyeck.
Before choosing Chronosphere, Nebulon considered a build-it-yourself approach. Heyeck’s team evaluated technologies such as InfluxDB and VictoriaMetrics, but those options were quickly ruled out due to performance issues. “We just kept running these things out of memory based on the telemetry load we were pushing,” said Heyeck.
More importantly, the resources required for a DIY approach were a non-starter. “It takes a certain number of talented people to set up and operate a large-ish time series database,” said Heyeck. He estimates three full-time employees would be needed to run an in-house observability solution properly. He noted that some organizations might try dedicating half an employee to operating an in-house observability solution, but that would severely risk instability.
Control plane availability was a key driver behind why Nebulon chose Chronosphere for observability. Chronosphere provides:
- Metrics that Nebulon is able to expose to the customer through dashboards.
- Alerts that fire if the system deviates from its expected behavior.
- The ability to combine metrics with additional logs to find out what’s wrong and remediate the problem quickly.
“The core metric of our uptime is whether the system is working properly,” said Heyeck. “Chronosphere helps increase that number, by providing a reliable and quick service.”