The hidden costs of self-managing TSDBs

Organizations might think it’s more cost-effective to self-manage time series databases for observability. Learn why that might not be the case.

When determining whether they should “build or buy” their cloud native observability, some organizations are tempted to build in-house solutions. Why? When they see how much data cloud native environments generate and how vendor licensing is structured, many companies assume that running their own observability will be less expensive. After all, open source tools are free and you don’t have to pay someone else, right?

What many companies fail to realize is that there are massive hidden costs associated with running open source observability in-house, especially at scale and without adequate expertise or staffing. You might think you’re helping the bottom line because you’re not paying for software licensing, but there are organizational “soft costs” that can be more expensive in the long run.

The “build” option makes sense in some cases and can be a great way to start with cloud native observability, as it makes the migration to a vendor much easier down the line. But it’s not a “set and forget” type of implementation. In-house, open source observability projects require regular evaluation, system configuration, and dedicated oversight to run efficiently.

Whether it’s the skills required to run a time series database (TSDB) or resources to build out specific features, there are challenges to consider before making the buy vs. build decision.

The difficulty of TSDB management

Why is running observability in-house so expensive? It comes down to people. Generally, the most important and complex piece of an observability stack is the time series database (TSDB). While several popular TSDBs, such as Prometheus, are available as open source, they lack functionality that is desirable for a critical service running at high scale.

This extends to long-term storage options for Prometheus, such as Thanos and Cortex. It falls upon teams to either implement additional capabilities or forgo them and deal with the management headaches that arise as a result. These feature gaps and the resulting headaches may not seem like a big issue individually, but they can be a real drag on productivity, and easily turn into “death by a thousand cuts” for an observability team that owns the system’s operation alongside other day-to-day tasks.

TSDBs come with all of the same considerations as any other database in a production environment, and administrative operations can be difficult to perform effectively given the huge volume of data stored in many production TSDBs. Because observability is a critical function for businesses, your TSDB must be at least as available as the production systems it enables the organization to observe. Once again, achieving this level of resiliency falls upon the team. If they are not experts in the implemented technology, it can be a time-consuming task, frequently marked by painful mistakes that result in downtime along the way.
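To make the availability requirement concrete, here is a minimal sketch of the arithmetic behind an availability target. The function name and the example targets are illustrative assumptions, not figures from any particular system.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_budget_hours(availability_pct: float) -> float:
    """Return the hours of downtime per year allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Hypothetical targets, chosen only to illustrate the calculation.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {downtime_budget_hours(target):.2f} hours/year")
```

If the production systems target 99.9%, a TSDB that must be at least as available has under nine hours per year to spend on upgrades, migrations, and mistakes combined.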

Functional limitations

Open source, while it does have low initial expenses, comes with functional limitations with respect to areas like data management, UI/UX, and customer support functions. Without the right internal resources, organizations will either have to hire talent to build out capabilities or pay for software with such features.

Overall, a vendor can provide the following for managed observability:

Scale constraints automatically and help protect users from the effects of disruptive workloads.
Ensure high availability and oversee backups and disaster recovery.
Monitor the monitoring system and confirm it runs as intended.
Manage security measures and audit logs.
Include additional high-level management features that are not present in open-source, such as detailed visibility into the data that is being sent/stored.
Provide support and training to end-users to ensure they have the visibility they need into their applications and systems.

Beyond overall system management, observability vendors can provide:

Data management and format support: Managing data ingestion and retention can be quite complex for open-source TSDBs. Functions like downsampling of historical data can prove to be quite a headache for teams that are not experienced with the TSDB in question or with proactively monitoring all of its background operations. Some open-source options lack such features entirely, which makes storing and querying historical data significantly more expensive and limits teams’ ability to monitor long-term trends. You should also consider what data format support your teams need, as open-source TSDBs may support only one format or have limited data compatibility. In contrast, vendor-managed solutions commonly support multiple data formats, and any challenges with managing ingestion and storage of data are handled for you.
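As a rough illustration of what downsampling involves, here is a minimal sketch that averages raw samples into fixed-width time windows. The function name, the `(timestamp, value)` tuple layout, and the sample data are assumptions made for this example; it is not how any particular TSDB implements the feature.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, window_seconds):
    """Average (timestamp, value) samples into fixed-width windows.

    Returns a list of (window_start, mean_value) pairs sorted by window start.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of the window it falls into.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Hypothetical raw samples scraped every 15 seconds.
    raw = [(0, 1.0), (15, 3.0), (30, 5.0), (60, 10.0), (75, 20.0)]
    print(downsample(raw, 60))  # [(0, 3.0), (60, 15.0)]
```

Even this toy version hints at the operational questions a team would own: which aggregates to keep besides the mean, when the background job runs, and how to tell when it falls behind.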

UI/UX: Most open-source observability offerings and TSDBs have minimal UI support for management. Instead, your team will spend their time defining things like alerts in code only, without tools to help ensure the alerts will behave as intended. This approach does encourage configuration-as-code, but it makes for a poor user experience, and the friction that individual users encounter in day-to-day tasks can add up very quickly. These workflows also require a level of system knowledge that your team might not have or must take extra time to learn.

Observability vendors can provide a better UX with proprietary dashboards that are less code-based and more visual. These managed offerings provide interactive query-building capabilities that simplify dashboard editing. Commercial observability products have interfaces specifically designed to display alerts and events. Plus, user access management is streamlined, and it is easier to get an overview at a glance.

Staffing for DIY, open source implementations

Open source options for DIY implementation require in-house expertise and can have a steeper learning curve than most commercial off-the-shelf offerings. If your organization doesn’t have staff available to build out a TSDB, or experience running a TSDB like Prometheus, you will likely have to hire more staff or contract out to an organization that can help you. This extends to the engineers who are end users as well: ensuring they can be productive using the platform in their day-to-day tasks is critical, as poor workflows can be a massive drag on productivity and overall team satisfaction.

In contrast, using a vendor solution will almost completely remove the management burden, and vastly simplify the remaining tasks for admins, such as user management. Additionally, most vendors will provide customer support, training, and onboarding assistance for teams. This means that your own staff can hand off any issues with the platform to the vendor support team, and focus on their actual day-to-day jobs and value-add features for your business.

It’s also incredibly time-consuming to oversee observability systems. Working with a vendor can remove that cognitive load on your observability team and ensure they aren’t hit with the burnout of in-house environment management, which could result in high staff turnover.

Cost isn’t just dollars and cents

The benefits of building monitoring in-house might seem appealing at first, but it’s not a decision to take lightly. There are many more long-run considerations that may end up costing you more than the initial savings.

You may see initial cost savings due to reduced licensing costs, but over the long term, effective upkeep requires investment in staff talent, management tools, and service partners to keep systems reliable. It’s also something that requires time and doesn’t fit every business case.

Interested in how Chronosphere can help you manage your TSDBs?