Every day, we speak with customers and prospects about the challenges they face with log data, and many common patterns arise throughout these conversations. These include complaints about explosive log data growth and the sense that much of that data isn’t useful.
At the end of the day, these conversations are anecdotal. We sought to build a more data-driven understanding of our customers’ challenges.
So, Chronosphere surveyed 127 people familiar with their organization’s logging strategy. Respondents included both decision makers responsible for buying log management and SIEM platforms and the people who use those platforms, drawn from a mix of observability and security teams. Of the people we surveyed, 83% work at companies with 500 or more employees, and 40% support 5,000 employees or more.
In this article, I’ll share the six key takeaways from this survey and provide some analysis of the results.
6 Key Takeaways About Log Data
1. Log data grew 250% year-over-year on average
Across the board, we hear that logs are growing at an explosive rate. To better understand this trend, we asked respondents how quickly their log data had grown in the past 12 months. The answer? On average, respondents reported 250% growth.
Why has log data grown at such an explosive rate? Several factors contribute, including:
- Kubernetes adoption: As applications are broken down into smaller, containerized components, every interaction between those components generates log data. The resulting volumes far surpass those produced by traditional monolithic architectures.
- Rise of digital experiences: More companies build and run customer-facing software than ever before. With more digital experiences, organizations inherently create more logs.
- IoT adoption: IoT devices generate vast amounts of log data due to their continuous operation and frequent interactions with networks, sensors, and other connected systems.
Data growth is a challenge, in large part, because log management and security information and event management (SIEM) platforms charge based on the volume of data users ingest. In other words, log management and SIEM costs grow at the same rate as data volume.
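To make the cost implication concrete, here is a minimal sketch of ingest-based pricing. It assumes a flat per-GB price (the $2 figure and the 100,000 GB starting volume below are hypothetical, not survey data) and reads “250% growth” as an increase of 250% over the prior year, so this year’s volume is 3.5 times last year’s.

```python
# Minimal sketch of ingest-based pricing: cost = volume ingested * price per GB.
# Assumptions: a flat per-GB price and hypothetical volumes; real contracts vary.


def ingest_cost(gb_ingested: float, price_per_gb: float) -> float:
    """Cost of ingesting a given volume at a flat per-GB rate."""
    return gb_ingested * price_per_gb


def grow(volume: float, growth_pct: float) -> float:
    """Apply a year-over-year percentage increase (250 means +250%, i.e. 3.5x)."""
    return volume * (1 + growth_pct / 100)


last_year_gb = 100_000  # hypothetical annual ingest volume
price = 2.0             # hypothetical $/GB

this_year_gb = grow(last_year_gb, 250)
print(this_year_gb)                      # 350000.0
print(ingest_cost(last_year_gb, price))  # 200000.0
print(ingest_cost(this_year_gb, price))  # 700000.0
```

Because cost is linear in volume under this model, the bill grows by exactly the same 3.5x factor as the data.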
2. 22% of organizations create 1TB or more of logs each day
To put into perspective how much log data has grown, we also wanted to measure the total volume respondents created each day. Nearly a quarter of respondents reported generating 1TB or more of log data daily, and 12% said they created 10TB or more each day.
When you combine these numbers with the cost of log management and SIEM platforms ($2-$5 per GB), you find that these organizations can spend anywhere from the high six figures to eight figures per year on logging. Given the 250% growth rate, it’s no wonder that teams are exploring telemetry pipelines to reduce log data volumes.
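The arithmetic behind these spending estimates can be sketched in a few lines. This is a back-of-the-envelope calculation, not a pricing model: it assumes decimal units (1 TB = 1,000 GB), a constant daily volume for 365 days, and a flat price at each end of the $2-$5 per GB range cited above.

```python
# Back-of-the-envelope annual spend for the daily volumes reported in the survey.
# Assumptions: decimal units (1 TB = 1,000 GB), 365 days of constant ingest,
# and a flat per-GB price at each end of the $2-$5 range.

GB_PER_TB = 1_000
DAYS_PER_YEAR = 365


def annual_cost(tb_per_day: float, price_per_gb: float) -> float:
    """Annual ingest cost for a constant daily volume at a flat per-GB price."""
    return tb_per_day * GB_PER_TB * DAYS_PER_YEAR * price_per_gb


for tb_per_day in (1, 10):
    low = annual_cost(tb_per_day, 2)
    high = annual_cost(tb_per_day, 5)
    print(f"{tb_per_day} TB/day: ${low:,.0f} - ${high:,.0f} per year")
# 1 TB/day: $730,000 - $1,825,000 per year
# 10 TB/day: $7,300,000 - $18,250,000 per year
```

Even the low end of the 1TB-per-day tier lands near a million dollars a year, and real bills also include retention and query costs that this sketch ignores.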
3. Fluent Bit is the most commonly used open source agent
Observability and security teams leverage many different log collection mechanisms. Some use vendor-specific agents, such as the Splunk Universal Forwarder and the Datadog Agent. Others opt for open source log shippers to collect data, like Fluent Bit or the OpenTelemetry Collector. Many use a combination of both.
As part of this survey, we asked respondents what tools they use to collect log data. After filtering out proprietary, vendor-specific agents, we found that Fluent Bit is the most commonly used log collection agent (22% of respondents). The OpenTelemetry Collector follows closely behind (18%).
It’s worth noting that each respondent selected more than two log collection agents. In other words, every organization we surveyed is managing log collection across disjointed tooling.
4. Organizations struggle to gain useful insights from log data
When we talk to observability and security teams, we hear many different challenges related to log analysis. But does one challenge outweigh the others? Surprisingly, the answer is a clear “yes.”
38% of respondents struggle to get useful insights from their log data, 12% more than the next-ranked challenge.
Thankfully, there are many ways to boost the signal of your log data. For example, you can drop noisy log events entirely or strip unnecessary fields from the events you keep. You can also enrich log data with additional context from third-party sources, which speeds up troubleshooting and investigations.
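Here is a minimal sketch of what those signal-boosting steps might look like inside a pipeline. Everything in it is hypothetical: the record shape, the rule that DEBUG and TRACE events are noise, the dropped field names, and the `SERVICE_OWNERS` table, which stands in for whatever third-party source supplies the extra context.

```python
# Minimal sketch of two signal-boosting steps: removing noisy events/fields and
# enriching the remaining events with extra context. The record shape, level
# rule, dropped fields, and SERVICE_OWNERS lookup are all hypothetical.

from typing import Iterable, Iterator

NOISY_LEVELS = {"DEBUG", "TRACE"}           # assumption: treated as noise
DROP_FIELDS = {"thread_id", "raw_headers"}  # assumption: bulky, rarely useful
SERVICE_OWNERS = {"checkout": "payments-team", "search": "discovery-team"}  # hypothetical enrichment source


def boost_signal(records: Iterable[dict]) -> Iterator[dict]:
    for record in records:
        # 1. Remove noisy data: skip low-severity events entirely.
        if record.get("level") in NOISY_LEVELS:
            continue
        # 2. Remove noisy contents: strip fields that add bytes but not insight.
        cleaned = {k: v for k, v in record.items() if k not in DROP_FIELDS}
        # 3. Enrich: attach the owning team so responders know whom to contact.
        cleaned["owner"] = SERVICE_OWNERS.get(cleaned.get("service"), "unknown")
        yield cleaned


logs = [
    {"level": "DEBUG", "service": "checkout", "msg": "cache hit"},
    {"level": "ERROR", "service": "checkout", "msg": "payment declined", "thread_id": 42},
]
print(list(boost_signal(logs)))
# [{'level': 'ERROR', 'service': 'checkout', 'msg': 'payment declined', 'owner': 'payments-team'}]
```

Filtering before enrichment is a deliberate ordering: there is no point paying to look up context for events that are about to be discarded.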
5. Logs are most valuable in the troubleshooting process
Log data is useful in many different contexts. More teams perform log analysis than ever before, including security, observability, and even business intelligence users. Given the wide range of logging use cases, we wanted to better understand where log data is the most valuable.
When asked, respondents reported that logs were the most helpful in troubleshooting production systems (43% of respondents). Incident response was the next most common use case (41%). Logs are also frequently used for application performance monitoring and load testing (34%).
6. Are logs increasing in value? It depends on whether you work in observability or security.
We hear anecdotally that the value of log data isn’t growing at the same rate as its volume.
In a separate survey, conducted during a live webinar, over 90% of respondents noted that less than 40% of their log data was actually useful.
To test this sentiment, we asked our survey group whether their log data was increasing in value. Interestingly, responses differed drastically depending on the respondent’s background: 63% of people working in security found that log data was increasing in value, whereas only 35% of observability users felt the same way.
Security teams might find increased value in log data for a few reasons. First, since logs contain detailed, chronological records of events, they are crucial for reconstructing security incidents and performing forensic analysis. Second, security teams rely heavily on logs to identify unusual patterns or behaviors that may indicate a security breach. Lastly, logs are invaluable for meeting compliance and regulatory requirements.
On the flip side, observability teams may find logs decreasing in value because they’re more likely to be overwhelmed by the sheer volume of logs created by modern distributed systems. Plus, they can leverage other types of telemetry (metrics, events, and traces) to support their use cases.
Solving Log Data Challenges with a Telemetry Pipeline
The takeaways from this survey highlight several challenges that organizations face:
- The unprecedented rate of data growth and the impact this has on costs
- The difficulty of deriving useful insights from log data
- The need to manage log collection across different types of agents
Solving these challenges requires organizations to focus on three key areas:
- Cost Optimization: With log volumes growing exponentially, teams need to right-size costs. This effort includes filtering out low-value data, pruning unnecessary fields from the logs that remain, and introducing low-cost storage targets for long-term retention.
- Signal Enhancement: Given that 38% of respondents struggle to derive useful insights from their logs, organizations should prioritize log quality over quantity. To deliver this value, teams can standardize log formats and enrich logs with new context.
- Tool Consolidation: The survey revealed that most organizations use multiple log collection agents. As this trend continues, it will add complexity to logging infrastructure, making it harder for teams to maintain. Teams should seek ways to consolidate log collection and routing to avoid this issue.
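To illustrate how a single pipeline stage can touch all three areas, here is a minimal sketch under loudly hypothetical assumptions: two made-up agents (`agent_a` and `agent_b`) that name their fields differently, a shared `LogRecord` schema, and two made-up destinations. It normalizes records into one format (signal enhancement), accepts input from multiple agents in one place (consolidation), and routes only high-severity events to the per-GB-priced backend (cost optimization).

```python
# Minimal sketch of one pipeline stage that (1) normalizes records arriving from
# different collection agents into one schema and (2) routes each record to a
# destination. Agent names, field names, and destinations are all hypothetical.

from dataclasses import dataclass


@dataclass
class LogRecord:
    source: str
    severity: str
    message: str


def normalize(raw: dict, agent: str) -> LogRecord:
    """Map each agent's field names onto one standard schema."""
    if agent == "agent_a":  # hypothetical agent that emits `lvl` and `log`
        return LogRecord(agent, raw["lvl"].upper(), raw["log"])
    if agent == "agent_b":  # hypothetical agent that emits `severity` and `body`
        return LogRecord(agent, raw["severity"].upper(), raw["body"])
    raise ValueError(f"unknown agent: {agent}")


def route(record: LogRecord) -> str:
    """Send high-value events to the analytics backend, everything else to cheap storage."""
    if record.severity in {"WARN", "ERROR", "FATAL"}:
        return "siem"            # hypothetical: priced per GB ingested
    return "object_storage"      # hypothetical: low-cost long-term retention


incoming = [
    ({"lvl": "info", "log": "user logged in"}, "agent_a"),
    ({"severity": "error", "body": "disk full"}, "agent_b"),
]
for raw, agent in incoming:
    rec = normalize(raw, agent)
    print(rec.severity, route(rec))
# INFO object_storage
# ERROR siem
```

Routing on severity alone is a simplification; a real pipeline would likely combine severity with source, content, and compliance requirements before deciding where a record belongs.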
By taking these actions, observability and security teams are well positioned to manage ongoing data growth.