Location: Virtual | SREcon 2021
As companies transition to cloud-native architectures, the volume of metrics data they produce is growing exponentially, and SRE teams are being forced to adapt, including by finding ways to limit or control metric cardinality. As this trend continues, it’s critical that cloud-native companies (and their SRE teams) manage that growth sustainably and reliably.
During this session, Rob will discuss best practices and tips for efficiently taming metrics data growth and cardinality at scale. He will also share KPIs and metrics, proven at scale, to keep in mind when running, maintaining, and growing a world-class observability function. Drawing on real-life examples from leaders and engineers across the observability space, the session will give the audience a better understanding of how to apply these lessons with their existing SRE resources, including ways to track and measure these efforts.
Rob Skillington is the Co-Founder and CTO of Chronosphere. He was previously at Uber, where he was the technical lead of the observability team and creator of M3DB, the time-series database at the core of M3. He has worked in very large (Microsoft), medium (Uber), and small teams. As of 2023, Rob is based in Melbourne, Australia with his family of four after 10 years spent living in Seattle, San Francisco, and New York City.