We hope you enjoyed WTF is SRE 2022! In case you missed the event or would like to re-watch our session:
Martin Mao, Chronosphere Co-founder and CEO
MTTR has long been an essential failure metric. However, in a cloud native world, P95 and P99 have become more meaningful measurements, and time to remediation (not repair) is what matters most. During the talk, Martin will share an alternative to MTTR and how it can become your new P99 of remediation.
Mean time to repair (MTTR) has long been an essential failure metric, measuring the average time it takes to repair or restore a system to functionality. But why, in the age of microservices and containers, are we still using a metric that originated in measuring equipment failures in factories? The mean, or average, is no longer a relevant statistic for most organizations; P95 and P99 have become the more meaningful measurements. "Repair," or sometimes "restore," is also problematic. In most cases the most important period to measure is the time to remediation: the time it takes to alleviate customer pain by restoring the service to acceptable levels of availability and performance. In this session, Martin will introduce an alternative to MTTR and share real-life examples and lessons learned to explain how this new way of thinking can become your new P99 of remediation time.
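To make the contrast between a mean and a high percentile concrete, here is a minimal Python sketch. It is an illustration only, not the method presented in the talk: the incident durations are made-up numbers, the `percentile` helper is a hypothetical name, and it uses the nearest-rank definition of a percentile, which is just one of several common definitions.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical remediation times, in minutes, for 20 incidents.
# Most are resolved quickly; two are long outliers.
remediation_minutes = [
    4, 5, 5, 6, 6, 7, 7, 8, 8, 9,
    9, 10, 10, 11, 12, 12, 14, 15, 45, 240,
]

summary = {
    "mean": mean(remediation_minutes),
    "p50": percentile(remediation_minutes, 50),
    "p95": percentile(remediation_minutes, 95),
    "p99": percentile(remediation_minutes, 99),
}

for name, value in summary.items():
    print(f"{name}: {value} min")
```

With this sample data the mean comes out at roughly 22 minutes, a figure that describes almost none of the incidents: the typical one (P50) took 9 minutes, while the worst one, which P99 surfaces, took 240. The mean hides exactly the incidents that hurt customers most.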
Martin is a technologist with a history of solving problems at the largest scale in the world, and he is passionate about helping enterprises use cloud native observability and open source technologies to succeed on their cloud native journey. He is now the Co-Founder and CEO of Chronosphere, a Series C startup with $255M in funding, backed by Greylock, Lux Capital, General Atlantic, Addition, and Founders Fund. He was previously at Uber, where he led the development and SRE teams that created and operated M3. Before that, he worked at AWS, Microsoft, and Google. He and his family are based in the Seattle area, and he enjoys playing soccer and eating meat pies in his spare time.