Ben: What are some examples of “build vs. buy” decisions where, looking back, you thought, okay, we made the right decision or the wrong one? What were the outcomes, and what do you think folks can learn from them?
I think we’ve probably discussed M3 at Uber, which was a big one, but maybe dig a little deeper into that one. If there are others you want to discuss, we’d love to hear about them.
Rob: There are actually tons of examples of “if you build it, they might not come.” I was responsible for one of those projects before M3. It was a system that optimized a lot of the mobile interactions with the dispatch trip systems by doing real-time synchronization of some of the data in there. That was kind of overkill at the time, given that the product we were offering it on top of was still being built.
It was set up to support Uber Eats and then it was naturally going to take over the dispatching systems. That synchronization technology, though, fascinatingly enough, is core to the success of another [project]. I mean, it’s been rewritten and completely reimagined. The engineer I was working with on that problem has started a project management company called Linear.
To me, that project showed that even though it was probably the right long-term choice for that problem, it was definitely not the right time or place to build it. People did not come because they had 99 other problems, and integrating with this new framework was not the top one, right? So I think it’s really important to do that kind of consensus work, even if you’ve already carved out a whole area that you know you need to improve anyway.
The other one that’s kind of interesting to chat about is a visualization tool that we put into place alongside Grafana. It was meant to show a more consistent set of observability visualizations and insights into the system, so that an engineer who had just been hired into the company could actually orient themselves rather than have to wade through 10,000 Grafana dashboards, which is a real number of dashboards that Uber had.
I think 8,000 of them were probably not used in the [previous] year. But it was really a difficult thing to understand the complexity of the system, especially with 4,000 microservices, right? So there was this idea that we could present a more consistent view and let people navigate between the systems and services themselves in a fashion that kind of felt familiar to people. Then also it showed the dependencies upstream, downstream, [and] whether those systems were experiencing problems. Because when you get paged on call, the last thing you want to do is essentially work out whether the errors are in a downstream system and you’re just being woken up for someone else’s problem.
Anyway, this project was built, and unfortunately, once we started tracking the internal usage of it, it did not add up to anywhere near the level of Grafana. Grafana had 1,000+ daily unique users out of the 2,000-strong engineering force, which I think speaks to how important observability is to people’s day jobs in software engineering.
But this system, which was developed to give a more consistent view into things, still only had 50 to 60 users logging in a day, even though there were some passionate users. It would probably have been better to build one of those experiences inside of Grafana itself or approach the problem in a different manner, right? Because we built it, and they did not come.