Editor’s Note: The following article is a companion piece to an excerpt from The Manning Book: Effective Platform Engineering, focused on design choices to consider as you begin scoping out your software-defined platform. In its entirety, this Manning book explores how Platform Engineering practices can dramatically improve operations. This specific excerpt focuses on how to measure the usage of your platform, discover what your customers need, and elicit feedback. To read the whole book, skip ahead and download it.
TL;DR
- Treat incoming platform requests like product feature ideas—not SLA-bound tickets (bugs/outages excepted).
- Instrument real usage: capture telemetry on platform components and anonymous CLI metrics to hear the silent majority.
- Enforce a layered architecture (Handler → Service → Repository → Datastore) with architectural fitness functions so teams can swap datastores/clouds without coupling control-plane logic.
- Make decisions data-driven: use ADRs + test-driven development and fitness-function “gates” so new services ship with tests and observability/monitors by default.
- Platform engineering is software product engineering: measure, test, and iterate continuously to align roadmap with verified developer needs.
At this point, you might be wondering how to identify the opportunities your platform should address. There are several ways to answer that: measure the usage of our platform, discover what our customers need, and elicit feedback directly.
The first method is the most direct and obvious: as a centralized team, you will probably have a ticket system. This is unavoidable in most organizations. However, as a product team, you treat your ticket system quite differently from how a standard operations team treats theirs.
In the typical operations workflow, it is assumed that when a ticket request is put in, there is an SLA on when it will be completed. But … there’s an even bigger assumption we just glossed over. And that’s the assumption that it will be done at all!
Think about a DevOps or Identity team: when tickets come into their queue, it’s assumed that most of these requests will be done at some point. This is not the case when we build our platform under the product operating model, because not all of these requests will be prioritized.
This highlights the problem with assuming that “DevOps” is a team.
As we mentioned in Chapter 1 [of The Manning Book: Effective Platform Engineering], DevOps should be a culture, not a team.
Treat requests as product features, not tickets
When building an Engineering Platform and using the product operating model, it’s important to remember that all requests, except for bugs and outages, are treated as product requests. This means that the team needs to carefully review and analyze each one.
Customers have chosen to use the platform product, which means they cannot demand that features be built, and certainly not on an SLA!
If teams outside the platform team could constantly demand new changes on an SLA, our Platform team would slowly decay into just another DevOps team and lose focus on the self-service features that make our platform a functional product.
Rename the “request queue” to an “idea queue” to set expectations
This doesn’t mean you should do away with a request queue, though. In fact, that queue can become your platform backlog! With a bit of marketing, instead of calling it a request or demand queue, we might call it a Platform “Idea” queue or Platform “Feature Request” queue.
By changing the wording, we change its meaning, and teams will understand that requests can (and will) be denied if they don’t fit within the Platform as determined by the product team building it.
So, how else might we capture feedback and new needs of the platform?
Capture platform usage and developer needs with real data
As you are building the platform at PETech, you realize you need monitoring and observability tools for the customers deploying applications to your platform. You may not realize that you, the Platform team, need those same tools.
Measure usage with telemetry and metrics
Automated measurement shows how the platform is actually being used, which tells us which changes customers value and which new features we should prioritize. We’ll talk much more about measurement in [Chapter 4].
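As a rough illustration of instrumenting a platform component, the sketch below wraps a control-plane operation so that every call is counted by outcome. The `track_usage` decorator, the `USAGE` counter, and the `create_environment` operation are all hypothetical. The sketch also assumes an in-process counter, whereas a real platform would export these numbers to its metrics or observability backend.

```python
from collections import Counter
from functools import wraps
from typing import Any, Callable

# Hypothetical in-process counter keyed by (operation, outcome). A real
# platform would export these counts to its metrics backend instead.
USAGE: Counter = Counter()


def track_usage(operation: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that counts successful and failed calls to a platform operation."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception:
                USAGE[(operation, "error")] += 1
                raise  # re-raise so callers still see the failure
            USAGE[(operation, "ok")] += 1
            return result

        return wrapper

    return decorator


@track_usage("create_environment")
def create_environment(name: str) -> dict:
    """Hypothetical control-plane operation, used only for illustration."""
    if not name:
        raise ValueError("environment name is required")
    return {"name": name, "status": "provisioning"}
```

Counting failures separately from successes matters here. A feature that is called often but fails often is a different product signal from one that is simply never used.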
You should also get feedback from your customer base directly. There are numerous ways to accomplish this.
Instrument CLIs with anonymous metrics to hear the silent majority
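When you ship CLIs to your customers, you can include anonymous metrics gathering, for example which commands are run and whether they succeed. This lets you hear from the silent majority who never file a ticket or answer a survey.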
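Here is a minimal sketch of what an anonymous CLI usage event might contain. It records what was run, not who ran it, and it honors an opt-out switch before collecting anything. The environment variable `PETECH_CLI_NO_TELEMETRY`, the event fields, and the random per-install ID are assumptions made for illustration. Sending the event to a collector is deliberately left out.

```python
import json
import os
import platform
import time
import uuid
from typing import Mapping, Optional

# Hypothetical opt-out variable; checked before any data is collected.
OPT_OUT_ENV = "PETECH_CLI_NO_TELEMETRY"


def build_cli_event(command: str, exit_code: int, duration_s: float,
                    install_id: str,
                    env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Build an anonymous usage event, or return None if the user opted out.

    No username, hostname, command arguments, or file paths are included.
    """
    env = os.environ if env is None else env
    if env.get(OPT_OUT_ENV) == "1":
        return None
    return {
        "install_id": install_id,        # random per-install ID, not tied to a person
        "command": command,              # e.g. "env create", never its arguments
        "success": exit_code == 0,
        "duration_ms": round(duration_s * 1000),
        "os": platform.system(),
        "timestamp": int(time.time()),
    }


# Example: a random ID generated once at install time (hypothetical scheme).
install_id = str(uuid.uuid4())
event = build_cli_event("env create", exit_code=0, duration_s=1.25,
                        install_id=install_id, env={})
print(json.dumps(event))
```

Keeping arguments out of the event is a deliberate design choice. Command arguments often contain service names, paths, or secrets, and none of those are needed to learn which commands people rely on.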
Creating touchpoints to close feedback loops
You can send out surveys and conduct 1-1 interviews to find out which features people like, which they dislike, and which are missing. It’s important to gather this type of data from your users regularly and to keep building relationships with them.
Weekly demos to build trust, engagement, and steady adoption
Weekly or bi-weekly demos of new platform features help build engagement, trust, and interaction with the platform’s customers. We consider them an invaluable part of the platform development process, because there is no better way to improve your product than getting feedback on what you have built so far.
Architectural fitness functions for an Engineering Platform
As we are defining the APIs of the platform, the topic of databases comes up. Many of the APIs we will create for the Control Plane of our platform will need to store their state in a resilient database.
One of the senior developers on our Platform Team at PETech points out that our platform will support many regions, maybe even a Global topology where developers will be interacting with our control plane from multiple continents. So our control plane must be highly available, fast, and replicated across many regions.
So, as a team, you start thinking about globally available database services from your cloud provider. But another team member then points out that we also need an easy-to-use development experience for platform engineers, and a highly distributed global database could make the local development experience very complex.
And then another team member adds that while services like DynamoDB meet our requirements in AWS, we just bought another company, and their entire infrastructure is on GCP. They’ve recently asked us to start exploring supporting the Engineering Platform on their cloud as well!
Decoupling Datastores with Service and Repository Layers
First, how do we reconcile all of these concerns? Let’s return to the fact that we have a software-defined platform.
Good software design includes an architecture that decouples hard dependencies (like databases) and allows for change over time. To handle all the different data needs, we’ve decided to separate the database details from our platform’s APIs.
Enforce clear service, repository, and datastore layers with automated checks
We’re setting up a service layer and a repository layer. The repository layer will use a Datastore interface, which can be implemented for various database technologies. As long as each database implementation provides the functions defined by that interface, we won’t need to change the code in the repository or service layers at all.
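The sketch below shows one way these layers might look in code. It is not the book's implementation, and the class names and the two-method `put`/`get` interface are hypothetical. The service layer only talks to a repository, and the repository only talks to a `Datastore`. An in-memory store stands in for local development. A DynamoDB- or Firestore-backed class could, in principle, implement the same two methods without touching the layers above it.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Datastore(ABC):
    """Storage-agnostic interface every database implementation must satisfy."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class InMemoryDatastore(Datastore):
    """Local-development implementation backed by a plain dict."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._items[key] = dict(value)  # store a copy, not the caller's object

    def get(self, key: str) -> Optional[dict]:
        item = self._items.get(key)
        return dict(item) if item is not None else None


class EnvironmentRepository:
    """Repository layer: maps domain concepts onto Datastore keys."""

    def __init__(self, store: Datastore) -> None:
        self._store = store

    def save(self, name: str, spec: dict) -> None:
        self._store.put(f"environment#{name}", spec)

    def find(self, name: str) -> Optional[dict]:
        return self._store.get(f"environment#{name}")


class EnvironmentService:
    """Service layer: control-plane logic that knows only about the repository."""

    def __init__(self, repo: EnvironmentRepository) -> None:
        self._repo = repo

    def create(self, name: str, region: str) -> dict:
        if self._repo.find(name) is not None:
            raise ValueError(f"environment {name!r} already exists")
        spec = {"name": name, "region": region}
        self._repo.save(name, spec)
        return spec


repo = EnvironmentRepository(InMemoryDatastore())
service = EnvironmentService(repo)
service.create("payments-dev", "us-east-1")
```

Swapping databases then means constructing the repository with a different `Datastore` at startup. `EnvironmentService` is unchanged.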
You might have heard about Fitness Functions before. They’re well-explained in many books.
Simply put, an Architecture Fitness Function is a tool that helps objectively measure how well certain aspects of a software system’s architecture are performing. This concept is neatly summed up in “Fundamentals of Software Architecture” by Richards and Ford.
Use architectural fitness functions to protect design intent
Then, to ensure we always meet this pattern and keep these layers decoupled, we would write a fitness function that verifies the service layer only ever imports the repository layer, and the repository layer only ever imports an implemented Datastore.
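A layering fitness function can be as small as a static import check. The sketch below parses each module with Python's `ast` module and reports any import that crosses a layer boundary other than the one allowed. It makes several assumptions. The layers are assumed to live in top-level packages named `handlers`, `services`, `repositories`, and `datastores`. The checker takes an in-memory map of paths to source so it is easy to test. It also inspects only absolute imports. All of these are simplifications, not a prescribed layout.

```python
import ast
from typing import Dict, List, Set

# Hypothetical package names for each layer, and the one layer each may import.
ALLOWED: Dict[str, Set[str]] = {
    "handlers": {"services"},
    "services": {"repositories"},
    "repositories": {"datastores"},
    "datastores": set(),
}


def imported_layers(source: str) -> Set[str]:
    """Return the internal layers a module imports (absolute imports only)."""
    layers: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # relative imports are not inspected in this sketch
        for name in names:
            top = name.split(".")[0]
            if top in ALLOWED:
                layers.add(top)
    return layers


def check_layering(modules: Dict[str, str]) -> List[str]:
    """Fitness function: map of 'layer/module.py' -> source; returns violations."""
    violations = []
    for path, source in sorted(modules.items()):
        layer = path.split("/")[0]
        if layer not in ALLOWED:
            continue
        # Imports within the same layer are fine; anything else must be allowed.
        for target in sorted(imported_layers(source) - {layer}):
            if target not in ALLOWED[layer]:
                violations.append(f"{path}: layer '{layer}' must not import '{target}'")
    return violations
```

In CI, you might build the `modules` map by reading the repository's `.py` files with `pathlib`, and fail the build whenever `check_layering` returns a non-empty list.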
We’d write another fitness function that ensures all of our Datastore implementations adhere to the standard Datastore interface.
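One possible shape for the interface-conformance check, assuming Python's `abc` module is used to declare the interface: the fitness function below walks the direct subclasses of a base class. It reports any implementation that leaves an abstract method unimplemented, and any implementation whose method parameter names differ from the interface. `Datastore`, `InMemoryDatastore`, and the deliberately incomplete `HalfDoneDatastore` are hypothetical examples.

```python
import inspect
from abc import ABC, abstractmethod
from typing import List, Optional


class Datastore(ABC):
    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


def interface_violations(base: type) -> List[str]:
    """Fitness function: every direct subclass of `base` must implement each
    abstract method with the same parameter names as the interface."""
    problems = []
    required = {
        name: inspect.signature(getattr(base, name))
        for name in base.__abstractmethods__
    }
    for impl in base.__subclasses__():
        if inspect.isabstract(impl):
            problems.append(f"{impl.__name__}: missing {sorted(impl.__abstractmethods__)}")
            continue
        for name, expected in required.items():
            actual = inspect.signature(getattr(impl, name))
            if list(actual.parameters) != list(expected.parameters):
                problems.append(f"{impl.__name__}.{name}: signature {actual} != {expected}")
    return problems


class InMemoryDatastore(Datastore):
    def __init__(self) -> None:
        self._items: dict = {}

    def put(self, key: str, value: dict) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self._items.get(key)


class HalfDoneDatastore(Datastore):
    """Deliberately incomplete: forgets to implement get()."""

    def put(self, key: str, value: dict) -> None:
        pass


print(interface_violations(Datastore))
```

`__subclasses__()` only returns direct subclasses. A larger platform might walk the class tree recursively, or import every datastore module first so that all implementations are registered before the check runs.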
Prevent control-plane coupling as the platform evolves
These fitness functions ensure that our control plane API logic never has to change when we decide to implement a new database, whether it runs locally on one developer’s computer or is globally distributed.
You can think of them as a sort of unit test that asserts the architectural patterns and decisions remain intact as you make changes.
Each layer is consumed only by the next, and we enforce this with a fitness function. This means that if we later add a new database and call its functions directly from the service layer instead of implementing a Datastore for it, our fitness function will fail, telling us we must use the Repository layer to interact with the new Datastore.
You can see more examples of this pattern in action in the GitHub repository for the book.
Another thought may cross your mind as you build the platform at PETech: this fitness function practice feels a lot like something application developers already do, namely writing tests! And you would be right.
Remember, at the start of this excerpt, we said that platform products are software too, developed with a software development lifecycle (SDLC). To build a scalable and successful engineering platform, we have to apply software engineering principles to it. This includes fundamental architectural practices, like fitness functions, and writing tests that we continuously verify and trust.
Take a look at the repository for Chapter 2 [of The Manning Book: Effective Platform Engineering] to see more examples of engineering platform fitness functions.
Fitness Functions: An Exercise using ADRs, Tests, and Monitoring Gates
At PETech, while we are going to build the Engineering Platform on AWS, we know that the merger with AllTech is pending completion. AllTech is 100% on Google Cloud, and they don’t even have an AWS account.
We know that when we build the platform for PETech, we have to focus on our immediate customers while making architectural decisions that let us change and adapt the platform later, for example to support other clouds. One area of high importance is our custom Platform APIs.
How might we write a fitness function that ensures our platform APIs are implemented in a cloud-agnostic way?
- Using the sample API provided in the C3 repo, write a fitness function that ensures our API keeps cloud-specific features and operations in isolated interfaces that don’t affect our service logic.
- How might we expand this fitness function to work for all of our Platform APIs, not just this one?
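One possible answer to both questions, offered as a sketch rather than the book's solution: treat cloud SDK imports as the marker of cloud-specific code, and allow them only inside an adapter directory. Because the checker takes paths across every API package, the same function covers all Platform APIs, not just one. The `adapters/` convention is hypothetical. The SDK prefix list (`boto3`, `botocore`, `google.cloud`) is also an assumption to extend as needed.

```python
import ast
from typing import Dict, List

# Assumed top-level packages of cloud SDKs; extend as new clouds are supported.
CLOUD_SDK_PREFIXES = ("boto3", "botocore", "google.cloud")

# Hypothetical convention: only modules under an adapters/ directory of an API
# package may talk to a specific cloud.
ADAPTER_DIR = "adapters"


def cloud_imports(source: str) -> List[str]:
    """Return the cloud SDK modules a source file imports."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        found += [n for n in names
                  if any(n == p or n.startswith(p + ".") for p in CLOUD_SDK_PREFIXES)]
    return found


def check_cloud_isolation(modules: Dict[str, str]) -> List[str]:
    """Fitness function for all platform APIs at once: `modules` maps paths such
    as 'environments/services/create.py' to source code."""
    violations = []
    for path, source in sorted(modules.items()):
        if ADAPTER_DIR in path.split("/")[:-1]:
            continue  # cloud-specific code is expected here
        for name in cloud_imports(source):
            violations.append(f"{path}: imports cloud SDK '{name}' outside {ADAPTER_DIR}/")
    return violations
```

Adding GCP support then becomes a matter of writing new adapter modules. Any shortcut that pulls a cloud SDK into service or repository code fails the build.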
After some debate amongst the team, we’ve decided that test-driven development is a rule we want to adopt and use across all of our platform’s custom software.
- Write an Architectural Decision Record that captures this decision. Include reasons, alternatives, and details.
- Write a fitness function that will fail if someone checks in a new Service Layer without tests associated with it.
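For the second exercise, here is a minimal sketch of a test-presence fitness function. It assumes a hypothetical repository convention: service-layer modules sit under a `services/` directory, and their tests sit under a `tests/` directory as `test_<module>.py`. Matching on file name alone is a simplification that could collide if two APIs share a module name.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def services_missing_tests(paths: Iterable[str]) -> List[str]:
    """Fitness function: return service modules with no matching test_<module>.py."""
    files = [PurePosixPath(p) for p in paths]
    # Every test module name found anywhere under a tests/ directory.
    test_names = {f.name for f in files
                  if "tests" in f.parts and f.name.startswith("test_")}
    missing = []
    for f in files:
        is_service_module = (
            "services" in f.parts
            and "tests" not in f.parts
            and f.suffix == ".py"
            and f.name != "__init__.py"
        )
        if is_service_module and f"test_{f.name}" not in test_names:
            missing.append(str(f))
    return sorted(missing)


repo_files = [
    "environments/services/__init__.py",
    "environments/services/create.py",
    "environments/tests/test_create.py",
    "databases/services/provision.py",
]
missing = services_missing_tests(repo_files)
```

In CI, the list of paths could come from the output of `git ls-files`. The build fails whenever `missing` is non-empty, which gives the test-driven development ADR an automated gate.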
As we’ve seen throughout this chapter [of the Manning book Effective Platform Engineering], observability and monitoring data are at the core of every decision we make.
Consider how we might write ADRs and Fitness Functions that capture this.
- How might we write a fitness function that would fail if a new API Feature gets checked in without any monitors? Feel free to use a specific observability tool to write your answer and then compare it against the answer in the back for similarities.
- Consider how a data-driven approach might change the dynamics of the team’s interactions with other stakeholders and executives at the company. What tactics can you use to debate the merits of a new feature request using our ADRs, Fitness Functions, and observation-driven decision-making? Consider how these techniques remove emotions and assumptions from these sorts of debates.
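For the monitoring exercise, a tool-agnostic sketch might compare the routes an API exposes against the routes its monitors cover, and fail if any route is uncovered. It assumes two hypothetical inputs: a list of route strings (for example, read from the API's OpenAPI spec), and monitor definitions kept as code, each listing the routes it covers under a `routes` key. A real version would parse your observability tool's own monitor format instead.

```python
from typing import Dict, Iterable, List


def routes_without_monitors(routes: Iterable[str],
                            monitors: Iterable[Dict]) -> List[str]:
    """Fitness function: every API route must be covered by at least one monitor."""
    covered = set()
    for monitor in monitors:
        covered.update(monitor.get("routes", []))
    return sorted(set(routes) - covered)


# Hypothetical data for an environments API.
api_routes = ["GET /environments", "POST /environments", "DELETE /environments/{name}"]
monitors = [
    {"name": "environments-latency-p99",
     "routes": ["GET /environments", "POST /environments"]},
    {"name": "environments-error-rate", "routes": ["POST /environments"]},
]
missing = routes_without_monitors(api_routes, monitors)
```

Here the newly added `DELETE` route has no monitor, so a CI check asserting that `missing` is empty would block the change until one is added.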
Focus on what matters
If you’re wondering how to identify the right platform opportunities, focus on three inputs: usage metrics, articulated customer needs, and direct feedback. With the foundational concepts of a software-defined platform in place, it’s time to explore the world of Domain Driven Platform Design. Download the entire book to keep reading.
Frequently Asked Questions
How should a platform team treat incoming requests?
When building an Engineering Platform and using the product operating model, it’s important to remember that all requests, except for bugs and outages, are treated as product requests. This means that the team needs to carefully review and analyze each one.
What are architectural fitness functions (in simple terms)?
Simply put, an Architecture Fitness Function is a tool that helps objectively measure how well certain aspects of a software’s architecture are performing.