Building a software-defined engineering platform

Learn how to apply best practices to create the foundational parts of an engineering platform

Ajay Chankramath | Chief Technology Officer & Managing Director, Platform & Products, at Brillio

Ajay Chankramath is the Chief Technology Officer & Managing Director, Platform & Products, at Brillio. With over 30 years of industry experience, he is a proven technology visionary known for leading transformational initiatives in platform engineering. A recognized thought leader, Ajay frequently speaks at global technology conferences and has authored influential pieces on platform engineering strategies. Additionally, he co-holds a foundational patent in platform engineering, solidifying his role as an innovator in the field from its early development stages.

Nic Cheneweth | Principal Consultant at Thoughtworks and founding infrastructure contributor to Thoughtworks Digital Platform Strategy

Nic Cheneweth is a Principal Consultant at Thoughtworks and the founding infrastructure contributor to Thoughtworks Digital Platform Strategy. His undergraduate studies are in computer science and software engineering, and he holds an MBA as well as doctorate and post-doctorate degrees. With 30 years of executive leadership, consulting, and engineering experience in roles ranging from the courtroom to the boardroom, as a former CEO, VP, Chief Counsel, Director, or entrepreneur in startup, private, and publicly traded companies, Nic brings a unique perspective to technology strategy and implementation.

Bryan Oliver | Platform Engineering Team at Thoughtworks

Bryan Oliver is an experienced engineer and leader who designs and builds distributed systems. He currently resides on the Platform Engineering team at Thoughtworks, where he focuses on cloud native platforms. He enjoys contributing to open source and speaking at technical conferences internationally.

Sean Alvarez | Principal Consultant at Thoughtworks and Head of Business Platforms in North America

Sean Alvarez is a principal consultant at Thoughtworks, where he is the Head of Business Platforms in North America. Using skills learned while earning an M.S. in computer science and an MBA, he has led multiple enterprise-scale transformations using the principles of Platform Engineering across all cloud vendors, and he is known for his industry presentations and roundtables in the practice.

Applying platform engineering practices

In the next few blogs, we will dive into applying platform engineering practices to create the foundational parts of an engineering platform for an imaginary company. These foundational components are crucial for any effective platform.

You’ll see how:

  • Platform domains relate to the pipelines we build
  • We can extend the Kubernetes control plane for more value
  • And what a self-serve user experience should look like

Fictional engineering platform example

VitalSigns.online is a fictional healthcare tech startup based in North America. They offer various mobile and web services that, when paired with consumer electronic devices, help people track their health vitals and share this info with their doctors. Their web services have always been open to third-party developers and business partners, and they’re keen to keep and grow this feature.

VitalSigns plans to roll out even more services in the coming years. They’re focusing on building their tech as APIs, aiming to create individual health-data collection apps and combined experiences quickly. This approach is about providing doctors with better data and helping people achieve better health outcomes without frequent office visits. 

The company has seen incredible success and growth in the four years since it started and expects to have over a hundred developers soon. But without a clear strategy beyond a mix of tech silos and a DevOps team, developers at VitalSigns are now spending half their time on lead-time planning, coordinating with other teams to get DNS entries, firewall rules, storage, compute capacity, monitors, alerts, pipeline changes, and everything else needed to build, deploy, and operate their software, often under tight deadlines. 

Maintenance and operational issues are a constant headache and aren’t seen as adding much value. Unsurprisingly, product incidents are rising, leading to frustrated customers and higher support costs.

Imagine we are a team within VitalSigns tasked with creating a better solution to these challenges.

Diagram showing internal development teams accessing the VitalSigns software-defined engineering platform via SDK and API for developer tools, infrastructure & operations, and governance.

We want to create an internal product that provides a genuinely self-service experience where developers (our internal customers) can imagine, design, build, release, and operate their applications with agility, high engineering quality, greater operational resiliency, and confidence in meeting compliance requirements, yet without the engineering friction they usually experience. In other words, we will deliver an Engineering Platform.

Start building an example engineering platform

Let’s assume this will be a brand-new product. While this is a common approach for learning exercises, there are plenty of good reasons for an enterprise to consider doing the same when building an engineering platform. The most pragmatic reason is described best by Gall’s Law:

A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system. 

John Gall (1975) Systemantics: How Systems Really Work and How They Fail (p. 71)

Engineering platforms comprise multiple technologies

By now, nearly every organization has been using many, if not all, of the various technologies that will go into an engineering platform. They have teams scattered everywhere using Kubernetes, creating infrastructure using Terraform, using Git, and “doing DevOps.” 

These implementations are often difficult to adopt because they’re either too focused on the needs and preferences of a small group of users, or they were developed without user input and are instead optimized for the needs of the team providing the technology.

In other cases, while the implementation is meant for general use, it is managed across many different traditional IT silos where API access and self-serve experiences were not originally considered and by teams without the resources or experience to evolve. Changes need a lot of planning, and long delays are normal. In all these scenarios, user experience is rarely among the top priorities. 

Organizations trying to create a developer platform while sticking to a traditional IT model or their original DevOps model spend a lot of time and money, only to realize they are not achieving their goals and have to start over with a unified team in a greenfield setting.

 

A diagram lists components of a software-defined engineering platform product, organized into sections such as Cloud Administrative Identity, Cloud Account Baseline, and Managed Control Plane Services.

The engineering platform product domains show the basic dependencies and almost exactly the order in which we’ll set up the product infrastructure pipelines. This diagram shows the pipelines we will build in each domain to create the foundation of our engineering platform.
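One way to picture this ordering is as a dependency graph that gets sorted before any pipeline runs. The sketch below is illustrative only: the three domain names are taken from the diagram, but the dependency edges between them and the function name are assumptions made for this example, not the actual pipeline definitions.

```python
from graphlib import TopologicalSorter

# Each domain maps to the set of domains it depends on.
# Domain names come from the diagram; the edges are assumed for illustration.
DOMAIN_DEPENDENCIES = {
    "Cloud Administrative Identity": set(),
    "Cloud Account Baseline": {"Cloud Administrative Identity"},
    "Managed Control Plane Services": {"Cloud Account Baseline"},
}


def pipeline_build_order(dependencies):
    """Return the domains ordered so each one follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, domain in enumerate(pipeline_build_order(DOMAIN_DEPENDENCIES), start=1):
        print(f"{step}. {domain}")
```

Keeping the dependencies as data means a new domain can be added to the map without hand-editing the build order.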

But first, let’s talk about the prerequisites.

Prerequisites to getting started

NOTE: Our example platform will be built using AWS as the cloud infrastructure provider. Be conscious of the cost as you start these platform-building exercises. With careful management, such as de-provisioning resources, clearing data storage when not in active use, limiting the work to a single platform environment, and other similar strategies, you may be able to stay within the AWS free tier, but it can be challenging. 

A Kubernetes cluster supporting a service mesh and other platform technologies requires instances larger than micro. Presently, even at the small scale of a personal platform, sustaining the resources of a two-environment platform 24/7 can run $600 to $800 per month or higher. Because these are learning exercises, you don’t need 24/7 uptime, and careful management of the kind described above can lower these costs dramatically.
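To get a rough feel for how much part-time use can save, here is a back-of-the-envelope sketch. It assumes cost scales linearly with running hours, which is a simplification: charges that persist while compute is stopped, such as storage, are ignored. The function name and the 10-hours-per-week figure are hypothetical.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def estimated_monthly_cost(full_time_cost, hours_per_week):
    """Scale a 24/7 monthly cost down to the hours actually used.

    Assumes cost is proportional to running time, so charges that
    continue while resources are stopped (e.g. storage) are ignored.
    """
    hours_used = hours_per_week * 52 / 12  # average hours used per month
    return full_time_cost * min(hours_used / HOURS_PER_MONTH, 1.0)


if __name__ == "__main__":
    # The $600 and $800 figures are the 24/7 range quoted in the text.
    for full_time in (600, 800):
        cost = estimated_monthly_cost(full_time, hours_per_week=10)
        print(f"${full_time}/mo at 24/7 -> roughly ${cost:.0f}/mo at 10 h/week")
```

Treat the result as a lower bound rather than a budget, since real bills include costs this model leaves out.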

Still, potentially significant costs may be involved, and consideration must be made at the start regarding access to the necessary resources. If you are applying these principles at your place of work in the actual delivery of an engineering platform, then the cost has already been budgeted. Alternatively, many organizations fund limited use of cloud resources for skills improvement and learning, so you may have the necessary access to cloud provider services through your employment or educational institution. If you do not have access to cloud provider resources, many aspects of the engineering practices within the exercises can be explored locally using tools such as Minikube. 

What resources do we need?

 

Icons representing key components of a software-defined engineering platform: cloud provider, version control, Terraform state file location, secrets store, and CI/CD tool—using AWS, GitHub, Terraform, OpenID, and CircleCI.

 

The figure above shows resources we will need to begin building our engineering platform product foundation.

Tools included in the platform

Besides the cloud accounts where the platform infrastructure resides, you will notice that these tools are also tools the platform product includes for use by platform customers. As platform engineers, we will use these tools ourselves to deliver the platform. 

Not every potential developer tool is needed initially, so which tools are needed to bootstrap an engineering platform effectively? 

  • Source code version control 
  • Secrets store 
  • Infrastructure state store 
  • Pipeline orchestrator 

Starting from scratch, we have a bootstrap challenge. How can we deploy the first configuration in a software-defined manner using source control, managed secrets, state store, and a pipeline if we must first deploy these tools before using them? 

Right away, you can see the accelerating impact of using SaaS tools as the solution to the bootstrap challenge. The reference code examples in the Effective Platform Engineering GitHub organization demonstrate several highly effective tools:

  • GitHub 
  • 1Password 
  • Terraform Cloud
  • CircleCI

Using these tools is not required to apply the principles in this book, though it will allow you to get the most from the sample exercise solutions.

Getting started with the example tools

In an enterprise setting, each tool would typically be integrated using an SSO solution. We will talk more about how that fits into the product experience in the Effective Platform Engineering book section on Customer Identity Provider (you can download the entire book here).

But for now, let’s go ahead and set up access to the initial tools as an individual user. The respective tool’s product documentation provides detailed instructions for performing the following steps. 

If you do not already have one, create a free personal account on github.com. Then, create a free-tier GitHub Organization for our imaginary VitalSigns company, in which we and all the VitalSigns platform developers will be members. Create a GitHub team called platform-team to represent our product delivery team for the VitalSigns exercises. 

Later, we will use GitHub Teams and team membership to manage access permissions. Add yourself to this team. Create a personal access token (PAT), upload your personal SSH keys, and enable support for signed commits.
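The team setup can be done in the GitHub web UI, but it can also be scripted. As a minimal sketch, the functions below build (without sending) requests for the GitHub REST API endpoints that create a team and add a member. The organization name, username, and token are placeholders, and the helper names are made up for this example.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def _headers(token):
    """Common headers for authenticated GitHub REST API calls."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def build_create_team_request(org, team_name, token):
    """Build the request that creates a team in an organization."""
    body = json.dumps({"name": team_name, "privacy": "closed"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GITHUB_API}/orgs/{org}/teams",
        data=body,
        method="POST",
        headers=_headers(token),
    )


def build_add_member_request(org, team_slug, username, token):
    """Build the request that adds (or invites) a user to a team."""
    return urllib.request.Request(
        url=f"{GITHUB_API}/orgs/{org}/teams/{team_slug}/memberships/{username}",
        method="PUT",
        headers=_headers(token),
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_create_team_request("vitalsigns-example", "platform-team", "<your-PAT>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the script easy to inspect before it touches a real organization.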

Create a free-tier Terraform Cloud organization. Go to the settings area (from the left-hand menu) within the organization, create a team, and add yourself to the team. From the team settings page, also generate a team API token. Create a free-tier CircleCI organization and link it with our GitHub organization. Generate an access token. 

Lastly, 1Password offers individual plans for less than $3 per month. Create a dedicated 1Password vault for these exercises, then go to the Developer options and generate a service account credential with read and write permissions.

If you are using some other secrets management tool instead, have its access credentials available for pipeline usage. Store all of the above access tokens in this vault.
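Before a pipeline does any real work, it helps to confirm that every token it depends on was actually injected. The sketch below shows one way to do that check. The environment variable names are illustrative assumptions; use whatever names your secrets tool and pipeline orchestrator expose.

```python
import os

# Illustrative names only; substitute the variables your pipeline injects.
REQUIRED_TOKENS = (
    "GITHUB_PAT",
    "TERRAFORM_CLOUD_TEAM_TOKEN",
    "CIRCLECI_TOKEN",
    "SECRETS_STORE_SERVICE_ACCOUNT_TOKEN",
)


def missing_tokens(environ, required=REQUIRED_TOKENS):
    """Return the names of required tokens that are unset or blank."""
    return [name for name in required if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_tokens(os.environ)
    if missing:
        print("Missing tokens: " + ", ".join(missing))
    else:
        print("All required tokens are present.")
```

Only the variable names are reported, never the values, so the check is safe to leave in pipeline logs.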

Most of the SaaS tools used in the example exercise solutions offer either a free tier adequate to cover the exercises in the book or a very affordable personal option, and alternative tools will often be discussed.

Occasionally, we will use alternative tools to demonstrate the differences among effective choices within the example exercise solutions. 

Some tools need an access token for automation, such as in our pipelines, but don’t let you create a team-level or organization-level token that anyone on your team can manage. In that case, you are left with creating a personal access token. This introduces a problem.

What if you leave the company or take on a new role with a different team? 

Either of these events can cause your personal access token to be revoked, and any automation that depends on the token will break. The two most effective ways of dealing with this situation are: 

Service Accounts. Sometimes called machine users, these are identities created within the appropriate system in the same way as human users, except that no single person has control over the identity. It becomes a team resource, with the username and password stored in the team secrets store for management. Often, systems provide a feature designed to support this type of user. We will use an example in AWS in a later excerpt blog. Just for these exercises, creating and using a personal token when needed is fine.

OpenID Connect Tokens. Many tools and most cloud providers, such as AWS, also provide a means of establishing direct trust with other tools or cloud resources. This approach requires more behind-the-scenes automation to create a self-service experience for platform users, but it is an option.
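To make the OpenID Connect option more concrete, here is a minimal sketch of the trust policy an AWS IAM role would carry so that a pipeline tool can assume it with an OIDC token instead of a stored credential. The account ID and organization ID are made-up placeholders, the function name is hypothetical, and the CircleCI-style issuer path is an assumption to confirm against the tool's current documentation. The OIDC identity provider must also be registered in IAM before such a role can be used.

```python
import json


def build_oidc_trust_policy(aws_account_id, issuer, audience):
    """Return an IAM role trust policy that trusts tokens from an OIDC issuer.

    `issuer` is the provider URL without the https:// prefix.
    """
    provider_arn = f"arn:aws:iam::{aws_account_id}:oidc-provider/{issuer}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Only accept tokens issued for the expected audience.
                "Condition": {"StringEquals": {f"{issuer}:aud": audience}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values: both IDs below are invented for illustration.
    org_id = "00000000-0000-0000-0000-000000000000"
    policy = build_oidc_trust_policy(
        aws_account_id="123456789012",
        issuer=f"oidc.circleci.com/org/{org_id}",  # assumed issuer format
        audience=org_id,
    )
    print(json.dumps(policy, indent=2))
```

Because the role is assumed with a token issued per pipeline run, no long-lived AWS key needs to live in the secrets store for this path.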

 

Next time we’ll cover: Developer tools selection criteria

Manning Book: Effective Platform Engineering

Learn actionable insights and techniques to help you design platforms that are powerful, sustainable, and easy to use.
