
Automation Using Control Planes vs. Command-Line Tools


In my post about GitOps, I mentioned evolving the client-side “kubectl rolling-update” to the server-side Kubernetes Deployment controller/API. Different projects have made different choices about whether to implement a command-line-based tool or control-plane-based automation, and projects have changed their decisions over time.

For example, Docker Swarm started as a libswarm library, then a CLI-based feature IIRC, then a control plane, whereas Kubernetes was always control-plane-based, but had features that moved from the CLI to the control plane. Helm v1 was a command-line tool, Helm v2 introduced the Tiller server, and Helm v3 removed Tiller, but GitOps Operators like FluxCD and ArgoCD add back control-plane functionality. Terraform is a command-line tool, but a number of services, such as those with catalogs, wrap control planes around it. Other projects that provision infrastructure, like Crossplane and Radius, put control planes at their cores.

This got me thinking about the tradeoffs between client-side and server-side implementations and the criteria for choosing between them. Maybe the choice seems obvious, but evidently it isn’t always.

Back around 2018 when Crossplane and Google Cloud’s Config Connector got started, Custom Resource Definitions were fairly new and the Operator pattern was building momentum. By 2020–2021, there was a fair bit of discussion about using Kubernetes as a universal control plane (1, 2, 3, 4, 5, 6, 7, 8, 9). Some people are using Kubernetes that way for some things, but not everyone is doing all automation using Operators.

The main advantage of client-side tools is fairly obvious: they’re simple. They are relatively simple to write, install, authenticate, run, individually upgrade, extend via plugins, and embed into CI/CD pipelines. Server-side implementations are more difficult in all of those areas.

So when is it worth bothering with a control-plane-based implementation for automation?

  1. You want/need continuous operation, or really long-running operations, like hours, days, or longer. Command-line tools have to be invoked; servers run continuously. It’s possible to run command-line tools in an infinite loop, but they aren’t designed for that. (The first sketch after this list illustrates the difference.)
  2. You need fault tolerance, resilience, and/or decently high availability while operations run. That points toward a server-side implementation.
  3. The automation needs to react to spontaneous changes in the resources under management, such as in the case of autoscaling. That also suggests you need a service.
  4. The automation needs to orchestrate operations on large numbers of entities. This case probably includes all of the previous needs, and the work may also need to be parallelized and adaptive, as in the retail edge case I mentioned in the GitOps post.
  5. You want/need an API. It’s easier to build robust, observable higher-level automation controllers on top of APIs than on scripts, pipelines, and log files. Pulumi’s Automation API is a good example. It’s possible to create local APIs in client-side components, but that makes access from multiple programming languages and from some environments (e.g., web browsers, serverless functions) harder. (The second sketch after this list contrasts the two styles.)
  6. You want/need to access the functionality through multiple user-interface surfaces, such as GUIs and LLM chatbots, in addition to a CLI. Continuously updated status dashboards and multiplayer experiences are enabled by services. Accessing the functionality through multiple Infrastructure as Code tools, such as Terraform, Pulumi, and Crossplane, also becomes more straightforward. The Terraform provider plugin API is a de facto standard now, but it’s not a fully open standard, it’s harder to integrate with programming languages other than Go, and it wasn’t designed to be consumed by a variety of clients, which creates integration friction.
  7. You want/need more control over what users can and can’t do. Client tools use end-user credentials or service accounts accessible to those users, so those users would effectively have to be granted permissions to do everything the tools do. Services can proxy operations by running with elevated privileges relative to end users and then impose role-based access control, state-based constraints, quotas, and other controls to constrain what users can do with that power. (The third sketch below illustrates this.)
  8. You want to encapsulate the implementation, such as to not directly expose backend systems, for security, privacy, reliability, or ease of evolution. Similar to the previous point, but a slightly different motivation.
  9. You want to shift the operational burden from users to service operators: telemetry, upgrades, troubleshooting, etc. This can be challenging when lots of users are individually running instances of command-line tools of different versions in their own environments.
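
To make the first three points concrete, here’s a minimal, dependency-free sketch of the level-triggered reconciliation loop at the heart of most control planes, in the spirit of (but far simpler than) the Kubernetes Deployment controller. The resource names and in-memory state maps are illustrative only; a real controller would watch an API server and the managed resources rather than poll local maps.

```go
// A toy reconciliation loop: drive actual state toward desired state,
// continuously. All names here are illustrative, not from any real project.
package main

import (
	"fmt"
	"time"
)

// desiredState and actualState stand in for declared vs. observed
// infrastructure (e.g., replica counts per workload).
var (
	desiredState = map[string]int{"web": 3, "worker": 2} // what we want
	actualState  = map[string]int{"web": 1}              // what exists
)

// reconcile is idempotent: running it when nothing has drifted is a no-op.
// Level-triggered means it compares states, rather than replaying events.
func reconcile() {
	for name, want := range desiredState {
		have := actualState[name]
		if have == want {
			continue // nothing to do
		}
		fmt.Printf("reconciling %s: have %d, want %d\n", name, have, want)
		actualState[name] = want // in reality: create/delete resources, which may fail
	}
}

func main() {
	// A CLI runs reconcile() once and exits; a control plane runs it
	// forever, so partial failures and out-of-band changes are corrected
	// on the next pass.
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		actualState["web"]-- // simulate spontaneous drift, e.g., a node failure
		reconcile()
	}
}
```

The key property is that the loop is continuous and idempotent: crashes, failed operations, and drift are all corrected on the next pass, which a one-shot CLI invocation can’t do.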

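For point 5, here’s a sketch contrasting automation that shells out to a CLI and scrapes its output with automation that calls a service API and gets structured status back. The `mytool` command, the `/v1/applies` endpoint, and the status types are hypothetical stand-ins; Pulumi’s Automation API is a real-world example of the second style.

```go
// Contrast: driving automation via a CLI subprocess vs. a service API.
// The tool name, endpoint, and types below are hypothetical.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os/exec"
)

// CLI style: invoke the tool, then interpret exit codes and log text.
// Progress and errors are only as structured as the output format.
func applyViaCLI() error {
	out, err := exec.Command("mytool", "apply", "--auto-approve").CombinedOutput()
	if err != nil {
		return fmt.Errorf("apply failed: %s", out) // parse logs to learn why
	}
	return nil
}

// API style: a control plane exposes typed operations and statuses that
// dashboards and higher-level controllers can build on directly.
type ApplyRequest struct {
	Stack string `json:"stack"`
}

type ApplyStatus struct {
	Phase   string `json:"phase"` // e.g., "Running", "Succeeded", "Failed"
	Message string `json:"message"`
}

func applyViaAPI(baseURL string) (*ApplyStatus, error) {
	body, _ := json.Marshal(ApplyRequest{Stack: "prod"})
	resp, err := http.Post(baseURL+"/v1/applies", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var status ApplyStatus
	if err := json.NewDecoder(resp.Body).Decode(&status); err != nil {
		return nil, err
	}
	return &status, nil // structured status, not log scraping
}

func main() {
	if err := applyViaCLI(); err != nil {
		fmt.Println("CLI style:", err)
	}
	if status, err := applyViaAPI("http://localhost:8080"); err != nil {
		fmt.Println("API style:", err)
	} else {
		fmt.Printf("API style: %s: %s\n", status.Phase, status.Message)
	}
}
```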
There may be other motivations as well, but those are the common ones I see.
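
Finally, returning to point 7: a minimal sketch of a service that proxies privileged operations. The service holds the powerful credentials; callers hold only identity tokens, and the service decides what each role may do. The token scheme, roles, and quota here are hypothetical, and a production service would verify signed tokens and guard shared state against concurrent requests.

```go
// A toy privilege-proxying service: authorize the caller, enforce a quota,
// then perform the operation with the service's own credentials.
package main

import (
	"fmt"
	"net/http"
)

// rolesByToken maps caller tokens to roles. A real service would verify
// signed tokens (OIDC, etc.) instead of using a lookup table.
var rolesByToken = map[string]string{
	"alice-token": "deployer",
	"bob-token":   "viewer",
}

// deletesRemaining is a toy quota. State-based constraints like this are
// easy in a service and hard to enforce across independent CLI invocations.
var deletesRemaining = map[string]int{"deployer": 5}

func handleDelete(w http.ResponseWriter, r *http.Request) {
	role := rolesByToken[r.Header.Get("Authorization")]
	if role != "deployer" {
		http.Error(w, "forbidden: this role may not delete", http.StatusForbidden)
		return
	}
	if deletesRemaining[role] <= 0 {
		http.Error(w, "quota exhausted", http.StatusTooManyRequests)
		return
	}
	deletesRemaining[role]--
	// Here the service would call the backend with its own elevated
	// credentials, which the caller never sees.
	fmt.Fprintln(w, "deleted (using the service's credentials)")
}

func main() {
	http.HandleFunc("/v1/resources/delete", handleDelete)
	http.ListenAndServe(":8080", nil)
}
```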

Given the complexity, it makes sense to start with a command-line implementation and only graduate to a control-plane-based implementation when truly warranted.

Do you prefer to use automation tools that are just CLIs or that require control planes? Does it make a difference if the control plane is a SaaS rather than a system you have to run and manage yourself? Do you view control planes differently than per-node agents?

Feel free to reply here, or send me a message on LinkedIn or X/Twitter, where I plan to crosspost this.

You may be interested in other posts in my Infrastructure as Code or Kubernetes series.

CTO of ConfigHub. Original lead architect of Kubernetes and its declarative model. Former Tech Lead of Google Cloud's API standards, SDK, CLI, and IaC.
