Terraform is an incredibly powerful tool for managing Infrastructure as Code (IaC). (In fact, we’ve already made note of how we’re using Terraform to manage our passwords & secrets at Wildfire.) The core application can perform dozens of common tasks out of the box, and a rich module registry provides thousands more, allowing DevSecOps teams to spend more time effecting change and less time debugging their vendors’ APIs.
One frustration Terraform adopters can run into is the significant difference between example configuration code and real world application.
Often, examples from online tutorials focus on extremely simple use cases that, frankly, don’t rise to a level of complexity that would even require an IaC tool.
Once a team has invested enough energy into adopting Terraform, it is common to run across more complex issues where help is sparse, at best. We have discovered this at Wildfire when implementing Terraform for managing certain aspects of our white-label cashback rewards platform.
In this first post of an ongoing series on advanced Terraform topics, we’re going to cover some methods that can help you to keep your Terraform code DRY (Don’t Repeat Yourself) while also remaining flexible.
One of our development team’s internal mottos at Wildfire is that things should “feel pro” when discussed or used. Our team has consistently remarked that all these suggestions fit that bill.
To give you some real-life examples, we have published a GitHub repository containing all the code included in this post. There are three tags, one for each stage of the optimizations covered below.
Hopefully this post can inspire you to find new efficiencies within your own codebase!
While powerful and improving all the time, HCL is far from a perfect tool for writing reusable and templated code.
Here’s an example that one might find on an online tutorial for creating a Google Cloud Platform service account:
resource "google_service_account" "example-service-account" {
account_id = “example-sa"
display_name = "Service Account Example"
project = “wildfire-cloud-project”
}
This is a concise, concrete example of how to create a service account in GCP via Terraform.
However, for a more representative example of resource complexity, look at a Kubernetes deployment. In any situation other than a simple Nginx server, these get incredibly long; the resource below takes nearly 200 lines of code!
resource "kubernetes_deployment" "realisticExample" {
metadata {
name = "realistic-example"
labels = {
run = "realistic-example"
}
}
spec {
replicas = 2
selector {
match_labels = {
run = "realistic-example"
}
}
template {
metadata {
annotations = {
}
labels = {
run = "realistic-example"
}
}
spec {
automount_service_account_token = false
enable_service_links = false
container {
image = "path.to.container.image:tag"
name = "realistic-example"
args = []
command = []
env {
name = "VAR_1"
value_from {
secret_key_ref {
key = "VAR_1"
name = "secret-env"
optional = false
}
}
}
env {
name = "VAR_2"
value_from {
secret_key_ref {
key = "VAR_2"
name = "secret-env"
optional = false
}
}
}
env {
name = "VAR_3"
value_from {
secret_key_ref {
key = "VAR_3"
name = "secret-env"
optional = false
}
}
}
env {
name = "VAR_4"
value_from {
secret_key_ref {
key = "VAR_4"
name = "secret-env"
optional = false
}
}
}
port {
container_port = 80
protocol = "TCP"
}
readiness_probe {
failure_threshold = 3
initial_delay_seconds = 10
period_seconds = 10
success_threshold = 1
timeout_seconds = 1
http_get {
path = "/"
port = 80
scheme = "HTTP"
}
}
resources {
requests = {
"cpu" = "100m"
"memory" = "100M"
}
limits = {
"cpu" = "500m"
"memory" = "300M"
}
}
volume_mount {
mount_path = "/mount/path"
name = "volume-1"
read_only = true
}
volume_mount {
mount_path = "/mount/path2"
name = "volume-2"
read_only = true
}
}
container {
name = "sidecar1"
command = []
image = "sidecar.image:tag"
env {
name = "SIDECAR_VAR_1"
value_from {
secret_key_ref {
key = "SIDECAR_VAR_1"
name = "secret-env"
optional = false
}
}
}
env {
name = "SIDECAR_VAR_2"
value_from {
secret_key_ref {
key = "SIDECAR_VAR_2"
name = "secret-env"
optional = false
}
}
}
env {
name = "SIDECAR_VAR_3"
value_from {
secret_key_ref {
key = "SIDECAR_VAR_3"
name = "secret-env"
optional = false
}
}
}
volume_mount {
mount_path = "/sidecar/mount/path"
name = "sidecar-volume"
read_only = true
}
}
volume {
name = "volume-1"
secret {
default_mode = "0644"
optional = false
secret_name = "secret-volume-1"
}
}
volume {
name = "volume-2"
secret {
default_mode = "0644"
optional = false
secret_name = "secret-volume-1"
}
}
volume {
name = "sidecar-volume"
secret {
default_mode = "0644"
optional = false
secret_name = "secret-volume-3"
}
}
}
}
}
}
This example itself is still reasonably simple: it injects four custom secrets into the environment, exposes a port, defines a readiness probe, sets resource requests and limits, and mounts two volumes.
Finally, a sidecar container is configured to run alongside the main container, which is a common way to handle connections to GCP CloudSQL instances. It is not uncommon for a complex system to need several additional volumes and a dozen or more additional secrets.
Now, imagine a microservices-based setup where you are running approximately 100 Kubernetes workloads, each configured similarly, along with their accompanying services, ingresses and autoscaling.
At a couple hundred lines per workload plus its supporting resources, you very quickly reach 25-30k lines of code for a single cluster. With development, staging and production clusters, you are nearing 100k lines of code, much of it duplicated.
In our example repository, the first commit shows how quickly this naive approach can cause your codebase to grow in size and complexity. Managing only sixteen items per environment, the completely unoptimized version requires more than 3000 lines of HCL:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HCL                             34            325              4           3162
Markdown                         1              7              0             20
YAML                             1              0              0              7
-------------------------------------------------------------------------------
SUM:                            36            332              4           3189
-------------------------------------------------------------------------------
Let’s cover some key ways that the DevSecOps team at Wildfire manages to keep our Terraform code simple, readable and DRY.
Wildfire DevSecOps strongly believes in the “cattle, not pets” approach to resource management. Rather than constantly nursing along bespoke, unique objects, we strive to keep our various clusters as closely aligned as possible, while still giving development teams enough flexibility to run experiments on our white-label rewards platform’s tools and vet changes before they hit production.
Terraform Workspaces provide a way to separate state for different environments while sharing significant portions of the configuration. This is not a new concept and should be reasonably familiar if you are leveraging Terraform Cloud to do runs.
One concept that is common when using workspaces is interpolating the current workspace name in resource labels: ${terraform.workspace}.
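For instance, a resource name might embed the workspace directly (a minimal illustration; the bucket resource and names here are hypothetical, not from our codebase):

resource "google_storage_bucket" "logs" {
  name     = "logs-${terraform.workspace}" # e.g. "logs-prod" in the prod workspace
  location = "US"
}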
At Wildfire, we’ve taken this a step further: inspired by Chris Pisano’s excellent series at Capital One, we created a local environment object storing many common values that change only at the workspace level.
Take a look at this small block of Terraform for managing two environments, which is very nearly identical to our live version, just with simplified database references:
locals {
  env = {
    default = {
      cluster        = "dev"
      db_uri         = "DEV_DB_URI_V4"
      db_replica_uri = "DEV_REPLICA_URI_V4"
      prefix         = "dev-"
      project        = "wildfire-cloud-project"
    }
    prod = {
      cluster        = "prod"
      db_uri         = "DB_URI"
      db_replica_uri = "REPLICA_URI"
      prefix         = ""
    }
  }

  environmentvars = contains(keys(local.env), terraform.workspace) ? terraform.workspace : "default"
  workspace       = merge(local.env["default"], local.env[local.environmentvars])
}
So what exactly is going on here? We have created a local value called local.env with keys for each of our environments. From there, we create a value called local.workspace, which takes our base environment (default, in our case) and merges it with the environment found by looking up the current Terraform workspace in local.env.
This provides a standard baseline configuration and allows each environment to override specific settings. If a setting does not exist for the current environment, it inherits it from the baseline. In this example, our project is inherited, as these two clusters live in the same GCP project (we’re working on splitting that up).
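To make the inheritance concrete, selecting the prod workspace above produces the following merged object:

workspace = {
  cluster        = "prod"
  db_uri         = "DB_URI"
  db_replica_uri = "REPLICA_URI"
  prefix         = ""
  project        = "wildfire-cloud-project" # inherited from default
}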
The power comes from utilizing these values in Terraform configuration. Wildfire has a naming convention that any deployment running on our development cluster has a name prefix of "dev-" while prod has no extra prefix.
So, whenever we need to reference a deployment named "deviceapi", we can call it "${local.workspace["prefix"]}deviceapi"; on our development cluster that resolves to "dev-deviceapi", and on production it resolves to "deviceapi".
It is easy to see how powerful something this simple can be as you expand to things like database URIs, where every deployment must reference the current database secret. Now any time a new database is deployed, the DevSecOps team can change one line and get all our deployments updated instantly. Now that “feels pro.”
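As a sketch, here is how a deployment might consume these values (abbreviated; the deployment name is taken from the example above):

resource "kubernetes_deployment" "deviceapi" {
  metadata {
    # "dev-deviceapi" on the development cluster, "deviceapi" on production
    name = "${local.workspace["prefix"]}deviceapi"
  }
  # ... spec omitted ...
}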
This technique is fairly similar to using workspace-aware .tfvars files, but we landed on an in-memory object because it keeps all the values in one place and adds inheritance of baseline values. This eliminates several mental hurdles when configuring multiple variables simultaneously.
Now, for the results. Let’s take a look at the size of the codebase with workspace-aware configuration:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HCL                             19            167              2           1635
Markdown                         1              7              0             20
YAML                             1              0              0              7
-------------------------------------------------------------------------------
SUM:                            21            174              2           1662
-------------------------------------------------------------------------------
As you can see, applying this technique results in a codebase of nearly half the previous size, while retaining the flexibility to configure variables consistently across environments.
Additionally, this method ensures a much tighter integration between configuration on the various environments, leading to fewer setup-level bugs when promoting development work to production.
The first round of improvements is great, but we can do better!
As more resources with similar configurations are added, it becomes clear that even having a single Terraform file per resource is going to be frustrating. Configuring the same values across many resources leads to either subtle unexpected differences or tedium on the part of developers who would rather be solving hard problems than copy/pasting established solutions.
This push for less duplication naturally leads to templating conversations, but HCL is not the easiest language to template.
The Wildfire DevSecOps team explored many potential solutions, including the Terraform CDK, switching to another tool such as Pulumi, and leveraging our relationships with other organizations, all in search of a way to template our Terraform codebase.
After much consideration, we landed on JSON as our configuration format. By leveraging Terraform’s ability to decode and merge JSON objects, we can build a rich, configurable resource-management codebase with as little duplication as possible.
JSON-as-configuration begins with a base definition for the type of resource to be deployed containing the default values for any configurable variable that should usually be in sync across all resources. For instance, our repository has a default of one replica for deployments. Any newly created deployment resource that does not specify its replica count will receive this value.
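As an illustration, a base definition for deployments might look like this (the field names are hypothetical, not the exact schema in our repository; the values echo the deployment example earlier):

{
  "replicas": 1,
  "port": 80,
  "resources": {
    "requests": { "cpu": "100m", "memory": "100M" },
    "limits": { "cpu": "500m", "memory": "300M" }
  }
}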
From there, the root of the resource is also merged with any environment-specific overrides that may exist. If you look at the structure of the ingresses directory, you will see three ingress configurations, two of which provide overrides for dev/prod configuration, while one remains static across both environments.
This presents a powerful configuration setup allowing resources to be tested with unique changes per-environment, while still sharing the majority of the codebase.
ingresses
├── dev
│ ├── one.json
│ └── three.json
├── four.json
├── one.json
├── prod
│ ├── one.json
│ └── three.json
└── three.json
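Given this layout, here is a minimal sketch of how the layered merge might be expressed in locals.tf (names and structure are illustrative, not our exact implementation; local.ingress_base is assumed to hold the base definition described above):

locals {
  # every top-level ingress JSON file defines one resource
  ingress_names = toset([
    for f in fileset(path.module, "ingresses/*.json") : trimsuffix(basename(f), ".json")
  ])

  # base definition, overlaid with an environment-specific override when one exists
  ingresses = {
    for name in local.ingress_names : name => merge(
      local.ingress_base,
      jsondecode(file("${path.module}/ingresses/${name}.json")),
      fileexists("${path.module}/ingresses/${local.workspace["cluster"]}/${name}.json")
        ? jsondecode(file("${path.module}/ingresses/${local.workspace["cluster"]}/${name}.json"))
        : {}
    )
  }
}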
Another neat trick you may have noticed in the deployments directory is that instead of listing out secrets for database configuration, we have just set db: true. Then, in locals.tf, this is converted into a set of environment variables needed to grant access to the database inside the cluster.
In this case, we can trade ten lines for one!
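A sketch of how that expansion might look in locals.tf (names are illustrative; local.raw_deployments is assumed to hold the merged JSON objects from the previous step):

locals {
  # the environment variables every database consumer needs,
  # resolved against the current workspace's secret references
  db_env = [
    { name = "DB_URI",         key = local.workspace["db_uri"] },
    { name = "DB_REPLICA_URI", key = local.workspace["db_replica_uri"] },
  ]

  deployments = {
    for name, d in local.raw_deployments : name => merge(d, {
      # replace the db flag with the concrete env entries
      env = concat(lookup(d, "env", []), lookup(d, "db", false) ? local.db_env : [])
    })
  }
}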
These JSON objects are consumed by locals.tf to build a robust set of inputs for each remaining Terraform file. Our resource types each define a single resource called "all", which uses a for_each loop to iterate over the collection of resource definitions and build our final system.
Within those resources, optional configuration pieces are defined as dynamic blocks that are included or excluded based on the current object being processed.
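Putting those pieces together, an abbreviated, illustrative sketch of such a resource (not our exact implementation, and assuming the local.deployments shape from the previous sketch):

resource "kubernetes_deployment" "all" {
  for_each = local.deployments

  metadata {
    name   = "${local.workspace["prefix"]}${each.key}"
    labels = { run = each.key }
  }

  spec {
    replicas = each.value["replicas"]

    selector {
      match_labels = { run = each.key }
    }

    template {
      metadata {
        labels = { run = each.key }
      }

      spec {
        container {
          image = each.value["image"]
          name  = each.key

          # one env block per configured entry; omitted entirely when the list is empty
          dynamic "env" {
            for_each = each.value["env"]
            content {
              name = env.value["name"]
              value_from {
                secret_key_ref {
                  key      = env.value["key"]
                  name     = "secret-env"
                  optional = false
                }
              }
            }
          }
        }
      }
    }
  }
}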
All these improvements start to add up. With the initial setup, we were managing 32 objects across our two environments with nearly 3200 lines of code.
With DRY principles applied, we have shrunk our codebase by almost 65%, all without losing any flexibility to configure the resources!
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JSON                            19              0              0            713
HCL                              7             44             13            374
Markdown                         1              7              0             20
YAML                             1              0              0              7
-------------------------------------------------------------------------------
SUM:                            28             51             13           1114
-------------------------------------------------------------------------------
As you review the codebase, you may be thinking "hey, I thought they said they'd focus more on complex examples!" It is true that many of the deployments in the example repository share similar setups and could be further extracted. However, in practice, our team has found that these are the fields most commonly varied across our white-label rewards platform, giving us the most flexibility within the Terraform configuration.
The Wildfire DevSecOps team has leveraged this set of DRY principles to import and manage over 1000 resources across Kubernetes and GitHub.
In total, we have about 10,000 lines in our Terraform codebases today, and building new resources for the rewards platform development teams we support is drop-dead simple: create a new JSON file under the appropriate resource-type directory, configure the proper variables and open a pull request for the team.
We no longer have to worry about making sure we copy the “current” best practices within our organization, as those are enforced globally at the resource-type level. It certainly “feels pro.”
If you think finding efficiency and security improvements in development environments is valuable, come join our team.