Wildfire Blog

Deploying Secrets: Or, How I Learned to Stop Worrying and Embrace the Tooling

Written by Kyle Kurz | Jan 21, 2022 4:13:40 PM

Introduction

Secret management is hard. There, I said it. Leaked secrets in Slack channels, shared Google documents or accidental commits to a codebase are daily occurrences at many organizations. They are so routine, in fact, that managing the storage and sharing of company secrets is often ignored until a catastrophic event, such as a data breach or malicious activity related to a leaked passphrase.

Modern organizations support many types of users: HR administrators, developers and marketing teams, and Wildfire is no different. All of these users not only manage their own private secrets, but they also manage sensitive data that may be shared within the team or the entire organization. On top of that, technology companies often maintain secrets that are never meant for human consumption, often referred to as machine secrets.

Without a single source of truth for all company secrets, the management, security and rotation of these sensitive items is left in the hands of individuals. When those secrets are integral to the organization's livelihood, such as API keys or certificates for TLS connections, their protection becomes paramount.

A Holistic Approach

Many companies have moved to an IT-managed password system for "human" secrets using tools such as 1Password, Dashlane or Bitwarden. This solves half of the problem by providing the organization with a secure way to store and access human secrets, as well as a strong recovery method for when a user is forgetful and can no longer access their passwords.

However, developers still need a way to secure machine secrets so that the right humans have access to them in development environments or deployment plans. Often, this has led to a duplication of items, stored both in a password manager and in another system such as HashiCorp Vault, AWS Secrets Manager or Google Secret Manager, so that developers can access them on their personal machines and infrastructure can access them at deployment time.

Inevitably, these secrets fall out of sync or are granted to the wrong party, leading to a complex cadence of re-evaluating access and sharing out newly rotated secrets.

Enter: 1Password Secrets Automation

Built as a first-class client in the 1Password ecosystem, this tool allows infrastructure to securely access the contents of 1Password vaults programmatically, while still allowing role-based access control (RBAC) on each vault for individual teams or users. Secrets Automation is built around a server component called the Connect Server, which you host in your own infrastructure and which has the sole purpose of syncing encrypted data with the 1Password.com servers.

You can then interact with this server directly using a Kubernetes Operator, Go SDK or other APIs to retrieve (or update, if appropriate permissions are granted) values from the vaults that are enabled for automation.

So How Do I Set It Up?

1Password's preferred method of deployment for the Connect Server is via Helm. As such, they've published open source Charts to help get you up and running quickly. At Wildfire, we use Terraform to manage our infrastructure in GKE + Kubernetes, so the process was reasonably simple: we added a helm_release resource to our deployments:

resource "helm_release" "onepassword-connect" {
  name       = "onepassword-connect"
  repository = "https://1password.github.io/connect-helm-charts/"
  chart      = "connect"
  # 1.7.0+ is required for credentials_base64
  version    = "1.7.0"

  # Base64-encoded contents of the Connect credentials file
  set_sensitive {
    name  = "connect.credentials_base64"
    type  = "string"
    value = var.connect_credentials
  }

  # Token the Kubernetes Operator uses to authenticate to the Connect Server
  set_sensitive {
    name  = "operator.token.value"
    type  = "string"
    value = var.operator_token
  }

  # Deploy the Kubernetes Operator alongside the Connect Server
  set {
    name  = "operator.create"
    value = true
  }
}

Note that this resource deploys both a Connect Server and a Kubernetes Operator. You do not need to deploy the Kubernetes Operator alongside the Connect Server, but doing so makes setup much simpler. By adding a couple of lines to the resource, you can also restrict your Connect instance so that it is only reachable from within the cluster.
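As a rough sketch of that cluster-only restriction: the version below assumes the chart exposes a connect.serviceType value and that setting it to ClusterIP is what keeps the service internal. That setting is not shown in our original configuration, so treat the value name as an assumption and check the chart's values.yaml for the version you deploy. The variable declarations are included only to make the sketch self-contained.

```hcl
# Inputs for the release; both are marked sensitive so Terraform redacts them.
variable "connect_credentials" {
  type      = string
  sensitive = true
}

variable "operator_token" {
  type      = string
  sensitive = true
}

resource "helm_release" "onepassword-connect" {
  name       = "onepassword-connect"
  repository = "https://1password.github.io/connect-helm-charts/"
  chart      = "connect"
  version    = "1.7.0"

  set_sensitive {
    name  = "connect.credentials_base64"
    type  = "string"
    value = var.connect_credentials
  }

  set_sensitive {
    name  = "operator.token.value"
    type  = "string"
    value = var.operator_token
  }

  set {
    name  = "operator.create"
    value = true
  }

  # Assumption: the chart's connect.serviceType value controls the Kubernetes
  # Service type. ClusterIP gives the service no node port or external address.
  set {
    name  = "connect.serviceType"
    value = "ClusterIP"
  }
}
```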

Once you have a Connect Server available, you can create custom resources (CRDs) in Kubernetes directly or rely on annotations on your deployments. If you would prefer that your code contact the Connect Server itself, you can inject a token into the deployment or application and make requests via the SDK.
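For the custom resource route, a minimal sketch might look like the following. It assumes the hashicorp/kubernetes provider is already configured against the cluster and that the Operator's OnePasswordItem CRD is installed before Terraform plans this resource. The resource name, namespace, vault and item names are all hypothetical placeholders.

```hcl
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

# Asks the Operator to create a Kubernetes Secret from a 1Password item.
# The Secret is expected to take the same name as this custom resource.
resource "kubernetes_manifest" "example_db_credentials" {
  manifest = {
    apiVersion = "onepassword.com/v1"
    kind       = "OnePasswordItem"
    metadata = {
      name      = "example-db-credentials" # hypothetical name
      namespace = "default"                # hypothetical namespace
    }
    spec = {
      # Format: vaults/<vault id or title>/items/<item id or title>
      itemPath = "vaults/example-vault/items/example-db-credentials"
    }
  }
}
```

Because kubernetes_manifest validates against the cluster at plan time, this only works once the Helm release above has been applied, so the two may need to live in separate runs.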

A Neat Trick

Secrets Automation with the Kubernetes Operator has one extremely valuable feature that truly sold our DevSecOps team on using the platform. When a secret is changed in 1Password, the Connect Server will automatically sync it down to the Kubernetes Operator, which will in turn update the Kubernetes secret.

Additionally, the Operator can then schedule restarts of any deployments using the updated secrets, making rotation seamless and simple.
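One way to opt workloads into those restarts is an annotation, sketched here at the namespace level. This assumes the operator.1password.io/auto-restart annotation described in the Operator's documentation and a configured hashicorp/kubernetes provider. The namespace name is a hypothetical placeholder, and our own setup may differ from this.

```hcl
# Deployments in this namespace that consume Operator-managed secrets are
# restarted when the underlying 1Password item changes (assuming the
# Operator honors the annotation at namespace scope).
resource "kubernetes_namespace" "example_apps" {
  metadata {
    name = "example-apps" # hypothetical namespace
    annotations = {
      "operator.1password.io/auto-restart" = "true"
    }
  }
}
```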

What If 1Password.com Is Down?

Due to how the Connect Server works, it's simplest to think of it as a local cache of specific pieces of your 1Password account, which are granted via the credentials file. As such, in the event of an outage, either with the outbound side of your infrastructure or with 1Password's own systems, your deployments will continue to work as expected until a secret rotation is necessary.

In the event of a stale cache, any infrastructure pulling from the same Connect instance should remain functional within itself (all systems have the same stale data), but connections to external systems may fail if the secret used to create the connection has been rotated.

Once service is restored after an outage, the next sync event on the Connect Server will pull in the updated data and begin updates and redeployments.

Top-Tier Support

Early in our deployments, we had issues with GKE Autopilot and with securing our 1password-credentials.json file in a reasonable manner. The 1Password development team was incredibly responsive and quickly deployed fixes to our reported problems.

This flexibility and willingness to work with users is a great sign for the future of the platform, and has truly helped build confidence in the product within our engineering team.

Lessons Hard-Learned

  • It's easy to be confused during initial deployment

    When first deploying the Connect Server, many developers will focus on only that piece of the infrastructure to ensure it's working properly before moving on. However, the 1Password Integrations Dashboard will not show an active connection until you make your first token-authenticated request to the Connect Server. So once you've deployed the server, be sure to create a CRD, add an annotation or make a simple cURL request to the Connect Server to trigger a sync and update the dashboard. This confusion has been reported to 1Password and I'm confident it's something they'll address in the future.

    NOTE: The above discussion of 1Password unavailability is relevant to discuss again here. If you happen to start your Connect Server at precisely the time a 1Password outage is occurring, you will be unable to retrieve secrets until that is resolved. Due to the rarity of 1Password outages combined with the rarity with which users are likely to deploy new production Connect Servers, this seems a worthwhile risk.

  • Annotations on deployments create secrets automatically within the cluster, but those secrets are not injected into the deployment when using Terraform

    The Kubernetes Operator supports creating secrets automatically by adding a metadata annotation to the deployment. However, that same process does not inject the secret into the deployment, nor is it possible to deploy via Terraform and reference the secret as an environment variable in the same resource definition that creates the annotation. We are currently in conversation with 1Password about how to overcome this hurdle, as it seems key to successful utilization of the platform.

  • Deploying CRDs directly to Kubernetes via the 1Password official provider requires a second Connect Server

    Since the 1Password provider needs a Connect Server URL in its configuration, you cannot point it at the Helm deployment from something like Terraform Cloud. That server is only available within the cluster, so to inject secrets via CRDs of onepassword_item resources, you'd need a second, publicly available Connect Server that TF Cloud can access during a run. If the annotation discussion does not work out, this may become the only workable way to inject secrets into Kubernetes, but the exposed Connect Server carries significant risk, so it is not our preferred approach.
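To make the last lesson concrete, here is a sketch of the provider setup in question, assuming the 1Password/onepassword Terraform provider with its url and token arguments. The variable names, vault identifier and item title are hypothetical. The point is that var.connect_url must resolve from wherever Terraform runs, which an in-cluster service address will not do from Terraform Cloud.

```hcl
terraform {
  required_providers {
    onepassword = {
      source = "1Password/onepassword"
    }
  }
}

variable "connect_url" {
  type = string # must be reachable from the Terraform runner
}

variable "connect_token" {
  type      = string
  sensitive = true
}

provider "onepassword" {
  url   = var.connect_url
  token = var.connect_token
}

# Reads an existing item through the Connect Server during plan and apply.
data "onepassword_item" "example" {
  vault = "example-vault-uuid"     # hypothetical vault identifier
  title = "example-db-credentials" # hypothetical item title
}
```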

Summing It Up

Wildfire is committed to a strong security stance across our entire organization. New employees are given 1Password Business accounts on their first day and trained both to use the tool for work secrets and to leverage the free linked family account to simplify their own secrets maintenance and workflows.

We leverage tools like Terraform not just for our Infrastructure as Code, but also to manage our team layout and access on GitHub. This gives us a single source of truth and an auditable record of who changed each configuration, alongside a commit message that provides context into the "why".

We're early in our deployment of 1Password Secrets Automation at Wildfire. Does this sound like an interesting problem to tackle?