vault-k8s: vault-agent sidecar doesn't extend vault-agent-init's secret, but fetches new ones
Hello there!
Issue
I have a PostgreSQL database and I have set up Vault with dynamic roles in order to dynamically generate database credentials for my application. By closely examining the logs I found a (potential?) bug:
- The vault-agent-init container asks the vault server for database secrets. This means that a new lease is created. Vault-agent-init writes the credentials to a file.
- Vault-agent and the application container start simultaneously.
- Vault-agent takes some time (8 seconds - more on this later) to fetch the (new!) secrets.
- In the meantime my application reads the secrets generated by vault-agent-init and connects to the database. However, since the vault-agent-init container has completed, its lease is no longer renewed, so the secrets picked up by the application are invalidated shortly after.
- Once vault-agent initializes itself, it fetches a new pair of secrets and writes them to the file.
Expected behavior: shouldn’t vault-agent keep using vault-agent-init’s lease and thus keep renewing the secret fetched by vault-agent-init?
Specs
Vault versions
image: "hashicorp/vault-k8s:0.3.0"image: vault:1.3.2
Auth methods
- I’m running my setup on a GCP cluster
- I have tried it with the k8s auth method. In that case the entire authentication and rendering (of vault-agent) takes ~200ms, so the application has no time to pick up the (soon to be expired) secrets of vault-agent-init => it picks up the new (vault-agent) secrets and things work out fine.
- When using the GCP auth method I first get a timeout (net/http: timeout awaiting response headers), and the authentication only succeeds on the second attempt. This whole process takes around 8 seconds:
[ERROR] auth.handler: error getting path or data from method: error="unable to sign JWT for projects/xxx/serviceAccounts/xyz@abc.iam.gserviceaccount.com using given Vault credentials: Post https://iam.googleapis.com/v1/projects/xxx/serviceAccounts/xyz@abc.iam.gserviceaccount.com:signJwt?alt=json&prettyPrint=false: Get http://IP/computeMetadata/v1/instance/service-accounts/default/token?scopes=xyz: net/http: timeout awaiting response headers" backoff=1.164255094
- Then it backs off for a couple of seconds, retries the authentication, succeeds, and writes the secrets to the file.
- But as said, by then the application has already picked up the files written by the init container…
Agent config:
"auto_auth" = {
"method" = {
"config" = {
"role" = "xyz-role"
"type" = "iam"
"project" = "xyz"
"service_account" = "abc@xyz.iam.gserviceaccount.com"
"jwt_exp" = 10
}
"mount_path" = "auth/ourGcpAuthPath"
"type" = "gcp"
}
"sink" = {
"config" = {
"path" = "/home/vault/.token"
}
"type" = "file"
}
}
"exit_after_auth" = true
"pid_file" = "/home/vault/.pid"
"template" = {
"contents" = "nothing special"
"destination" = "/vault/secrets/jdbc.yaml"
"create_dest_dirs" = true
}
"vault" = {
"address" = "https://vault.abc.cde"
}
- The sidecar agent config is the same, except that exit_after_auth is set to false.
Deployment annotations and Istio config
- Our deployment runs with the following annotations:
  vault.hashicorp.com/agent-init-first: "true"
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/agent-configmap: our-vault-configmap
- We also have an Istio sidecar running. To make it work we use a ServiceEntry:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: vault-service-entry
  namespace: someNamespace
spec:
  hosts:
  - vault.vault.svc.cluster.local
  location: MESH_EXTERNAL
  ports:
  - name: http
    number: 8200
    protocol: HTTP
  resolution: DNS
- The ServiceEntry is NOT located in the same namespace as our (current) deployment; however, I don’t believe this would be a problem.
- Istio is on the latest (1.5.1) version.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 10
- Comments: 19 (7 by maintainers)
We’ve been working on a solution for this internally. It’s not ready to be rolled out yet but just wanted to mention that this is on our radar and something we plan to address soon.
@jonthegimp Indeed it does! The catch is that it requires Vault 1.7 which is not yet GA. We expect 1.7 to go GA next week but you can try 1.7 RC1 with the new injector today!
Apologies for the delay on this. This feature was actually a gigantic effort and we hope it helps!
Is there a status update on this? Seems like a pretty big issue that is impacting what I’m working on.
Hi @jonthegimp, we reviewed a solution internally that didn’t require code changes, but we decided not to move forward with it due to the added infrastructure complexity. Basically, we were going to alter Vault Helm to deploy a Vault Agent as a DaemonSet with caching and listeners enabled. The init/sidecar containers would use the local agent running on the host as a proxy to Vault, and it would handle all the caching. This works well; however, it increased the risk of secret exposure (it would cache all the secrets for the host) and added complexity/infrastructure to the secret injector.
We’re investigating changes to Vault Agent that would allow us to persist its secret and lease cache to a memory volume so the init/sidecar can coordinate appropriately. This is a decently large change, though, so I wouldn’t expect it any earlier than Vault 1.6.
Hi @jonthegimp, appreciate the ping.
Some updates on this issue:
The scope of this issue is much bigger than we anticipated. Both Vault Agent and Consul Template (what Vault Agent uses for templating) need to be extended to support persisting and loading a cache. We weren’t able to get this major change into 1.6 because it requires a lot of reworking of both projects, so we’re scoped to do this for Vault 1.7.
I know this bug is painful and not everyone has the flexibility to design around it. There is a small workaround which might be acceptable for your use case which I’ve been experimenting with.
Vault Agent has a caching mode which can be enabled via the config. This creates an in-memory cache. Vault Agent can also be used as a proxy for other clients. Combining these two features into a DaemonSet and reconfiguring your sidecars to use the agent proxy running on that host eliminates this issue because the agent running on the host is caching the requests.
I have an example here if you want to experiment with it: https://github.com/jasonodonnell/vault-agent-demo/tree/daemonset.
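For what it’s worth, here is a minimal sketch of what the config for such a DaemonSet agent could look like. The auth method, role name, and listener address are assumptions (not taken from this issue); the important parts are the cache and listener stanzas:

# Sketch: DaemonSet agent that authenticates once, caches leases in memory,
# and exposes a local listener that pod sidecars can use as a proxy to Vault.
auto_auth {
  method "kubernetes" {               # assumption: any auto-auth method works here
    mount_path = "auth/kubernetes"
    config = {
      role = "daemonset-agent-role"   # hypothetical role name
    }
  }
  sink "file" {
    config = {
      path = "/home/vault/.token"
    }
  }
}

cache {
  use_auto_auth_token = true          # proxied requests reuse the agent's own token
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true                  # assumption: TLS handled elsewhere (e.g. the mesh)
}

vault {
  address = "https://vault.abc.cde"
}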
There’s basically one small change I would need to make to Vault K8s that sets the downward API HOST_IP environment variable on the agent containers. By doing this it would be very easy to use the vault.hashicorp.com/service annotation to redirect the agents to the proxy.
We discussed internally if we should just make the DaemonSet infrastructure part of Vault Helm but ultimately decided we shouldn’t, because the added infrastructure/complexity seemed risky compared to sharing a persisted cache.
If you’re interested in this architecture I could optimize Vault K8s for it, but deploying and configuring the DaemonSet would be up to you.
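To make the idea concrete, a rough sketch of the two pieces involved; the port matches the listener sketch above, and the annotation value is an assumption. It relies on a HOST_IP variable being present on the injected agent containers, which is exactly the vault-k8s change described above:

# Sketch only: pod annotations redirecting injected agents to the node-local proxy.
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/agent-configmap: our-vault-configmap
    # $(HOST_IP) would be expanded by the kubelet, provided the injected agent
    # containers define a HOST_IP env var before VAULT_ADDR.
    vault.hashicorp.com/service: "http://$(HOST_IP):8200"
---
# Sketch of the downward API env entry vault-k8s would add to the agent containers:
env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP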
Closing since persistent caching has been released.
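For anyone finding this later: with the persistent cache released, the injector can be told to share the agent cache between the init container and the sidecar. A minimal sketch of the pod annotations, assuming Vault 1.7+ and a vault-k8s release that supports the cache annotations (check the annotation reference for your versions):

# Sketch: enable the injector's persistent agent cache so the sidecar reuses
# the init container's secrets and leases instead of fetching new ones.
metadata:
  annotations:
    vault.hashicorp.com/agent-init-first: "true"
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/agent-configmap: our-vault-configmap
    # Persist the agent cache to a shared volume between init and sidecar.
    vault.hashicorp.com/agent-cache-enable: "true"

With this in place the sidecar should keep renewing the init container’s leases rather than invalidating them by requesting fresh credentials.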
@jasonodonnell , any update on the solution, or do you have another issue to follow to watch for this update?